TLMs: Tiny LLMs and Agents on Edge Devices with @cormacb
https://www.youtube.com/watch?v=-TiET_K-E_g
FunctionGemma ships at 270 million parameters and prefills at nearly 2,000 tokens per second on a Pixel 7. Out of the box, it hits 46% accuracy on a fixed set of app intents. Fine-tune it on a synthetically generated dataset and accuracy clears 90% on eight of ten functions.
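For a rough idea of how a per-function number like "90% on eight of ten functions" could be tallied, here is a minimal sketch. It assumes exact-match scoring of the predicted call against the expected one; the helper names and the example intents (`set_alarm`, `open_app`) are hypothetical and not taken from the talk.

```python
from collections import defaultdict


def per_function_accuracy(examples):
    """Exact-match accuracy per expected function.

    `examples` is an iterable of (expected, predicted) pairs, where each
    call is a (function_name, arguments_dict) tuple. This scoring rule is
    an assumption for illustration, not the evaluation used in the talk.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for expected, predicted in examples:
        name = expected[0]
        totals[name] += 1
        # A prediction counts only if both name and arguments match.
        if predicted == expected:
            hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


def functions_clearing(accuracy, threshold=0.9):
    """Names of functions whose accuracy is strictly above `threshold`."""
    return sorted(name for name, acc in accuracy.items() if acc > threshold)
```

Running this over the same fixed intent set before and after fine-tuning would give the kind of before/after comparison described above.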
Cormac walks through the two paths developers have for on-device AI: first, a skill harness built on Gemma 4, with a restaurant roulette demo running fully on device; then Eloquent, a production transcription app built by chaining two sub-billion-parameter models together.
cc @osanseviero