
Function Gemma Runs Nearly 2,000 Tokens Per Second Prefill On Pixel 7 With 270M Parameters

Original post

TLMs: Tiny LLMs and Agents on Edge Devices with @cormacb https://www.youtube.com/watch?v=-TiET_K-E_g Function Gemma ships at 270 million parameters and runs prefill at nearly 2,000 tokens per second on a Pixel 7. Out of the box, it hits 46% accuracy on a fixed set of app intents. Fine-tune it on a synthetically generated dataset and accuracy clears 90% on eight of ten functions. Cormac walks through the two paths developers have for on-device AI: first, a skill harness built on Gemma 4, with a restaurant roulette demo running fully on device; then Eloquent, a production transcription app built by chaining two sub-billion-parameter models together. cc @osanseviero
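
A figure like "clears 90% on eight of ten functions" reads as a per-function accuracy count. The post does not say how calls were scored, so the sketch below is only one plausible way to compute such a number. It assumes exact match on function name and arguments, treats "clears 90%" as at least 0.9, and uses hypothetical function names (set_alarm, send_message) with made-up records rather than anything from the talk.

```python
from collections import defaultdict


def per_function_accuracy(records):
    """Return {function_name: accuracy} for (expected, predicted) call pairs.

    Each call is a (name, args_dict) tuple. Scoring is exact match on both
    name and arguments, which is an assumption, not the talk's stated metric.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for expected, predicted in records:
        name = expected[0]  # group results by the function that should be called
        totals[name] += 1
        if predicted == expected:
            correct[name] += 1
    return {name: correct[name] / totals[name] for name in totals}


def functions_clearing(accuracy, threshold=0.9):
    """List the functions whose accuracy is at or above the threshold."""
    return sorted(name for name, acc in accuracy.items() if acc >= threshold)


# Hypothetical evaluation records: (expected call, model's predicted call).
records = [
    (("set_alarm", {"time": "07:00"}), ("set_alarm", {"time": "07:00"})),
    (("set_alarm", {"time": "18:30"}), ("set_alarm", {"time": "18:30"})),
    (("send_message", {"to": "Sam", "body": "hi"}),
     ("send_message", {"to": "Sam", "body": "hi"})),
    (("send_message", {"to": "Ana", "body": "late"}),
     ("set_alarm", {"time": "late"})),  # wrong function chosen
]

accuracy = per_function_accuracy(records)
cleared = functions_clearing(accuracy)
print(f"{len(cleared)} of {len(accuracy)} functions clear the threshold: {cleared}")
```

Running the same tally on the base model and on the fine-tuned model would show which functions the fine-tune actually moved past the threshold.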

3:42 PM · May 21, 2026