Gemma 4 E2B runs locally on iPhone 17 Pro
Google's Gemma 4 E2B model runs fully locally on the iPhone 17 Pro using MLX, optimized for Apple Silicon. It delivers roughly 40 tokens per second offline, with a 128K context window, reasoning capabilities, image understanding, and multi-turn chat. Through an included thinking mode, it achieves state-of-the-art coding and math performance on mobile hardware. A 13-second demonstration generates a 7-day Paris trip itinerary inside an iOS chat interface.
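As a rough sanity check of the figures in the demo (the ~40 tok/s throughput and 13-second clip are taken from the post; the output length is inferred, not reported):

```python
# Back-of-envelope check of the demo numbers.
TOKENS_PER_SEC = 40   # throughput reported in the post
DEMO_SECONDS = 13     # length of the itinerary demo clip

# Implied output length if the model decoded for the full clip.
tokens_generated = TOKENS_PER_SEC * DEMO_SECONDS
print(f"~{tokens_generated} tokens in {DEMO_SECONDS}s")  # ~520 tokens
```

Roughly 500 tokens is a plausible size for a compact 7-day itinerary, which is why the demo feels instant.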
So many possibilities for on-device small models.
Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro: ~40 tok/s with MLX optimized for Apple Silicon. SOTA coding & math on mobile, with a 128K context. Fully offline, with thinking mode.
ON-DEVICE AI JUST HIT LIGHTSPEED.
Google’s Gemma 4 E2B is running **fully locally** on the iPhone 17 Pro at ~40 tokens/sec 🔥
No cloud. No latency. Full reasoning + image understanding.
Watch it instantly build a complete 7-day Paris itinerary (neighborhoods, restaurants, attractions, the works) like it’s nothing.
This isn’t a demo. This is the new normal.
Huge props to @adrgrondin & @LocallyAIApp for making it real 👏
Local AI will win. Who’s ready? $QCOM👇
However, are open-source models only good for chat?