Gemma 4 E2B runs locally on iPhone 17 Pro
Google's Gemma 4 E2B model runs fully locally on the iPhone 17 Pro using MLX, optimized for Apple Silicon. It delivers roughly 40 tokens per second offline, with a 128K context window, reasoning capabilities, image understanding, and multi-turn chat. Through an included thinking mode, it achieves state-of-the-art coding and math performance on mobile hardware. A 13-second demonstration generates a 7-day Paris trip itinerary inside an iOS chat interface.
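As a rough sanity check of the figures in the demo (the ~40 tok/s throughput and 13-second clip are taken from the post; the output length is inferred, not reported):

```python
# Back-of-envelope check of the demo numbers.
TOKENS_PER_SEC = 40   # throughput reported in the post
DEMO_SECONDS = 13     # length of the itinerary demo clip

# Implied output length if the model decoded for the full clip.
tokens_generated = TOKENS_PER_SEC * DEMO_SECONDS
print(f"~{tokens_generated} tokens in {DEMO_SECONDS}s")  # ~520 tokens
```

Roughly 500 tokens is a plausible size for a compact 7-day itinerary, which is why the demo feels instant.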
So many possibilities for on-device small models.
Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro: ~40 tok/s with MLX optimized for Apple Silicon. SOTA coding & math on mobile, with a 128K context. Fully offline, with thinking mode.
ON-DEVICE AI JUST HIT LIGHTSPEED.
Google’s Gemma 4 E2B is running **fully locally** on the iPhone 17 Pro at ~40 tokens/sec 🔥
No cloud. No latency. Full reasoning + image understanding.
Watch it instantly build a complete 7-day Paris itinerary (neighborhoods, restaurants, attractions, the works) like it’s nothing.
This isn’t a demo. This is the new normal.
Huge props to @adrgrondin & @LocallyAIApp for making it real 👏
Local AI will win. Who’s ready? $QCOM👇
However, are open-source models only good for chat?