Google’s Gemma 4 E2B runs fully on-device on the iPhone 17 Pro at roughly 40 tokens per second using MLX optimization, supporting a 128K context window and offline reasoning with image understanding.