Quantized SLMs Surpass GPT-2 on Raspberry Pi 5 via Ollama

Modern quantized small language models (SLMs) easily surpass GPT-2 (2019) on a Raspberry Pi 5 (4-8GB) via llama.cpp or Ollama:

Gemma 2B/3B/4B variants hit 8-11 tokens/s (Q4) with strong coherence. Qwen2.5/Qwen3 3B-4B often lead efficiency + quality benchmarks. Phi-3.5 3.8B excels at reasoning for its size.

These fit in ~2-3GB of RAM when quantized and deliver far better chat, coding, and logic than GPT-2 thanks to better architectures and training data. The "5x more powerful" point holds.
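A rough sanity check on the ~2-3GB figure, as a back-of-the-envelope sketch. The 4.85 bits per weight for Q4_K_M and the flat 0.5 GB of runtime overhead (KV cache, buffers) are assumptions for illustration, not numbers from the post, and real parameter counts differ somewhat from the nominal "2B/3B/4B" labels.

```python
# Rough RAM estimate for a quantized model: weights plus a flat runtime overhead.
# ASSUMPTIONS (not from the post): ~4.85 bits per weight for Q4_K_M and
# 0.5 GB of overhead for KV cache and runtime buffers.

def estimate_ram_gb(params_billions: float,
                    bits_per_weight: float = 4.85,
                    overhead_gb: float = 0.5) -> float:
    """Return an approximate resident memory footprint in GB (1 GB = 1e9 bytes)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 + overhead_gb


if __name__ == "__main__":
    # Nominal sizes only; actual parameter counts vary per model.
    for label, size in [("2B", 2.0), ("3B", 3.0), ("3.8B", 3.8), ("4B", 4.0)]:
        print(f"{label}: ~{estimate_ram_gb(size):.1f} GB")
```

Under these assumptions a nominal 3B model lands a little above 2 GB, which is consistent with the range quoted above and leaves headroom on a 4GB Pi.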

Setup tip: ollama pull gemma2:2b or ollama pull qwen2.5:3b (Q4_K_M). The Pi handles them well.
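To check the tokens/s numbers on your own Pi, here is a minimal sketch that queries a local Ollama server and computes generation speed from the eval_count and eval_duration fields of the /api/generate response. It assumes Ollama is running on its default port (11434) and that the models have already been pulled; the prompt and the helper names (tokens_per_second, generate, benchmark) are illustrative.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count (tokens) and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming generation request and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=600) as reply:
        return json.load(reply)


def benchmark(models: list[str], prompt: str = "Explain what a Raspberry Pi is.") -> None:
    """Print measured generation speed for each model (requires a running server)."""
    for model in models:
        result = generate(model, prompt)
        print(f"{model}: {tokens_per_second(result):.1f} tokens/s")
```

With the server up, calling benchmark(["gemma2:2b", "qwen2.5:3b"]) prints one line per model. The first request for each model includes load time, but eval_duration covers only token generation, so the reported rate should not be skewed by it.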

6:00 AM · Jun 6, 2026 · 3.4K Views