Liquid AI releases LFM2.5-230M, a 230-million-parameter open-weight model designed for edge deployment on CPUs and NPUs

Original post

Wow!

Liquid AI just released LFM2.5-230M, their smallest and most efficient model yet. This 230 million parameter powerhouse is built for speed and real-world use on phones, robots, Raspberry Pi devices, and other edge hardware.

We love how it pushes the boundaries of what tiny models can achieve.

The numbers speak for themselves. LFM2.5-230M delivers up to 213 tokens per second decode speed on a Galaxy S25 Ultra CPU. On a Raspberry Pi 5, it hits 42 tokens per second.

According to Liquid AI, these speeds give it the highest prefill and decode throughput in its class on both devices, while keeping the smallest memory footprint.
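
To put the quoted decode speeds in perspective, the sketch below converts tokens per second into wall-clock time for one reply. The 256-token reply length is an assumption chosen for illustration, and prefill time is ignored.

```python
# Decode throughput figures quoted in the announcement (tokens per second).
DECODE_TOK_PER_S = {
    "Galaxy S25 Ultra (CPU)": 213.0,
    "Raspberry Pi 5 (CPU)": 42.0,
}


def decode_seconds(num_tokens: int, tok_per_s: float) -> float:
    """Time to generate num_tokens at a steady decode rate (prefill ignored)."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return num_tokens / tok_per_s


if __name__ == "__main__":
    reply_tokens = 256  # assumed reply length, not a figure from the post
    for device, rate in DECODE_TOK_PER_S.items():
        print(f"{device}: {decode_seconds(reply_tokens, rate):.1f} s")
```

Under these assumptions a 256-token reply takes about 1.2 seconds on the phone and about 6.1 seconds on the Raspberry Pi 5.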

According to Liquid AI, it competes with and often beats models more than twice its size on key tasks like instruction following, data extraction, and tool use. This efficiency comes from its LFM2 architecture, pre-training on 19 trillion tokens, and post-training distillation from the larger LFM2.5-350M model.

The result is a compact model with a 32K context window that feels much smarter than its size suggests.

We especially like this model for practical agentic applications. Liquid AI demonstrated it running entirely on-device on a Unitree G1 robot powered by an NVIDIA Jetson Orin.

After a quick fine-tune, the model takes natural-language instructions and turns them into structured multi-step plans made of tool calls. Imagine robots, home automation, or phone-based agents that work offline, privately, and instantly.

It shines in large-scale data extraction pipelines and lightweight agentic tasks. Whether you need fast summarization, structured output, or reliable tool calling, LFM2.5-230M delivers without relying on the cloud.

The model supports a wide range of frameworks right out of the box:
• llama.cpp (GGUF) for edge devices
• MLX for Apple Silicon
• vLLM and SGLang for GPU serving
• ONNX for cross-platform compatibility

Both the instruct version (LFM2.5-230M) and base version (LFM2.5-230M-Base) are available now.

Liquid AI reminds us that intelligence density matters most. A 230M model that runs this fast and performs this well opens the door to truly ubiquitous AI.

No more cloud dependency for many everyday tasks. Faster, cheaper, and more private experiences become possible on hardware people already own.

It is a big win for developers building on-device applications, robotics teams, and anyone who values speed and privacy.

Blog post: https://www.liquid.ai/blog/lfm2-5-230m

10:14 AM · Jun 25, 2026 · 7.3K Views

Liquid AI@liquidai

As an early look at ongoing work, we deployed LFM2.5-230M on a Unitree G1, running entirely on-device on its onboard @nvidia Jetson Orin.

The model acts as a skill-selection layer, taking in natural-language instructions and decomposing them into sequences of tool calls.

After a quick fine-tune, "Hold still for 2s, walk forward at 1 m/s for 3 m, hold a one-leg kneel for 5s, walk back at 0.5 m/s for 3 m" becomes a structured multi-step plan automatically.
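
A structured plan of this kind might look like the sketch below. The tool names (`hold_still`, `walk`, `kneel`), their arguments, and the JSON layout are all hypothetical; the thread does not show Liquid AI's actual skill schema or output format. The point is only that each clause of the instruction maps to one tool call that can be validated before the robot acts on it.

```python
import json

# Hypothetical skill registry: tool name -> required argument names.
# These names are illustrative; the real skill set is not shown in the thread.
SKILLS = {
    "hold_still": {"duration_s"},
    "walk": {"direction", "speed_m_s", "distance_m"},
    "kneel": {"legs", "duration_s"},
}

# Assumed JSON output for the instruction quoted above (format is a guess).
EXAMPLE_OUTPUT = """
[
  {"tool": "hold_still", "args": {"duration_s": 2}},
  {"tool": "walk", "args": {"direction": "forward", "speed_m_s": 1.0, "distance_m": 3}},
  {"tool": "kneel", "args": {"legs": 1, "duration_s": 5}},
  {"tool": "walk", "args": {"direction": "backward", "speed_m_s": 0.5, "distance_m": 3}}
]
"""


def parse_plan(raw: str) -> list:
    """Parse a JSON plan and reject unknown tools or mismatched argument names."""
    plan = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON list of steps")
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: expected an object")
        tool = step.get("tool")
        if not isinstance(tool, str) or tool not in SKILLS:
            raise ValueError(f"step {i}: unknown tool {tool!r}")
        args = step.get("args")
        if not isinstance(args, dict) or set(args) != SKILLS[tool]:
            raise ValueError(f"step {i}: {tool} expects {sorted(SKILLS[tool])}")
    return plan


if __name__ == "__main__":
    for step in parse_plan(EXAMPLE_OUTPUT):
        print(step["tool"], step["args"])
```

Checking the plan against a fixed registry before execution is one way to keep a small model's occasional malformed output from reaching the actuators.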

(3/n)

12h

Liquid AI@liquidai

It is especially well-suited for large-scale data extraction pipelines and lightweight on-device agentic workloads on phones, robots, home & network automation devices. LFM2.5-230M and LFM2.5-230M-Base are available now.

> Blog post: https://www.liquid.ai/blog/lfm2-5-230m
> LFM2.5-230M: https://huggingface.co/LiquidAI/LFM2.5-230M
> Docs: http://docs.liquid.ai

12h

Liquid AI@liquidai

Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic tasks on phones, robots, home and network automation devices.

> 230M parameters, built on the LFM2 architecture
> Pre-trained on 19T tokens, with a 32K context extension
> Post-trained with distillation from LFM2.5-350M
> 213 tok/s decode speed on Galaxy S25 Ultra (CPU)
> 42 tok/s on a Raspberry Pi 5 (CPU)
> Competes with and often beats models more than twice its size on instruction following, data extraction, and tool use.
> Use it for large-scale data extraction pipelines or lightweight on-device agentic workloads.

🧵

12h

The Son of Vinci@TheSonOfVinci

@liquidai does anyone actually use these small models? genuinely asking

12h

Liquid AI@liquidai

CPU Performance: LFM2.5-230M is considerably faster than similar-sized attention-based and hybrid models. On a @raspberry Pi 5 and a @Qualcomm Snapdragon Gen4 (@Samsung Galaxy S25 Ultra), it delivers the highest prefill and decode throughput in its class while keeping the smallest memory footprint. It is available today on all platforms:

> llama.cpp (GGUF) for edge
> MLX for Apple Silicon
> vLLM and SGLang for GPU serving
> ONNX for cross-platform

(2/n)

12h

Liquid AI@liquidai

GPU Performance: For production-grade enterprise deployments, we have also developed an internal GPU inference stack that delivers extremely low-latency serving. We benchmark it against other small models running on SGLang, and across all concurrency levels, LFM2.5-230M achieves considerably lower end-to-end latency.

(4/n)

12h

Tom Turney@no_stp_on_snek

@liquidai link for those that don't want to dig https://huggingface.co/LiquidAI/LFM2.5-230M

10h

RasputinKaiser@RasputinKaiser

@TheSonOfVinci @liquidai I use Gemma 4 E2B on my iPhone 15 Pro Max, not as small as Liquid etc, but it’s useful for when you run out of a subscription or you’re trying to do local testing.

I have it hooked up to my camera to analyze things

9h

asatoucan@asatoucan

@TheSonOfVinci @liquidai me! I'm focusing my research on practical small models for edge computing. They won't replace models with hundreds of billions of params for sure, but they have their own use cases.

11h

Javier@javi_22_dev

@liquidai Lol. 19T for 230M is like a gazillion times the Chinchilla ratio 😂 Congrats
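
For scale, the comparison can be worked out directly. The sketch below assumes the commonly cited rule of thumb of roughly 20 training tokens per parameter for compute-optimal training; that figure is an approximation and is not stated anywhere in the thread.

```python
PARAMS = 230e6        # 230M parameters (from the announcement)
TRAIN_TOKENS = 19e12  # 19T pre-training tokens (from the announcement)
RULE_OF_THUMB_TOKENS_PER_PARAM = 20.0  # assumed Chinchilla-style ratio


def tokens_per_param(tokens: float, params: float) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


if __name__ == "__main__":
    ratio = tokens_per_param(TRAIN_TOKENS, PARAMS)
    multiple = ratio / RULE_OF_THUMB_TOKENS_PER_PARAM
    print(f"{ratio:,.0f} tokens per parameter")
    print(f"about {multiple:,.0f}x the assumed rule of thumb")
```

That comes to roughly 82,600 tokens per parameter, or a little over 4,000 times the assumed ratio, which is the sense in which the model is heavily over-trained for its size.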

11h

Trust⭕️@intuition_trust

@liquidai llama-cli -hf LiquidAI/LFM2.5-230M-GGUF

10h

Rob@vRobM

@BrianRoemmele That is what the World models are proving.. specialize and be good at xWorld.

Let's see what it does in LM Studio

5h

Sᴏʀʙᴜs 🌊@sorbusCobPhiil

@liquidai Summarising the whole Newcomb Wikipedia article in Spanish gives more than 90 t/s on A15 Bionic chip in Q6_K with LlamaCPP.

Really impressive multilingual performance in a very tiny model.
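
A rough weight-size estimate helps explain why a Q6_K build of a model this small fits comfortably on a phone. The sketch assumes every weight is stored at Q6_K's nominal rate of 210 bytes per 256-weight block (6.5625 bits per weight) in llama.cpp; real GGUF files mix tensor types and add metadata, so the actual file size will differ.

```python
Q6_K_BLOCK_WEIGHTS = 256  # weights per Q6_K super-block
Q6_K_BLOCK_BYTES = 210    # bytes per Q6_K super-block


def estimate_weight_bytes(num_params: int) -> float:
    """Approximate weight storage if every parameter were quantized to Q6_K."""
    return num_params * Q6_K_BLOCK_BYTES / Q6_K_BLOCK_WEIGHTS


if __name__ == "__main__":
    size = estimate_weight_bytes(230_000_000)  # 230M parameters
    print(f"~{size / 1e6:.0f} MB of weights at Q6_K")
```

Under that simplification the weights come to about 189 MB, before accounting for the KV cache and runtime buffers.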

11h

Mike@_cosmocrator_

@liquidai @nvidia Well done! Currently I have a small Gemma e4b MTF model on my Jetson Nano robot; maybe your 230M could do the same task with a smaller footprint 🤔

12h