Liquid AI releases LFM2.5-8B-A1B, an open-weight, device-optimized Mixture-of-Experts model with 1.5 billion active parameters

VIEWS49.1KBOOKMARKS513LIKES649RETWEETS55REPLIES29

Lotto@LottoLabs

A very cool model for the GPU poor bros

Trained on an ungodly amount of tokens for a 8b a1b model

Gonna be super fast excited to try this out

https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF

27d49.1K649513

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

small models are becoming absurdly powerful This + the latest MiniCPM + Zyphra's ZAYA1 are very impressive

Liquid AI@liquidai

Today, we're releasing LFM2.5-8B-A1B, a device-optimized model designed to power real-life applications on phones, laptops, PCs, robots, and fast & lightweight server-side use-cases.

> 8B MoE, 1.5B active > Expanded 128K context > LFM2.5 flagship hybrid MoE architecture > Trained on 38T tokens + large-scale RL > fast, reliable tool calling, punching above its weight, comparable to models with up to 4x its size > customizable on a single GPU for any specialized task > LFM2 open-weight license

🧵

27d11.3K21268

Joscha Bach@Plinz

LiquidAI has built a tiny reasoning model that achieves useful performance for on device use cases with only 1B active parameters https://www.liquid.ai/blog/lfm2-5-8b-a1b

27d3.9K6113

🍉 Abubakar Abid@abidlabs

Remarkable for an 8B model! Check out the @Gradio app here: https://huggingface.co/spaces/LiquidAI/LFM2.5-8B-A1B

Liquid AI@liquidai

Today, we're releasing LFM2.5-8B-A1B, a device-optimized model designed to power real-life applications on phones, laptops, PCs, robots, and fast & lightweight server-side use-cases.

> 8B MoE, 1.5B active > Expanded 128K context > LFM2.5 flagship hybrid MoE architecture > Trained on 38T tokens + large-scale RL > fast, reliable tool calling, punching above its weight, comparable to models with up to 4x its size > customizable on a single GPU for any specialized task > LFM2 open-weight license

🧵

27d9.4K3712

Liquid AI@liquidai

LFM2.5-8B-A1B is a step change from LFM2-8B-A1B:

> Training toks: 12T → 38T > Context length: 32k → 128k > instruction following (IFEval): 79.44 → 91.84 > IFBench: 26.00 → 56.47 > Muti-IF: 58.54 → 79.93 > Tau2Telecom: 13.60 → 88.07 > BFCLv3: 45.07 → 64.36 > BFCLv4: 25.52 → 48.50

This delivers reliable agentic behavior at 8B parameters. (2/n)

27d808303

Liquid AI@liquidai

To show what that looks like in practice: LocalCowork, our open-source desktop agent, now runs on LFM2.5-8B-A1B.

> single laptop > 67 tools across 13 MCP servers > no cloud, no API keys > well under a second per dispatch

3-minute demo → https://www.loom.com/share/bc3faf8befb643baae3434cde098e95e (4/n)

27d365145

vr8vr8@vr8vr8

@liquidai Just tested it out and could say prism-ml/Bonsai-8B-gguf beats this model like 10 times. Results are so bad that it can't even generate solar system lineup... But speed is insane.

27d90575

Liquid AI@liquidai

A model build for the full agentic loop on a single machine:

> chains tool calls across complex instructions > fast dispatch loop: ask, propose, confirm, run, repeat > doubled vocabulary for non-Latin language support > LFM2.5 flagship hybrid MoE architecture

This enables a different model: a capable agent running entirely on your hardware, no API keys, no data leaving the machine. (3/n)

27d512182

ScalaWilliam!@ScalaWilliam

You are too kind! @wesleimade 's solution worked, I did this: wget https://raw.githubusercontent.com/ggml-org/llama.cpp/refs/heads/master/models/templates/LFM2-8B-A1B.jinja

and then llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q6_K -ngl 99 --host 0.0.0.0 --port 8002 -c 100000 --temp 0.1 --top-k 50 --repeat-penalty 1.05 --chat-template-file LFM2-8B-A1B.jinja

@pa_schembri the above fixes the same error you are facing. I am getting tool calls now but not uber reliable yet.

27d16224

Liquid AI@liquidai

@amaliometria this is a 1B active MoE model, we did not compare with dense models. only other MoEs which are up to 4x larger.

27d1.1K261

Liquid AI@liquidai

Day 0 support across the stack:

> Inference: llama.cpp, MLX, vLLM, SGLang > Hardware: @AMD, @intel, @Qualcomm, @nvidia, @Apple

(5/n)

27d511131

ScalaWilliam!@ScalaWilliam

@liquidai Hi ! I am trying this out with v9730 of llama.cpp: llama-server -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q6_K -ngl 99 --host 0.0.0.0 --port 8002 -c 100000 --temp 0.1 --top-k 50 --repeat-penalty 1.05 with the latest Mistral Vibe, but get no tool calls made:

27d1.5K21

Liquid AI@liquidai

LFM2.5-8B-A1B: built for boosted quality without compromising speed. > Blog: http://www.liquid.ai/blog/lfm2-5-8b-a1b > Weights: https://huggingface.co/LiquidAI/LFM2.5-8B-A1B > Docs: http://docs.liquid.ai > Playground: http://playground.liquid.ai

27d458141

Jacob Beers@JTBeers

@liquidai I just tried this out and I must say, WOW! It is crazy fast and has so far given me very good results. I'm only running a 4060 but can comfortably fit the whole thing in memory with 32k context and I'm getting 100-140 tps. Very excited to try out this model more.

27d2.6K11

amaliometria@amaliometria

@liquidai Good to compare it to Gemma 4.

Bad not to compare it to Qwen 3.6-27B🙄

27d1.3K4

Yashraj@yashrajmaher

@nathanrchn @liquidai it was a template issue l updated the default template with the one from this PR https://github.com/ggml-org/llama.cpp/pull/21242 and everything worked as it should have. I am sure, you guys will fix it soon, in the hg model repo itself.

26d8931