Liquid LFM2.5-8B-A1B Beats OpenAI gpt-oss-20b on Local Tool Calling

Original post

Rohan Paul (@rohanpaul_ai) · #1032 in AI

atomic[.]chat (a desktop app that runs LLMs locally) ran a very revealing comparison for local AI agents on a MacBook Pro M5 Max with 64 GB.

Liquid’s much smaller LFM2.5-8B-A1B beat gpt-oss-20b by finishing every required tool call, cutting runtime by more than half, and using 4.8 GB of RAM instead of 11 GB.

The task was not normal chat, because the model had to plan a trip by calling outside tools for 3 weather checks, 2 currency conversions, 1 email, and 1 reminder.
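A run like this can be scored by comparing the calls a model actually made against the required checklist. The sketch below is only illustrative: the tool names (`get_weather`, `convert_currency`, `send_email`, `set_reminder`) and the sample run are hypothetical, not taken from the atomic[.]chat benchmark.

```python
from collections import Counter

# Required calls for the trip-planning task described in the post.
# The tool names are hypothetical placeholders.
REQUIRED = Counter({
    "get_weather": 3,
    "convert_currency": 2,
    "send_email": 1,
    "set_reminder": 1,
})

def coverage(made_calls):
    """Return (completed, total), counting each tool at most as often as required."""
    made = Counter(made_calls)
    completed = sum(min(made[name], need) for name, need in REQUIRED.items())
    return completed, sum(REQUIRED.values())

# A made-up partial run: four weather checks (one redundant) and one conversion.
partial_run = ["get_weather", "get_weather", "convert_currency",
               "get_weather", "get_weather"]
done, total = coverage(partial_run)
print(f"{done}/{total} required calls completed")  # prints "4/7 required calls completed"
```

Capping each tool at its required count means repeating a call that is already done does not inflate the score.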

The striking part is that LFM2.5-8B-A1B is much smaller in active compute, yet it hit every required call at 266 tok/s, while gpt-oss-20b used 11 GB of RAM, made only 3 of 7 calls, and ran at 146 tok/s.

Now, tool calling is a control problem before it is a language problem.

The model has to preserve a checklist across context, decide when language should stop and action should begin, and resist the temptation to answer as if partial completion were enough.
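The checklist idea can be made concrete with a small harness that refuses to accept a final answer while required calls are still outstanding. This is a sketch under stated assumptions, not how atomic[.]chat or either model works: `next_action` is a hypothetical stand-in for the model, the tool names are placeholders, and the post is about the model holding this discipline internally rather than an outer loop enforcing it.

```python
from collections import Counter

# Hypothetical tool names for the seven required calls.
REQUIRED = {"get_weather": 3, "convert_currency": 2,
            "send_email": 1, "set_reminder": 1}

def run_agent(next_action, required, max_steps=20):
    """Step a model until the checklist is empty or max_steps is reached.

    next_action(remaining) returns a tool name to call, or None when the
    model wants to stop and answer. Both conventions are assumptions.
    """
    remaining = Counter(required)
    log = []
    for _ in range(max_steps):
        if not remaining:            # every required call is done
            break
        action = next_action(remaining)
        if action is None:           # model tried to answer too early
            log.append("nudge")      # record the push-back and keep going
            continue
        log.append(action)
        if remaining[action] > 0:
            remaining[action] -= 1
            if remaining[action] == 0:
                del remaining[action]
    return log, remaining

def make_toy_model():
    """Toy policy: tries to quit once after the weather checks, then complies."""
    state = {"quit_once": True}
    def toy(remaining):
        if state["quit_once"] and sum(remaining.values()) <= 4:
            state["quit_once"] = False
            return None
        return next(iter(remaining))  # next outstanding tool, in insertion order
    return toy

log, remaining = run_agent(make_toy_model(), REQUIRED)
print(log)
print("outstanding:", dict(remaining))
```

In this toy run the log contains a single `nudge` entry after the three weather checks, and the loop ends with nothing outstanding.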

A smaller mixture-of-experts model with only a fraction of its parameters active can win if its training shaped those control habits more sharply than a larger model’s general fluency did.

1:31 PM · May 30, 2026 · 8.6K Views

Sentiment

Commenters praise LFM2.5-8B-A1B's reliable local tool calling and runtime speed, noting that disciplined routing and control habits can outperform the raw parameter counts of larger models.

Positive: 100.0% · Negative: 0.0% (7 comments with sentiment)
