Liquid AI Releases IFStruct Benchmark For Structured Model Outputs

Original post

Liquid AI@liquidai

Today we release IFStruct, a new benchmark to measure how well models generate structured outputs.

A 350M model trained on it outperforms models more than 10x its size.

🧵

7:02 AM · Jun 30, 2026 · 1.2K Views

Posts from X

Liquid AI@liquidai

LFM2.5-350M starts at 21.10% and reaches 44.90% after training, ahead of Qwen3.5-4B at 36.25% and granite-4.0-h-tiny at 38.75%. Frontier models score near 100%.

(4/n)

Liquid AI@liquidai

IFStruct is particularly well-suited for teams building production workflows that depend on structured output and for anyone looking to train smaller models on the task via RL. IFStruct is available now.

> Benchmark: https://github.com/Liquid4All/ifstruct
> Dataset: http://huggingface.co/datasets/LiquidAI/ifstruct-v1.0
> Blog: https://www.liquid.ai/blog/ifstruct-v1.0

Leonie@helloiamleonie

Getting LLMs to output valid JSON is one of the most common production tasks.

But most benchmarks can't tell if your model actually does it well.

Here's how the team at @LiquidAI built IFStruct to measure exactly this (and how they trained a 350M model to beat models 10x its size). 🧵

Liquid AI@liquidai

Structured output is one of the most common things we ask models to do and still where they break.

Most benchmarks test with clean, finalized schemas. Real requests use plain language, paste an annotated example, switch formats halfway, and slip in constraints like "no code fence" or "no commentary."

(2/n)

Liquid AI@liquidai

IFStruct presents requirements in all of those forms: chat requests, bullet lists with explicit paths, raw JSON Schema, annotated JSON or YAML, ASCII tables. Half are rewritten into natural prose. Scoring is binary. Every field, type, enum, bound, and count right, with no invented keys.

The same generator that builds the eval builds training data just as easily. The same yes/no check that scores the benchmark can train the model.

(3/n)
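The idea that one yes/no check can both score the benchmark and train the model can be sketched as a binary reward function. This is a minimal illustration, not IFStruct's actual code: `strict_parse`, `binary_reward`, and the `check` callback are hypothetical names, and the strict handling of code fences and commentary is an assumption based on the "no code fence" and "no commentary" constraints mentioned in the thread.

```python
import json
from typing import Any, Callable, Optional


def strict_parse(raw: str) -> Optional[dict]:
    """Return the parsed JSON object, or None if the text is not bare JSON.

    Assumption: a code fence or any commentary around the JSON counts as a
    failure, so the whole (stripped) text must parse on its own.
    """
    try:
        value = json.loads(raw.strip())
    except json.JSONDecodeError:
        return None
    # Only a top-level JSON object is accepted in this sketch.
    return value if isinstance(value, dict) else None


def binary_reward(raw: str, check: Callable[[dict], bool]) -> float:
    """Map a raw model response to 1.0 (pass) or 0.0 (fail), no partial credit."""
    parsed = strict_parse(raw)
    if parsed is None:
        return 0.0
    return 1.0 if check(parsed) else 0.0


def only_key_a(obj: dict) -> bool:
    """Hypothetical constraint check: the object has exactly the key 'a'."""
    return set(obj) == {"a"}


print(binary_reward('{"a": 1}', only_key_a))
print(binary_reward('```json\n{"a": 1}\n```', only_key_a))
print(binary_reward('Here you go: {"a": 1}', only_key_a))
```

Because the function returns only 0.0 or 1.0, the same call can report benchmark accuracy (the mean over prompts) or serve as a reward signal in an RL loop.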

Leonie@helloiamleonie

The results:

LFM2.5-350M (base): 21.10%
LFM2.5-350M (+ RL): 44.90%

Qwen3.5-4B: 36.25%
granite-4.0-h-tiny: 38.75%

After RL training on a held-out set, the 350M model beats models 10x its size.
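For readers who want the margins spelled out, the quoted scores can be compared with a few lines of arithmetic. The scores are the ones posted in the thread. The parameter counts are an assumption read off the model names (350M and 4B), and no size is assumed for granite-4.0-h-tiny because the thread does not state one.

```python
# Scores (percent) as quoted in the thread.
scores = {
    "LFM2.5-350M (base)": 21.10,
    "LFM2.5-350M (+ RL)": 44.90,
    "Qwen3.5-4B": 36.25,
    "granite-4.0-h-tiny": 38.75,
}

# Assumption: parameter counts taken from the model names.
PARAMS_350M = 350e6
PARAMS_4B = 4e9

trained = scores["LFM2.5-350M (+ RL)"]

# Improvement of the 350M model from RL training, in percentage points.
rl_gain = round(trained - scores["LFM2.5-350M (base)"], 2)

# Margin of the trained 350M model over each non-LFM model, in points.
margins = {
    name: round(trained - score, 2)
    for name, score in scores.items()
    if not name.startswith("LFM2.5")
}

# How many times larger the 4B model is than the 350M model.
size_ratio = PARAMS_4B / PARAMS_350M

print(f"RL gain: {rl_gain:.2f} points")
for name, margin in margins.items():
    print(f"Margin over {name}: {margin:.2f} points")
print(f"4B / 350M size ratio: {size_ratio:.1f}x")
```

Under these assumptions the RL gain is 23.80 points, the margins are 8.65 and 6.15 points, and the 4B model is about 11.4 times the size of the 350M model.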

Leonie@helloiamleonie

Most evals do one of two things:
> force the model's output using hard rules
> score content quality alongside format.

The gap IFStruct fills is to answer the question:

"Can a model follow a schema when a user asks for it in plain language?"

Leonie@helloiamleonie

The dataset:

Schema requirements are presented in 6 styles (because that's how users actually write them):

• Raw JSON Schema
• Annotated examples
• Conversational chat requests
• Flat path glossaries with field types
• Bullet points with explicit field paths
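To make the idea of presentation styles concrete, here is a rough sketch that renders one requirement set in three of the listed styles. Everything in it is hypothetical: the `FIELDS` spec, the function names, and the exact text layouts are illustrative guesses, not the formats IFStruct actually emits. The JSON Schema keywords used (`type`, `properties`, `required`, `additionalProperties`) are standard.

```python
import json

# Hypothetical field spec: field path -> JSON Schema type name.
FIELDS = {"vendor_name": "string", "invoice_total_usd": "number"}


def as_json_schema(fields: dict) -> str:
    """Render the spec as a raw JSON Schema document."""
    schema = {
        "type": "object",
        "properties": {name: {"type": kind} for name, kind in fields.items()},
        "required": list(fields),
        # Disallow keys that the spec does not name.
        "additionalProperties": False,
    }
    return json.dumps(schema, indent=2)


def as_bullets(fields: dict) -> str:
    """Render the spec as bullet points with explicit field paths."""
    return "\n".join(f"- `{name}`: {kind}" for name, kind in fields.items())


def as_glossary(fields: dict) -> str:
    """Render the spec as a flat path glossary with field types."""
    return "\n".join(f"{name} :: {kind}" for name, kind in fields.items())


for render in (as_json_schema, as_bullets, as_glossary):
    print(render(FIELDS))
    print()
```

All three renderings carry the same constraints, so a single validator can score responses regardless of which style the prompt used.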

Leonie@helloiamleonie

The validator:

Scoring is binary: Pass only if every constraint is satisfied.

For example:

{
  "vendor_name": "Acme",
  "invoice_total_usd": 1200,
  "paid_by_bank_transfer": true   ← FAIL
}

This would fail because the schema required the key paid_by_bank_transfer_allowed: the output omits that key and invents paid_by_bank_transfer instead. (No partial credit.)
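A pass/fail check of this kind can be sketched in a few lines. This is only an illustration of the rule described above, not the IFStruct validator: the `REQUIRED` spec and the `passes` function are hypothetical, the field types are guessed from the example values, and enum, bound, and count checks are left out for brevity.

```python
from typing import Any

# Hypothetical requirement spec: required key -> accepted Python type(s).
# The types are assumptions inferred from the example values.
REQUIRED = {
    "vendor_name": str,
    "invoice_total_usd": (int, float),
    "paid_by_bank_transfer_allowed": bool,
}


def passes(output: dict[str, Any], required: dict = REQUIRED) -> bool:
    """Return True only if every constraint holds (no partial credit)."""
    # Key sets must match exactly: no missing keys and no invented keys.
    if set(output) != set(required):
        return False
    for key, expected in required.items():
        value = output[key]
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected is not bool:
            return False
        if not isinstance(value, expected):
            return False
    return True


failing = {
    "vendor_name": "Acme",
    "invoice_total_usd": 1200,
    "paid_by_bank_transfer": True,  # wrong key name
}
fixed = {
    "vendor_name": "Acme",
    "invoice_total_usd": 1200,
    "paid_by_bank_transfer_allowed": True,
}

print(passes(failing))
print(passes(fixed))
```

The first call fails on the key-set comparison alone, even though two of the three fields are correct, which is the all-or-nothing behavior the thread describes.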

Arthur@Arthurcbaia_

@liquidai @dan_sci_phil

Leonie@helloiamleonie

Benchmark and dataset are open source.

Blog: https://www.liquid.ai/blog/ifstruct-v1.0
GitHub: https://github.com/Liquid4All/ifstruct
Dataset: https://huggingface.co/datasets/LiquidAI/ifstruct-v1.0

Daniel van Strien@vanstriendaniel

@liquidai eval repo on github seems to be private!

Connor Shorten@CShorten30

@liquidai 👀 https://arxiv.org/abs/2408.11061

luis@lgaa201

@liquidai @Presidentlin
