New @datologyai work: a 4B VLM curated for concision answers correctly for 35× less compute than Qwen3.5-4B, with similar performance.
Same size, same task. The whole gap is how many tokens each model spends. 🧵
For two years we've made the same case: data is the most underinvested, highest-leverage lever in ML.
This is one more dimension of it: output length isn't a fixed property of a model, it's a property of the data it learned from.
And because it's learned at training time, the saving compounds: pay once, collect on every inference the model ever runs. As inference becomes the dominant cost of AI, that's the whole game.
Paper: https://arxiv.org/abs/2606.25432
Blog: https://www.datologyai.com/blog/brevity-is-the-soul-of-inference-efficiency
And check out @leavittron's thread here:
What if you could induce models to be more concise via pretraining data curation?