NEW paper worth reading.
It shows that a full agentic workflow can be distilled into model weights and run at roughly 100x lower inference cost while preserving near-frontier task quality.
The workflow includes multi-step LLM calls, tool invocations, intermediate scratchpads, and decision structure.
Instead of expressing all of that at runtime through a framework, the paper amortizes the behavior into a compiled model through targeted distillation.
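To make "amortizing the behavior" concrete, here is a minimal sketch of one way a recorded workflow run could be flattened into a single training pair for distillation. This is an illustration only, not the paper's recipe: the `Step` type, the step labels, the `trace_to_example` helper, and the prompt/completion layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str     # hypothetical label: "llm", "tool", or "scratchpad"
    content: str  # what happened at that step


def trace_to_example(task: str, steps: list[Step], final_answer: str) -> dict:
    """Flatten one recorded workflow run into a single prompt/completion pair.

    Assumption: the student model is trained to emit the intermediate steps
    and the final answer in one pass, instead of a framework driving them.
    """
    rendered = "\n".join(f"[{s.kind}] {s.content}" for s in steps)
    return {"prompt": task, "completion": f"{rendered}\n[answer] {final_answer}"}


example = trace_to_example(
    "What is 17 * 3?",
    [
        Step("scratchpad", "Need a calculator."),
        Step("tool", "calc(17*3) -> 51"),
    ],
    "51",
)
print(example["completion"])
```

A dataset of such pairs, collected from many runs of the runtime workflow, is what a standard fine-tuning job would consume. How the paper selects or targets those traces is not covered by this sketch.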
This is the strongest economic argument for agent compilation so far. Runtime loops are flexible but expensive: every step is another model call. Compiled workflows trade some of that flexibility for a large reduction in inference cost.
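A back-of-the-envelope sketch of where a gap of that size can come from. Every number below is made up for illustration and none is taken from the paper: the call count, token counts, and per-token prices are assumptions, and `cost_ratio` is a hypothetical helper.

```python
def cost_ratio(
    calls_per_task: int,
    tokens_per_call: int,
    frontier_price_per_mtok: float,
    compiled_tokens: int,
    compiled_price_per_mtok: float,
) -> float:
    """Ratio of runtime-loop cost to compiled-model cost for one task.

    Prices are in dollars per million tokens; the per-million factor
    cancels out of the ratio, so it is left out of the arithmetic.
    """
    runtime_cost = calls_per_task * tokens_per_call * frontier_price_per_mtok
    compiled_cost = compiled_tokens * compiled_price_per_mtok
    return runtime_cost / compiled_cost


# Illustrative assumption: 10 frontier-model calls of 2,000 tokens each at
# $10 per million tokens, versus one 2,000-token pass through a small
# compiled model at $1 per million tokens.
ratio = cost_ratio(10, 2_000, 10.0, 2_000, 1.0)
print(f"runtime loop costs about {ratio:.0f}x more per task")
```

Under these assumed numbers the saving splits into two factors: fewer calls per task (10x) and a cheaper model per token (10x).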
Paper: https://arxiv.org/abs/2605.22502