In agentic pipelines, one LLM can play many roles: writer, reviewer, judge, editor, and more. Does it apply the same understanding of a concept across all of those roles?
In our new ICML paper, we measure conceptual consistency and test whether more consistent models make fewer mistakes.
