Mingqian Zheng releases CARRYONBENCH benchmark for LLM clarification

Mingqian Zheng released CARRYONBENCH, an interactive benchmark of 5,970 simulated conversations across 14 models. It measures whether large language models revise initial refusals of ambiguous but benign queries once users clarify intent. Single-turn fulfillment rates after clarification range from 10.5 percent to 37.6 percent. The evaluation surfaces recurring failure modes including utility lock-in that blocks recovery and unsafe revisions that weaken the original refusal.

Original post

LLMs refuse ambiguous queries that look harmful but aren't. Can they recover once users clarify, while staying safe? Our new interactive multi-turn benchmark measures both. 🚨 Turns out: not both at once.

12:49 PM · May 13, 2026