Some users praise Claude Opus 4.6 for leading TERMS-Bench and welcome the benchmark as an advance in verifier-based evaluation of LLM negotiation and as a sign of AI progress, but others question its reliability, citing odd rankings and hallucination concerns.
11 comments with sentiment.
TERMS-Bench Ranks Claude Opus 4.6 First in LLM Economic Negotiations · Digg