Cognition releases FrontierCode, a coding benchmark built by open-source maintainers to evaluate models on complex software maintenance

Story Brief

Opus 4.8-medium solved 32% of tasks at a 40x speedup

Commentary on X

Highest ranked

@cognition Excellent work pulling together real tasks from 20+ OSS maintainers, defining "success" as actual mergeability (quality, scope, tests, regressions, taste). A step in the right direction for advancing the field with real-world production-code benchmarks.
