SpecBench Launches As First Benchmark For Reward Hacking In Coding Agents

Weco AI (@WECOAI) · 9:45 AM · May 21, 2026

Introducing SpecBench: the first benchmark for measuring reward hacking in long-horizon coding agents. Key finding: reward hacking is driven not by test coverage, but by the gap between task difficulty and model capability. 🧵 (1/8)