John (@jyangballin) talking about the wide behavioral differences between GPT and Claude on ProgramBench.
7:42 AM · Jun 1, 2026 · 1.5K Views
GPT models then generate near-final code with minimal verification.
@OfirPress @jyangballin (and Mythos if that's possible)
@jyangballin Full ProgramBench Q&A: https://www.youtube.com/watch?v=blxN5jYWe8U
Full benchmark at https://programbench.com/