
Physics-intern framework lifts Gemini 3.1 Pro to new CritPt record

David Louapre introduced physics-intern, an agentic framework that raises Gemini 3.1 Pro's score on the CritPt benchmark from 17.7 percent to 31.4 percent, a gain of 13.7 percentage points. That sets a new state of the art, ahead of GPT-5.5 Pro at 30.6 percent. The system decomposes research problems and routes the subtasks to specialized agents, which self-correct, derive equations, compute intermediate results, and re-estimate the best approach as they go. Emad Mostaque and Thomas Wolf reposted the announcement on X.
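The announcement does not describe how physics-intern is implemented, so the following is only a minimal sketch of the general decompose-and-dispatch pattern it mentions, not the authors' code. Every name here (Subtask, AGENTS, decompose, check, solve) is hypothetical, and the "agents" are plain functions standing in for model-backed specialists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    role: str    # which specialist should handle this piece
    prompt: str  # what that specialist is asked to do


# Hypothetical specialists: simple functions standing in for LLM-backed agents.
def derive(prompt: str) -> str:
    return f"derivation for: {prompt}"


def compute(prompt: str) -> str:
    return f"numeric result for: {prompt}"


AGENTS: Dict[str, Callable[[str], str]] = {"derive": derive, "compute": compute}


def decompose(problem: str) -> List[Subtask]:
    # Assumption: a fixed two-step plan. A real planner would be model-driven.
    return [
        Subtask("derive", f"governing equations of {problem}"),
        Subtask("compute", f"intermediate quantities of {problem}"),
    ]


def check(answer: str) -> bool:
    # Stand-in verifier. A real one might re-derive the result or test limits.
    return bool(answer.strip())


def solve(problem: str, max_retries: int = 2) -> List[str]:
    results = []
    for task in decompose(problem):
        agent = AGENTS[task.role]  # route the subtask by role
        answer = agent(task.prompt)
        # Self-correction loop: retry while the check fails, up to max_retries.
        for _ in range(max_retries):
            if check(answer):
                break
            answer = agent(task.prompt + " (retry after failed check)")
        results.append(answer)
    return results


results = solve("a damped oscillator")
```

In this toy version the plan is fixed and the check always passes, so the point is only the control flow: split the problem, route each piece by role, verify, and retry on failure.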

Original post

Meet physics-intern🧑‍🎓, our agentic framework for theoretical physics. It takes Gemini 3.1 Pro from 17.7% to 31.4% on CritPt, a new SOTA on one of the hardest benchmarks for LLMs. Theoretical physics is hard for humans and LLMs alike. But physics-intern decomposes problems and dispatches them to a team of specialized agents, solving research-level questions far more effectively than the base model alone.

8:09 AM · May 12, 2026
Reposted by

watching a team of agents tackling a hard theoretical physics problem is quite mesmerizing - self-correcting, deriving hard equations, computing intermediate results, re-estimating the best approach

David Louapre (@dlouapre)

Meet physics-intern🧑‍🎓, our agentic framework for theoretical physics. It takes Gemini 3.1 Pro from 17.7% to 31.4% on CritPt, a new SOTA on one of the hardest benchmarks for LLMs. Theoretical physics is hard for humans and LLMs alike. But physics-intern decomposes problems and dispatches them to a team of specialized agents, solving research-level questions far more effectively than the base model alone.

3:09 PM · May 12, 2026 · 57.9K Views
5:02 PM · May 13, 2026 · 31.6K Views