Physics-intern framework lifts Gemini 3.1 Pro to new CritPt record
David Louapre introduced Physics-intern, an agentic framework that raises Gemini 3.1 Pro's score on the CritPt benchmark from 17.7 percent to 31.4 percent, a gain of 13.7 percentage points. The announcement describes the result as a new state of the art, ahead of GPT-5.5 Pro at 30.6 percent. The system decomposes research-level problems and dispatches the pieces to a team of specialized agents, which self-correct, derive equations, compute intermediate results, and re-estimate the best approach as they go. Emad Mostaque and Thomas Wolf amplified the post on social media.
watching a team of agents tackling a hard theoretical physics problem is quite mesmerizing - self-correcting, deriving hard equations, computing intermediate results, re-estimating the best approach
Meet physics-intern🧑‍🎓, our agentic framework for theoretical physics. It takes Gemini 3.1 Pro from 17.7% to 31.4% on CritPt, a new SOTA on one of the hardest benchmarks for LLMs. Theoretical physics is hard for humans and LLMs alike. But physics-intern decomposes problems and dispatches them to a team of specialized agents, solving research-level questions far more effectively than the base model alone.
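For readers unfamiliar with the decompose-and-dispatch pattern the post describes, here is a minimal Python sketch of the general idea. It is not physics-intern's code: the names `Subtask`, `decompose`, `AGENTS`, and `solve` are hypothetical, the three agent roles are illustrative guesses, and plain functions stand in for the model-backed agents a real system would call. The self-correction loop mentioned in the posts is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    kind: str     # which specialist should handle this piece
    payload: str  # the text the specialist works on


def decompose(problem: str) -> List[Subtask]:
    # Hypothetical planner: a real system would ask a model to plan.
    # Here every problem is split into the same three fixed steps.
    return [
        Subtask("derive", problem),
        Subtask("compute", problem),
        Subtask("check", problem),
    ]


# Hypothetical specialists: stand-ins for model-backed agents.
AGENTS: Dict[str, Callable[[str], str]] = {
    "derive": lambda p: f"derivation for: {p}",
    "compute": lambda p: f"intermediate result for: {p}",
    "check": lambda p: f"consistency check for: {p}",
}


def solve(problem: str) -> List[str]:
    # Route each subtask to the agent registered for its kind.
    return [AGENTS[task.kind](task.payload) for task in decompose(problem)]


if __name__ == "__main__":
    for line in solve("ground-state energy of a toy model"):
        print(line)
```

The routing table keeps the planner and the specialists decoupled, so adding a new agent role in this sketch only means registering one more entry in `AGENTS` and having `decompose` emit that kind.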