Aran Komatsuzaki, GPT-J co-leader, says running Codex on complex math problems shows parallel agents fail to scale
Lewis Tunstall suggested using specialized subagents with limited contexts
@arankomatsuzaki Not sure if you are already doing this, but you can get much better performance on physics problems by spawning subagents with specific roles and limited context
We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -> 31.4, thus beating GPT 5.5 Pro. The physics-intern harness can wrap any model and, via dedicated subagents, boost the performance of vanilla reasoning models. While I think more and more of these harness capability gains will be absorbed into the models (like prompting tricks disappeared over time), there is a lot to be gained right now by building good scaffolds for those models and integrating tools well. Interestingly, the one exception we found was that GPT 5.5 Pro actually didn't benefit from the physics-intern harness! Read more about it here: https://huggingface.co/spaces/huggingface/physics-intern PS: I think the Harness[Model] notation is kind of nice.
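To make the "subagents with specific roles and limited context" idea concrete, here is a minimal Python sketch of a harness that wraps any model. It is not the physics-intern implementation: the `Subagent` and `Harness` classes, the role names (planner, solver, checker), and the stub model are all hypothetical, chosen only to show how each subagent can be restricted to a whitelisted slice of the shared state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a "model" is any callable that maps a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class Subagent:
    role: str            # hypothetical role name, also the key its output is stored under
    instructions: str    # role-specific instructions prepended to the prompt
    reads: List[str]     # keys of the shared state this subagent is allowed to see

    def run(self, model: Model, state: Dict[str, str]) -> str:
        # Limited context: only whitelisted keys are copied into the prompt.
        visible = {key: state[key] for key in self.reads if key in state}
        context = "\n".join(f"[{key}]\n{value}" for key, value in visible.items())
        return model(f"{self.instructions}\n\n{context}")


class Harness:
    """Wraps one model and runs it through a fixed sequence of subagents."""

    def __init__(self, model: Model, subagents: List[Subagent]):
        self.model = model
        self.subagents = subagents

    def __call__(self, problem: str) -> str:
        state = {"problem": problem}
        for agent in self.subagents:
            # Each subagent's reply is stored under its role name.
            state[agent.role] = agent.run(self.model, state)
        # The last subagent's reply is treated as the final answer.
        return state[self.subagents[-1].role]


# Stub model for illustration only: records each prompt and returns a numbered reply.
prompts: List[str] = []

def stub_model(prompt: str) -> str:
    prompts.append(prompt)
    return f"reply-{len(prompts)}"


pipeline = Harness(stub_model, [
    Subagent("planner", "Outline a solution plan.", reads=["problem"]),
    Subagent("solver", "Carry out the plan.", reads=["problem", "planner"]),
    Subagent("checker", "Check units and limits of the answer.", reads=["solver"]),
])
answer = pipeline("A block slides down a frictionless incline.")
```

In this sketch the checker only ever sees the solver's output, never the original problem or the plan, which is one simple way to read "limited context". Swapping `stub_model` for a real API client would give something in the spirit of the Harness[Model] notation, though the roles and routing a real harness needs are likely to differ.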