This matches my recent experience with AI agents as research assistants. Amazing at coding, but below the level of a good master's student at navigating the space of ideas and updating based on experimental results.
i gave codex a /goal to improve an ml training pipeline i am working on while i went for a hike.
during the hike i had an idea, which i came back and had codex implement, and it worked to bump things up a bit.
in the meantime /goal spent $400 on modal and a lot of tokens to achieve nothing. i went through the ideas it had come up with and they were decent generic ml ideas (e.g. try this normalization) but terrible for the thing i was working on.
so… coding assistant? very good. even jr researcher? not yet.




