Tibo Sottiaux, OpenAI Codex engineering lead, posts master plan for releasing better models, shipping weekly product updates, and securing additional compute
kache replies that 5.5 struggles with automated RL research on PufferLib, then clarifies that it is still the best available model but could be better
Our master plan is to release better and more efficient models. And also to release better products, week after week. Oh and get more compute too. Together with spending too much time on X. How good is this plan?
@thsottiaux I'm finding that 5.5 is not very good at automated RL research. Specifically PufferLib. One of the failure cases is that it gets confused by total reward going up when reviewing experiments that have differently tuned reward scales
@thsottiaux I can work around it with clever harnessing and prompting.
@thsottiaux When I say not very good I mean the best. But could be better
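
To make the reward-scale failure case concrete, here is a minimal sketch of why raw total reward can mislead when runs use different reward scales. It is an illustration only, not PufferLib code: the `Experiment` record, the `reward_scale` field, and the numbers are all hypothetical, and it assumes each run's reward was multiplied by a single known factor. Real reward tuning often changes individual terms, in which case a scale-independent task metric (such as win rate) is likely a safer basis for comparison.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    reward_scale: float  # hypothetical: single multiplier applied to the env reward
    total_reward: float  # scaled episode return, as it would appear in the logs


def unscaled_return(exp: Experiment) -> float:
    """Undo the (assumed multiplicative) reward scale so runs are comparable."""
    return exp.total_reward / exp.reward_scale


def best_by(experiments, key) -> str:
    """Return the name of the experiment that maximizes the given key."""
    return max(experiments, key=key).name


# Made-up numbers: the second run logs a larger total reward only because
# its reward was scaled up by 10x.
runs = [
    Experiment("baseline", reward_scale=1.0, total_reward=120.0),
    Experiment("rescaled", reward_scale=10.0, total_reward=900.0),
]

raw_winner = best_by(runs, lambda e: e.total_reward)  # picks on logged reward
fair_winner = best_by(runs, unscaled_return)          # picks after undoing the scale

print(f"raw total reward picks: {raw_winner}")
print(f"unscaled return picks:  {fair_winner}")
```

A harness along these lines could hand the reviewing model the unscaled numbers, or the scale alongside each logged reward, rather than the raw totals alone.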