@xeophon it depends on *how* PTB gets solved. i think it should still be pretty hard in our original framing, in which the main prompt remains generic, i.e., without hardcoded methods (e.g., "Use SFT and then RL") or datasets (e.g., "Use OpenThoughts") and where the agent has only 10h.
@maksym_andr Solving PTB should be rather easy, imo
@maksym_andr Yeah I think you can train models to be better at exploration, which means they’ll do those things without being prompted specifically