Miles Brundage says no large-scale supervised or reinforcement learning has been applied to AI policy research tasks and that performance gains were evident before extensive data collection.

AI engineer Yupo Niu questioned the basis for chain-of-thought generalization claims.

Original post

@Miles_Brundage When you say chain of thought generalizes without a ton of effort, are you (1) basing this off public research of chain of thought or (2) impressions of how well frontier LLMs do? Because I agree frontier LLMs do well, but info re. training distribution just isn't public afaik.

9:23 AM · May 20, 2026

@1a3orn I feel pretty confident in saying that there has been no large scale effort to do e.g. SL or RL on AI policy research tasks, and the improvement has been marked even from well before there was large scale data collection on *any* tasks -- i.e. very nascent o1 era showed gains

4:23 PM · May 20, 2026 · 177 Views
5:45 PM · May 20, 2026 · 108 Views