New Research Uncovers Mode-Hopping Dynamics In Language Model Pre-Training

On benchmarks with an obvious (but wrong) shortcut, this very cool blog shows that models hop between generalizing and using shortcuts. I was curious what you see across compute-optimal models. Interesting case of U-shaped scaling when ICL examples have a successive answer pattern!

12:57 PM · May 27, 2026