not entirely sure this is how the nanogpt speedrun should be done, but it's educational.

actually, maybe it's just time to design new challenges: *moe (!) *intentionally multilingual *documented/releasable data (and failure modes) *at least partially synthetic, with reasoning traces throughout (SYNTH/Zyphra) *benchmark targets on top of loss.
and maybe even an agentic subset (function calling definitely works at small scale)