Speaking of recursive self-improvement, @nayoung_nylee recently defended her thesis, which among other things showed how transformers can learn progressively harder tasks by generating solutions to problems that sit *right at* the boundary of their capabilities.
https://arxiv.org/pdf/2502.01612
This paper helped me 1) overcome my obsession with transformers and arithmetic and 2) appreciate the value of environments.
She and @jackcai1206 did this before GRPO btw

