staring into the abyss as models get better at modelcrafting. the abyss stares back, and the stare is the training signal
New #1 on PostTrainBench: Opus 4.8 (max reasoning) hits 37.23%, up from 28.56% for Opus 4.7 (+8.67 points), the largest single improvement we've seen.
Fable 5 runs underway now that AI research behavior is no longer silently degraded.
PostTrainBench asks how well frontier AI can train weaker language models. That makes it one of the first benchmarks for recursive self-improvement: AI improving AI, with progress measured in the loop itself.