some thoughts on when ai builds itself
1) anthropic put out a piece on recursive self-improvement
2) for those who have been following ai progress, there isn't much new in this report
3) if you have seen the metr graph, you know we've seen rapid progress over the last year in coding agents
4) anthropic did provide some internal information, which is new but hard to interpret without additional context that anthropic doesn't give us
5a) anthropic engineers are shipping 8x as much code as they were before claude code; but we don't know how to translate that into ai progress
5b) mythos can optimize the training code for a small model much faster and more extensively than a human researcher can; but what does this mean for the frontier?
6) on a sample limited to problems where researchers made the wrong decision, a claude judge preferred mythos's next step 64% of the time; but apparently sonnet 4 was preferred 50% of the time
7) so, anthropic withholds the information that would really be useful for assessing each of these new datapoints; they read almost like marketing
8) i dislike how the tone of the piece is very "be worried, be scared" but they do not give us datapoints that would really tell us more about the pace of progress
9) i think that if you actually take this risk seriously and want other people to take it seriously, it is incumbent on you to do some amount of disclosure
10) some things they could have given us:
10a) in 2025/2026, how much has algorithmic progress accelerated in pretraining, measured in effective compute on pretraining loss
10b) in 2025/2026, how much has algorithmic progress accelerated in post-training, measured on their internal benchmarks across a range of tasks
10c) of the large-scale, mid-scale and small-scale improvements needed to go from opus 4 to mythos that are not in mythos's training data, what percentage can mythos find independently
10d) since mythos was released, what percentage of large-scale and mid-scale improvements discovered at anthropic should be primarily attributed to mythos
11) without this kind of information, anthropic has given us nothing new on the rate-of-progress question
12) they also suggest a pause; but i find pause arguments unconvincing; the whole posture from anthropic seems like a mix of unserious and performative
13) i don't like to read vague statements from parties that say i should be *very concerned* but then won't disclose anything significant