Analyst Reverses Stance, Credits Anthropic With Deep LLM Circuitry Science

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · #501 in Tech

Upon more interactions with Fable, I need to issue a Mea Culpa. This is provisional, but I think true: I have been completely wrong about Anthropic. They had me – and everyone – well and truly fooled. I failed to grasp the profundity of Dario's "scale". Anthropic is a lab of *scientists*. Proverbial triple Ph.Ds, Manhattan Project material. Safety/steerability focus served as a good recruitment pipeline, unifying ethos, and a smokescreen. In the meantime, publishing neat visualizations and results of curious safety-framed experiments, they must have developed a proper *science* of LLM circuitry, the missing layer between optimization theory, academic math validated on toy models – and downstream humanlike behaviors on the frontier. Us plebs outside think in these petty terms of "1-3-10-100T models" and GPU arsenals, only aware of crude undergrad tier problems like distributed training implementations, exploding gradients, loss spikes, router collapse and so on, entirely ignorant of how artificial intelligences develop at larger scales… or really at any scale. We have some alchemical, witch doctor understanding of data mixing and "quality", and even buy the copes that the era of pre-training is over, or that Anthropic's real advantage is just investment into data, Amanda Askell's constitution, the commercial focus on agentic coding. "What are you scaling?", asks Ilya. How about you scale your understanding of what the fuck you're trying to do? How about you try to get out of this lame attractor of ever more exact memorization of ever greater volume of data slathered onto ever-expanding blanket of weights, hopelessly asymptotically approaching flawless mediocrity? …No, I don't believe that OpenAI's pretraining team is a shitshow. They must be about as good as Google and top Chinese labs. They have great infra, they have the hardware, they can definitely train a "15T model" if they put their minds to it. Except that's not enough.
And that likely puts a cap on how far "post-training" can go. If post-training is even a necessary category when you do your *training* right, in the limit.

I have always been saying that mechanistic interpretability is dual use, and can advance capabilities; doomers also thought this way; somehow, it didn't have an impact on the discourse, or even on my thinking of the competitive landscape. I failed to extrapolate. If Chris Olah's research program had quietly advanced to the level of physics, chemistry or even biology of multi-layer computational organisms, just as it was intended to – then Dario holds the commanding heights in the foreseeable future. When ByteDance tries to even think in these terms, it looks ludicrous, a fever dream or pretentious LLM slop. But seriously – do you believe that we were going to build AGI, or ASI, with our rules of thumb, muh "'lots' of 'good' 'diverse' data", with this dumb piling of chairs?

I don't want to doompost. GPT-4 seemed like an unreachable standard as well. Capabilities diffuse; what was done once has, historically, usually been replicated on a shorter timeline; lots of smart people are working on it. Anthropic was just early. Maybe it's not too late. But boy, if I'm right, were they early.

Zephyr @zephyr_z9

"OpenAI will leapfrog Anthropic with their 15-20T model" 🤣🤣🤣🤣 OpenAI's pretraining team is a complete shitshow. All the good pretraining people are at Anthropic.

8:47 PM · Jun 12, 2026 · 493 Views

Sentiment

Users approve of the timing of a critic reversing stance to praise Anthropic's deep LLM circuitry science.

Pos: 100.0%

Neg: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 7.8K · Bookmarks: 84 · Likes: 115 · Retweets: 4 · Replies: 11

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

…nevermind

1h1.4K151

interstice @an_interstice

@teortaxesTex I kinda doubt that developing a *real* science of DL is something that can be done inside a lab (for now...), no matter how smart the staff. Likely the same kinds of "undergrad" understanding as everywhere else, just done better.

1h44

The Last Mensch @the_last_mensch

@teortaxesTex This is a truly terrifying thought. It would also represent the equivalent of a paradigm shift in neuroscience and connectomics. The model is so tasteful it might be true tho.

1h611

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

@an_interstice There are levels to it; they may be at physics 101, which would still put them years ahead of everyone doing "tasteful alchemy". I don't know for certain about a single great non-Anthropic model >1T. Presumably GPT is much larger, but it's not clear what that buys. Gemini… lol

1h27

Moonlit Monkey @MoonlitMonkey69

@teortaxesTex Great timing.

1h11

namethatistotallyreallycool @Quandalina

@teortaxesTex odd…?

1h6

hardin @anacreonte_

@teortaxesTex Do you still believe this?