/AI · 5h ago

CoreAutoAI co-founder Rohan Anil argues ML researchers focus on incremental optimization variants instead of questioning core formulations

Lucas Beyer noted a similar pattern with ResNets and ViTs.

Original post

rohan anil@_arohan_ · #79 in AI

I don’t know what the phenomena is called:

Sometimes the field mines improvements near a local neighborhood.

Like Adam -> (badam, dadam, madam), Shampoo -> Muon -> (Duon, Buon, Luon), last few made up instead of questioning whether the original formulation itself is the right question. You get so much math explaining these variants bordering slop. Same happened with Transformers too.

Mathematically sophisticated but solving the wrong problem.

10:44 AM · Jun 8, 2026 · 10.7K Views

Sentiment

Many users dismiss local variants of Adam and Shampoo as unoriginal spam and noise, attributing them to groupthink and publish-or-perish incentives that reward small tweaks over questioning base assumptions.

Pos: 0.0%

Neg: 100.0%

7 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 3.3K · Bookmarks: 3 · Likes: 40 · Retweets: 3 · Replies: 3

rohan anil@_arohan_

Llms have made it easy to now spam methodological improvements that are largely bordering noise.

rohan anil@_arohan_

I don’t know what the phenomena is called:

Sometimes the field mines improvements near a local neighborhood.

Mathematically sophisticated but solving the wrong problem.

5h · 3.3K · 40 · 3

Lucas Beyer (bl16)@giffmana

@_arohan_ I don't think it's new or llm related. We had the same with ResNets and later with ViTs, and those were before llms. It's just the easy research to do.

rohan anil@_arohan_

Llms have made it easy to now spam methodological improvements that are largely bordering noise.

2h63290

Lucas Nestler@Clashluke

@_arohan_ it’s difficult seeing [outside of] your box

rohan anil@_arohan_

I don’t know what the phenomena is called:

Sometimes the field mines improvements near a local neighborhood.

Mathematically sophisticated but solving the wrong problem.

2h30030

ueaj@_ueaj

@_arohan_ Thoughts on ademamix? I think it kinda missed the multiscale inductive bias but it was very close and very early

5h4112

Sachin@sachdh

@_arohan_ GRPO variants from last year will say hi

5h277

Federico Vaggi@F_Vaggi

@_arohan_ @nathancgy4 Oh man, I think I have a really good explanation for this, but it's a bit longer than a tweet. I might have to blog about this: I think it's because all components of neural network training have to work together, so it's hard to do non-local improvements.

5h248

Alex YGift@Radipdegen

@_arohan_ Wait, I think youre suggesting all those extensions are just post-hoc rationalizations pasted onto an original breakthrough?

5h106

jaimin patel@jnptl

@_arohan_ what an example!

5h85

Rugbist@rugbist_

@_arohan_ feels like naming satire just keeps becoming realer over time

not sure if we need new names or just to sit with the original ones longer

5h41

ashu@pizzacritic999

@_arohan_ "badam" (almond)

4h38

Invincible@InvincibleEdge

@_arohan_ real recognize real discovering meta-methods while ignoring base assumptions is how the loop keeps running

5h33

Lavan@ponylavan

@_arohan_ groupthink + publish or perish

5h28

M@init_malachi

@_arohan_ yes so dissatisfying but apparently everything else is market irrelevant

3h24

Zack Fitch@Jzfitch1

@_arohan_ People show up for hacks, not first principles.

3h24

Blissy@BlissyOnX

@_arohan_ greedy optimization is called local maxima. but honest question - is there a taxonomy for the "just keep changing letters" phase?

5h21

jaisel@jaiselsingh

i sometimes wonder if we're just doing a local search over research programs. once the abstraction is fixed, you get high-sophistication perturbations: Adam→variants, Shampoo→variants, Transformer→variants. i wonder if it's even optimizing inside the right model class most of the time

5h14

Logan Ford@lhford0

@_arohan_ anyone who has followed AI research closely knows that AI slop has been around a lot longer than LLMs

2h6

tiplur-bilrex@tiplur_bilrex

@_arohan_ Some relevant advice from Hamming:

5h2