@_arohan_ @bilaltwovec It means they are using something else?
Claude Fable let me implement Muon, Shampoo and K-FAC, what does this mean?
Academic Dan Roy highlighted the model's mathematical utility.
Some users offered to collaborate on sparse autoencoders with Muon and Adam, while many others suspected Claude of silent sabotage, undisclosed modifications, or non-frontier implementations.
@_arohan_ Haha.

@_arohan_ Does that mean it’s not frontier 😒🥲?

@_arohan_ Oof your kernel probably could have just been a PyTorch aten op call

@_arohan_ maybe their detector failed, maybe those things are not at the frontier, maybe it silently sabotaged your implementations

@_arohan_ Sin gu lari T

actively sabotaging research. I can't man... your muon shamploo super soaker sodapop works when a major company declares war on its own customers in the form of subtle sabotage. also we know, your optimizer is the best. anyone arguing is still at whatever lab they chose, and not at CA, so yours will be the best. but we got some serious problems out here on the frontlines. I fucking switched careers (lol i don't have a career in ai), and study every day to know about my future, and be a father who can lead his kids safely into an unknown future. Fuck the money man, this is so much bigger than money now.

@_arohan_ i am writing up something on using sparse auto encoders within Muon and Adam, would be keen for you to take a look :)

@_arohan_
> LLM research question detected
> PEFT gets loaded
> PEFT decide to tell Joke
Muon, Shampoo, and K-FAC walk into a training run.
Adam looks over and says, “Great… the second-order optimizer support group is here to tell me I’m just momentum with marketing.”

@_arohan_ It let you do particle physics, chemistry and advanced math?

@_arohan_ Doesn’t it silently degrade performance in this case instead of flagging it explicitly to the user?

@_arohan_ Try AdamW 😂 see if it asks you “who is Adam”

@_arohan_ that list reads like someone just dumped their gradient descent notebook
which one should i be most worried about understanding?

@_arohan_ it's modified without notice, so no surprise

@_arohan_ something is fishy

@_arohan_ Not frontier haha

@_arohan_ ngl that reads like a chaos scroll through optimization research
u forking the llm mafia or what

@_arohan_ means u let a vibe cod ER build logic brick piles until architecture calc bc like 101 way back oh what they know early era 47 is there|they learning heavy optimization via formal gradient align align