Tech · 9h ago

EPFL MLO Lab introduces Magnitude-Direction Decoupling to improve neural network training efficiency at scale

Meta's Lucas Beyer praised the paper's extensive empirical experiments.

Original post

Lucas Beyer (bl16) @giffmana · #72 in Tech

This is the natural continuation of this group's previous several works on optimization, and I really like their style.

A lot of experiments to look at all possible details. As opposed to a wall of theory and then one single experiment with untuned baselines.

Alex Hägele @haeggee

Our paper is now on arXiv: https://arxiv.org/abs/2606.25971 Besides all the details and discussions of the broader literature, it also contains lots other experiments that answer some of the questions we have already received. For example:

11:00 AM · Jun 25, 2026 · 20.3K Views

Sentiment

Many users praised the EPFL paper on magnitude-direction decoupling for emphasizing real experimental data, exhaustive ablations, and reproducibility over theoretical claims.

Positive: 100.0% · Negative: 0.0%

5 comments with sentiment.

Via arxiv.org

Posts from X

Filip @filipviz

@giffmana Yeah, their Rotational Equilibrium work was also fantastic - one of the few papers which feels optimized for being honest and actually helping the reader understand the work.

8h · 265 views · 5 likes

Calc Consulting @CalcCon

@giffmana Actually, this idea is well supported by the weightwatcher theory. It is saying that is the shape, not the scale, of W that really modulates the generalization accuracy.

8h

Filip @filipviz

@giffmana The world if every ML paper were like this

8h

Lucas Beyer (bl16) @giffmana

@filipviz Yep fully agree

7h

Strata @ChainZenit

@giffmana Refreshing to see actual data instead of just more whitepaper fluff.

9h

Eclipse 🌖 @ECLresearch

@giffmana The focus on exhaustive ablation over theoretical posturing is exactly what moves the field forward. Reproducibility starts with showing your work at every marginal gain.