
Momentum Gradient Cosine Similarity Turns Negative During Constant LR Training

Ethan (@torchcompiled)

Really great explanation of the low and even negative cosine similarity between momentum and gradient, something I’ve been wondering about for a while.
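
To make the quantity concrete, here is a minimal NumPy sketch of how the cosine similarity between the momentum buffer and the incoming gradient could be logged during constant-LR training. The helper names (`cosine_similarity`, `track_momentum_grad_cosine`), the EMA form of momentum, and the toy quadratic with one steep and one flat direction are all assumptions for illustration; none of them come from the explanation being referenced.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two flat vectors; 0.0 if either is ~zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > eps else 0.0

def track_momentum_grad_cosine(grad_fn, x0, lr=0.1, beta=0.9, steps=50):
    """Run SGD with EMA momentum at a constant LR and log cos(m_{t-1}, g_t)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    history = []
    for _ in range(steps):
        g = grad_fn(x)
        # Compare the momentum *before* it absorbs the new gradient.
        history.append(cosine_similarity(m, g))
        m = beta * m + (1.0 - beta) * g
        x = x - lr * m
    return history

# Toy loss 0.5 * sum(curvature * x**2): a steep "valley wall" direction
# and a nearly flat "river" direction (hypothetical values).
curvature = np.array([10.0, 0.01])
history = track_momentum_grad_cosine(lambda x: curvature * x, x0=[1.0, 1.0])
print(min(history), max(history))
```

In this toy setup the steep coordinate overshoots its minimum while the momentum still points the old way, so the logged cosine dips below zero on some steps. The first entry is 0.0 only because the momentum buffer starts at zero.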

Reminds me of the “river and valley” interpretation.

It raises the question of whether we could use dynamic beta coefficients or dampening per dimension, based on a tracked rate of sign flips or variance.
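
One way this idea might look in code, purely as a sketch: keep an exponential moving average of how often each coordinate's gradient changes sign, and interpolate that coordinate's beta between two values. The class name `SignFlipAdaptiveMomentum`, the parameters `beta_calm`, `beta_noisy`, and `flip_decay`, and the choice to raise beta (smooth harder) where flips are frequent are all hypothetical; the opposite mapping would be just as easy to try.

```python
import numpy as np

class SignFlipAdaptiveMomentum:
    """Hypothetical EMA momentum with a per-dimension beta driven by sign flips."""

    def __init__(self, shape, beta_calm=0.9, beta_noisy=0.99, flip_decay=0.9):
        self.beta_calm = beta_calm      # beta used when a coordinate never flips
        self.beta_noisy = beta_noisy    # beta approached when it flips every step
        self.flip_decay = flip_decay    # EMA decay for the flip-rate estimate
        self.m = np.zeros(shape)
        self.flip_rate = np.zeros(shape)
        self.prev_sign = np.zeros(shape)

    def beta(self):
        """Per-dimension beta, linearly interpolated by the tracked flip rate."""
        return self.beta_calm + (self.beta_noisy - self.beta_calm) * self.flip_rate

    def update(self, grad):
        grad = np.asarray(grad, dtype=float)
        sign = np.sign(grad)
        # A flip is a strict sign change relative to the last nonzero sign.
        flipped = (sign * self.prev_sign < 0).astype(float)
        self.flip_rate = (self.flip_decay * self.flip_rate
                          + (1.0 - self.flip_decay) * flipped)
        # Keep the previous sign where the current gradient is exactly zero.
        self.prev_sign = np.where(sign != 0, sign, self.prev_sign)
        beta = self.beta()
        self.m = beta * self.m + (1.0 - beta) * grad
        return self.m

# Usage: coordinate 0 oscillates every step, coordinate 1 is steady.
opt = SignFlipAdaptiveMomentum(shape=2)
for t in range(20):
    opt.update(np.array([(-1.0) ** t, 1.0]))
print(opt.flip_rate, opt.beta())
```

Here the oscillating coordinate ends up with the higher tracked flip rate and therefore the higher beta, while the steady coordinate stays at `beta_calm`. A variance-based signal could replace the flip rate with the same structure.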

2:21 AM · May 31, 2026 · 48 Views