Ravid Shwartz Ziv#612
Sunny Sanyal@SunnySanyal9
Pre-training folks 👀
This is a super interesting observation from the MAI technical report:
Randomly initialized attention naturally behaves like uniform averaging: each query spreads its weight almost evenly over all keys, so the attention matrix is approximately rank-1. They suggested a surprisingly simple training trick.
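A quick way to see the claim for yourself (a minimal NumPy sketch, not taken from the MAI report): sample one attention head with small random weights and check how close softmax(QKᵀ/√d) is to uniform averaging, A ≈ 11ᵀ/n, which has rank 1. The sizes, the N(0, 1) inputs and the GPT-2-style init std of 0.02 are illustrative assumptions. The sketch only shows the near-uniform behaviour at init; it does not reproduce the report's training trick.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def random_attention(n_tokens=64, d_model=256, d_head=64, init_std=0.02, seed=0):
    """Attention matrix of one freshly initialized head (assumed setup).

    Inputs x are i.i.d. N(0, 1); W_q and W_k are i.i.d. N(0, init_std^2).
    These choices are illustrative assumptions, not the report's configuration.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_tokens, d_model))
    w_q = rng.normal(0.0, init_std, (d_model, d_head))
    w_k = rng.normal(0.0, init_std, (d_model, d_head))
    q, k = x @ w_q, x @ w_k
    logits = q @ k.T / np.sqrt(d_head)  # scaled dot-product scores
    return softmax(logits, axis=-1)  # each row sums to 1


def rank1_energy(a: np.ndarray) -> float:
    """Fraction of the squared Frobenius norm captured by the top singular value."""
    s = np.linalg.svd(a, compute_uv=False)  # singular values, descending
    return float(s[0] ** 2 / np.sum(s ** 2))


if __name__ == "__main__":
    a = random_attention()
    n = a.shape[0]
    # Mean |n * A_ij - 1|: 0 would mean exactly uniform averaging (A = 11^T / n).
    print("mean deviation from uniform:", np.abs(n * a - 1.0).mean())
    print("rank-1 energy (init_std=0.02):", rank1_energy(a))
    # A much larger init makes the logits large, so softmax becomes peaked.
    print("rank-1 energy (init_std=0.2): ", rank1_energy(random_attention(init_std=0.2)))
```

With small init the logits have a standard deviation of roughly 0.1, so every row is close to 1/n and nearly all of the energy sits in the top singular value. Scaling the init up by 10x makes the attention peaked and much less rank-1.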
Feeling confused? Good. Keep reading 🧵
7:48 PM · Jun 2, 2026 · 6.2K Views