it is truly amazing we can supervise a trillion parameters from 16 bits of information. the fact that credit assignment across such a massive & discontinuous network is such a non-issue deserves so much reflection
Will Depue, who worked on OpenAI's Sora, says trillion-parameter neural networks can be successfully supervised with only 16 bits of information
Charles Foster attributes this to batch and sequence dimensions.
Users note that catastrophic forgetting becomes almost a non-issue at scale for trillion-parameter models supervised with just 16 bits.
Most Activity
@willdepue @lu_sichu *batch/sequence dimension voice*
it is truly amazing we can supervise a trillion parameters from 16 bits of information. the fact that credit assignment across such a massive & discontinuous network is such a non-issue deserves so much reflection

@CFGeek @lu_sichu lol

@willdepue why do you think that is? high dimensions just dont really have that many local minima so it will "eventually work"?

@willdepue Catastrophic forgetting almost a non-issue as well at scale!

@willdepue 16 bits is a metric fuckton