ueaj@_ueaj·Original post
"Attention is just a special case of <abstract math thing> so we generalized it by <neglecting the other 30 abstractions and conditions required for frontier architecture> and we found it performed <p hacking> compared to <naive baseline>"