@teortaxesTex there is a huge psyop about generalization being something that you measure in terms of how fast you acquire skills rather than being a deep structural consequence of how the network's geometry has organized itself in response to competing optimization pressures
I didn't even bother commenting, but people seem to care yeah. Deep learning works. If you're more surprised by this than by models learning "general assistant behavior" from a finite set of User-Assistant examples, idk what to say.