Low-bias, high-variance methods are usually not very compute-efficient, but they are scalable
High-bias, low-variance methods are more efficient but hit the limits of scaling quickly
One of those sides has had a better time historically
You Jiacheng reposted the observation without added comment.
Users are enthusiastic that low-bias, high-variance methods scale better despite their compute inefficiency: they equate exploration with winning and see it as the essence of AI progress, over rigid PhD-driven or compute-heavy approaches.

I love this post... this is the essence, the mustard that is being cut and spread on all our sandwiches. The compute vs PhD vs Grinder matrix. I am a high-bias, low-variance kinda guy. The problem is... I am biased toward the dumbest, most optimistic, convoluted moonshots, and don't want to do the other shit. Why? I've got a set pile of neurons, so why waste them on mundane concepts like learning how to read?

@MillionInt bitter is taken, would you like the Rancid Lesson™?

@babdam 🤣🤣

@MillionInt GRPO vs GAE (lambda=1) vs GAE (lambda < 1)
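For context on that reply: GAE with lambda = 1 reduces to the Monte Carlo return minus a value baseline (low bias, high variance), while lambda < 1 bootstraps from the learned value function, trading some bias for lower variance. GRPO drops the learned critic and instead normalizes each sampled response's reward against the other samples in its group. Below is a minimal sketch of both estimators, assuming a single episode with scalar per-step rewards and a bootstrap value appended at the end (0 if terminal); the function names are hypothetical, and using the population standard deviation in the GRPO normalization is an assumption about an implementation detail.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation for one episode.

    rewards: length T; values: length T + 1 (last entry is the bootstrap
    value, 0 if the episode terminated). lam=1 gives return minus baseline,
    lam=0 gives the one-step TD error.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: reward standardized within its sample group."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

rewards = [1.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]
print(gae_advantages(rewards, values, lam=1.0))  # Monte Carlo-style
print(gae_advantages(rewards, values, lam=0.0))  # pure TD(0)
print(grpo_advantages([1, 0, 1, 0]))
```

Sweeping `lam` between 0 and 1 moves the estimator along exactly the bias/variance axis the original post describes.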

@MillionInt low-bias, high-variance methods have had the better historical run, but might that just be because we had more compute to throw at them?
Not sure

@MillionInt so you're betting on the underdog that finally wins when compute gets cheap enough
what's the crossover point?

@MillionInt Haven't these terms just been high throughput and centralization for years?

@MillionInt only because of the sweet sweet gigantic amount of DATA

@MillionInt bitter lesson

@MillionInt so the tradeoff is basically asymptotics vs ceiling
but hitting the ceiling faster sounds like it would hurt more over time

@MillionInt Keep exploring == win

@MillionInt historically the low bias side keeps getting rescued by more compute
how long before we decide the bias wall is the real constraint?

@MillionInt OPD but it's evolutionary strategies
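On why evolution strategies usually land on the low-bias, high-variance side: the vanilla ES estimator only queries the objective at randomly perturbed parameters and never uses a model of its structure, so it is unbiased for the gradient of the Gaussian-smoothed objective but noisy for any finite sample count. Below is a minimal sketch using antithetic (mirrored) sampling; the function name, the quadratic test objective, and the parameter defaults are illustrative assumptions, and nothing here is meant to describe what "OPD" refers to.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_samples=1000, seed=0):
    """Antithetic ES estimate of the gradient of E[f(theta + sigma * eps)].

    Each Gaussian direction eps is evaluated at theta + sigma*eps and
    theta - sigma*eps; the score difference weights that direction.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, theta.size))
    diffs = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    # Average of diff * eps, scaled by 1 / (2 * sigma).
    return eps.T @ diffs / (2.0 * n_samples * sigma)

def neg_sphere(x):
    # Illustrative objective with true gradient -2x.
    return -np.sum(x ** 2)

theta = np.array([1.0, -2.0, 0.5])
print(es_gradient(neg_sphere, theta, n_samples=20000))  # close to -2 * theta
```

Shrinking `n_samples` makes the estimate visibly noisier around the same mean, which is the variance cost the thread is weighing against scalability.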