@giffmana Begging people to do true rl from scratch instead of starting with the big language models
I mean, it's one RL step, Michael. What could it cost?
Complete training runs require hundreds of these steps.
Some users dismissed from-scratch RL as impractical; others reacted to the reported cost of RL on an LLM like Nvidia Nemotron 3 Ultra: $1K per step, with full runs of hundreds of steps reaching hundreds of thousands of dollars.

@yacineMTB @giffmana It’s like asking Claude to control cart pole directly. Like bro no

@giffmana there is always money in the gpu stand tho
I mean, it's one RL step, Michael. What could it cost?
"A fun fact is that an RL step on a model like Nvidia Nemotron 3 Ultra on Tinker costs $1K and a meaningful RL run would be hundreds of steps"

@giffmana The rare scenario where Arrested Development's 'What could it cost?' is completely sincere.

@giffmana this is the Arrested Development reference i was looking for today

@shakoistsLog @giffmana the tasks that really matter don't benefit from language priors!

@yacineMTB @giffmana You can do a lot more, a lot faster with a 3-4 layer NN.

@giffmana My take: post-training cost is the next real filter, not model access. When a meaningful RL run is hundreds of $1K steps, only teams with a tight reward signal can afford to iterate. The bottleneck stops being compute. It becomes knowing exactly what to reward.