/Tech1d ago

Systems engineer Yacine argues developers should train RL from scratch as Nemotron 3 Ultra steps cost $1,000 each

Complete training runs require hundreds of these steps.

9362104327.6K

#403

Original post

kache@yacineMTB#403inTech

@giffmana Begging people to do true rl from scratch instead of starting with the big language models

Lucas Beyer (bl16)@giffmana

I mean, it's one RL step, Michael. What could it cost?

7:45 AM · Jun 23, 2026 · 2.3K Views

Sentiment

Some users dismissed RL from scratch on LLMs like Nvidia Nemotron as impractical, reacting to its reported per-step costs of $1K and full runs reaching hundreds of thousands.

Pos

0.0%

Neg

100.0%

1 comments with sentiment.

Cluster Engagement

Digg Deeper

No Digg Deeper questions have been answered for this story yet.

Posts from X

Most Activity

ICanAutomateThat@Jack1000k

@yacineMTB @giffmana It’s like asking Claude to control cart pole directly. Like bro no

1d310

LIKES3

janotekk@janotekk

@giffmana there is always money in the gpu stand tho

1d1843

RETWEETS9

Lucas Beyer (bl16)@giffmana

I mean, it's one RL step, Michael. What could it cost?

Drew Breunig@dbreunig

"A fun fact is that an RL step on a model like Nvidia Nemotron 3 Ultra on Tinker costs $1K and a meaningful RL run would be hundreds of steps"

1d25.9K31537

Praveen Koka@praveenkoka

@giffmana The rare scenario where Arrested Development's 'What could it cost?' is completely sincere.

1d2272

Eclipse 🌖@ECLresearch

@giffmana this is the Arrested Development reference i was looking for today

1d205

kache@yacineMTB

@shakoistsLog @giffmana the tasks that really matter don't benefit from language priors!

1d126

dmsimon@dmsimon

@yacineMTB @giffmana You can do a lot more, a lot faster with a 3-4 layer NN.

1d351

Vikas(Vik) Malpani| AI for US Real Estate@vikasmalpani

@giffmana My take: post-training cost is the next real filter, not model access. When a meaningful RL run is hundreds of $1K steps, only teams with a tight reward signal can afford to iterate. The bottleneck stops being compute. It becomes knowing exactly what to reward.

1d43