
Pretraining On Massive Datasets Could Outperform RL Models, Says Builder


Original post

Taelin@VictorTaelin · #1140 in Tech

RL is a mistake, thinking is a mistake, and if we just put all the money into crafting an astronomically good, massive dataset, we'd pretrain a model that outperforms everything that exists by a considerable margin

source: my ass (I have no idea what I'm talking about)

Greg Oyan@thegoodreturn

People loved Fable 5 because it inferred intent better than any other model by a considerable margin. Sure, it was also the most capable model, but its ability to just 'get' what you wanted was unparalleled.

1:04 PM · Jun 17, 2026 · 32.5K Views

Sentiment

Many users rejected the claim that pretraining on massive datasets could outperform RL models for AGI, arguing that reinforcement learning remains essential for reaching superintelligence.

Pos: 25.0%

Neg: 75.0%

8 comments with sentiment.


Posts from X


Taelin@VictorTaelin

(this post assumes Fable was mostly the result of a massive pretrain)




Flowers ☾@flowersslop

Isn’t RL basically just trying to find solutions that aren’t present in your dataset, but in a very inefficient way? Or isn’t the purpose of neural nets in general to interpolate, which you only have to do with a lack of data?

So if you had a dataset covering all economically valuable tasks in every variation, that should be preferable?


nicolasmelo@nicolasmelo

@VictorTaelin I totally disagree. RL is the ultimate way to reach super intelligence

One reason: robots shouldn't look human; they should look absolutely out of this world. They shouldn't resemble humans. We write and express stuff in words because it's optimal for humans

AI should be alien to us


Elliot Arledge@elliotarledge

not gonna disagree with that but thinking is still important. maybe it should be more latent. we give it so many experts and so many of them to activate per token, but some tokens are much harder than others. so maybe some dynamic routing size... idk how to set this up without getting shape errors. i guess a looped transformer makes sense here.

damn i just answered my own question (facepalm)


Taelin@VictorTaelin

@nicolasmelo I don't disagree with you!

RL itself is fine, but transformers can't do RL justice


Taelin@VictorTaelin

@flowersslop that's exactly how I see RL (as a layman)


Taelin@VictorTaelin

@notselfmodel no hope of agi with transformers regardless


Nate@notselfmodel

@VictorTaelin no hope of agi that way


cup@etacup

@VictorTaelin This isn't as far off as you think. More diverse data in pretraining makes RL so much easier. It makes the difference between the model having to reason with fuzzy concepts it doesn't fully know about vs. things it's versed in.


Mihir@mihirneal

@elliotarledge @VictorTaelin this right here was thinking


sakanade@0xsakanade

@VictorTaelin working with opus 4.8 feels like reward hacking


nicolasmelo@nicolasmelo

@VictorTaelin Actually a great movie that if you haven't watched you should go watch:

In this movie, aliens do not resemble humans AT ALL, but they are able to control and bend time through language

It's something we can't even comprehend with our current knowledge

AI should be more like that


kimse /\ nobody@Misteriazq

By the way, that Fable thing was probably just two Opus models: one of them does the task while the other tries to find a better way or looks for missing things, and then the two are merged into a monolithic chain of thought. If you can merge these two personas well, you have almost the same thing.

source: trump's hair


João Felipe@joaofelipe_sp

@VictorTaelin Why bother training a neural net at all? Why don't we build a perfectly deterministic LLM exclusively using Python elifs?


SpeakEZ.tech@SpeakezTech

@VictorTaelin You sound like @karpathy in 2022


Nate@notselfmodel

@VictorTaelin with enough rl scale maybe you can just distill out an inefficient but general reasoning algorithm


jack@jd_jd_jd_jd

@VictorTaelin we must have similar gut microbiomes cuz my ass is saying the same


Ceoz@Ceoz_1

@VictorTaelin Nah, because then we are constrained to form: you lose the possibility of new behaviors emerging from training that were outside what we expected or produced.

Emergent behavior outside anthropomorphic structure is probably what will get us AGI.


Deen Kun A.@sir_deenicus

@flowersslop @VictorTaelin Not going to be nearly as robust to perturbations (surprise!) from trying to interpolate over data alone. This matters when you're trying to be novel too. RL helps reasoning because the model learns to self-correct, and this repairs it during, and prepares it for, going off the beaten path
