
OpenAI's @tszzl questions the competitive risk of disclosing reinforcement learning and chain-of-thought scaling principles during the o1 launch

Story Overview

OpenAI technical staffer @tszzl is openly wondering whether the o1 launch went too far by spelling out the core ideas behind scaling reinforcement learning on chain-of-thought reasoning in the middle of a heated AGI race. The post also asks how much of an edge that disclosure may have handed to everyone else watching.


Original post

roon@tszzl

imo it is crazy that openai, years into the heated AGI race, released o1 and described in quite a bit of detail the principles of scaling RL over CoT. I wonder how much value was dispersed to the public that day

5:05 PM · Jun 18, 2026 · 117.5K Views

Open Question

What the September 2024 post actually described

The accompanying OpenAI blog explained that a large-scale RL process trains the model to refine its own chain of thought, fix mistakes, and improve with both more training compute and more thinking time at inference, while noting that the scaling rules differ from ordinary pretraining.

FYI

Whether any of it was truly new ground

Replies in the cluster point out that earlier public work such as DeepSeekMath and AlphaProof had already laid out similar RL-over-reasoning building blocks, which leaves the real competitive cost of the disclosure unsettled.

Sentiment

Positive users praised OpenAI for openly sharing RL scaling details and model improvements in the o1 release, while a few questioned the company's transparency and value distribution.

Pos: 80.6%
Neg: 19.4%

42 comments with sentiment.



Posts from X


VIEWS 22.1K · BOOKMARKS 26

Ethan Mollick@emollick

One of the key moments of the LLM era, along with GPT-3.5 and the decision by Microsoft to not take down Bing/Sydney/GPT-4 after the @kevinroose New York Times article.

roon@tszzl

4h · 22.1K · 138 · 26

LIKES 227 · RETWEETS 3

Jerry Tworek@MillionInt

I wonder why does my tl seems to be reliving the o1 day?

Maybe it’s time for the next one?

roon@tszzl

4h · 20.7K · 227 · 14

REPLIES 11

Lisan al Gaib@scaling01

all of the pieces for reasoning models were already there

OpenAI only applied this at scale

without them the field would've converged to reasoning models within 1-2 years anyways

DeepSeekMath and AlphaProof existed way before o1 was announced

roon@tszzl

3h7.5K797

rohan anil@_arohan_

Only path is forward

Jerry Tworek@MillionInt

I wonder why does my tl seems to be reliving the o1 day?

Maybe it’s time for the next one?

3h2.9K221

Miles Brundage@Miles_Brundage

@tszzl https://www.youtube.com/watch?v=8WepLivNCbI

roon@tszzl

4h2.3K100

ashu@pizzacritic999

@tszzl This:

https://arxiv.org/abs/2203.14465

5h2331

ρ:ɡeon@pigeon__s

i dont wanna like discredit o1 or anything but even if you literally said nothing about it other than the fact it thinks before answer people probably would have realized what you did pretty quickly either way just by using the model DeepSeek-R1-Lite-Preview came out only like 2 months after o1-preview if you hadnt given as much detail probably would have been only like 3 months instead i suspect or less

5h3678

JMB 🧙‍♂️@jmbollenbacher

@tszzl I think it was going to be fairly obvious to everyone soon regardless.

CoT things were happening elsewhere. The public chatter about inference time scaling was widespread.

R1 followed quickly and would have whether or not OAI described much about o1.

5h18341

Jimmy Apples 🍎/acc@apples_jimmy

@MillionInt Where is it Jerry,

Where’s the berry

4h2965

Flowers ☾@flowersslop

@tszzl If theyd just shown ...thinking... with some big GPT-5 needs time to first token excuse, people would’ve been 10x more blown away and competitors mightve chased the wrong path actually

cot summaries stripped away the black magic, to oais detriment and everyone elses benefit

5h2804

ashu@pizzacritic999

@tszzl wasn’t that in itself inspired from eric zelikman’s 2022 paper?

5h1673

𒄆@liqsweep

optimizing for backtracking and self-correction was important but easily derived from first principles and an obvious implication

i think cot today can still be better with some more quirks like introspection, frame switching, edge case anticipation etc

let the gnome bread a bigger 🍓

5h3222

Andre Infante@AndreTI

@tszzl The big labs are basically playing musical chairs with the same 40 autists and increasingly absurd salaries. I don't see anyone keeping a secret like that for very long, and the idea wasn't particularly unknown. I don't see the counterfactual being that different.

4h165

𒄆@liqsweep

@tszzl it’s probably as simple as analyzing patterns in the greatest thinkers, and as complex as alien logic we can’t keep up with

5h291

Martian@space_colonist

@tszzl to determine this accurately you would need to know the distribution of counterfactual realities. it may have been obvious to other researchers already or been something that would otherwise never happen. likely somewhere in between though.

4h2003

Chase Blagden@ChaseBlagden

@hallerite @tszzl yes they did

1h191

ueaj@_ueaj

@pizzacritic999 @tszzl woah that's really cool, but also basically everything has been done on arxiv at some point. ASI is probably a 0 citation arxiv paper written in language indistinguishable from AI psychosis.

4h121

Vladimir Sumarov@summeroff

@jmbollenbacher @tszzl Everybody had CoT instructions in their custom instructions for months before major companies begin focusing on it in training.

4h21

Jake Colling@JacobColling

@tszzl goat move

5h1732

stop saying bitter lesson@localoptimiser

@rv32e @tszzl It was stupid and you were right. This won't get you a job but if you want to know how to make it work: stick a layer between the model and the user that plucks nearby thoughts from a vector database and injects them. It's infinitely more effective and it's cheaper on both sides.

3h13