OpenAI's @tszzl questions the competitive risk of disclosing reinforcement learning and chain-of-thought scaling principles during the o1 launch
Story Overview
OpenAI technical staffer @tszzl is openly wondering whether the o1 launch gave away too much by spelling out the core ideas behind scaling reinforcement learning over chain-of-thought reasoning in the middle of a heated AGI race, and how much of an edge that public disclosure handed to competitors.
What the September 2024 post actually described
The accompanying OpenAI blog post explained that a large-scale RL process trains the model to refine its own chain of thought and correct its mistakes, and that performance improves with both more training compute and more thinking time at inference. It also noted that the constraints on scaling this approach differ from those of ordinary pretraining.
Whether any of it was truly new ground
Replies in the thread point out that earlier public work such as DeepSeekMath and AlphaProof had already sketched similar RL-over-reasoning building blocks, so the real competitive cost of the disclosure remains unsettled.
Many users praised OpenAI for openly sharing RL scaling details and model improvements in the o1 release, while a few questioned the company's transparency and how it distributes value.
One of the key moments of the LLM era, along with GPT-3.5 and the decision by Microsoft not to take down Bing/Sydney/GPT-4 after the @kevinroose New York Times article.
imo it is crazy that openai, years into the heated AGI race, released o1 and described in quite a bit of detail the principles of scaling RL over CoT. I wonder how much value was dispersed to the public that day
I wonder why does my tl seems to be reliving the o1 day?
Maybe it’s time for the next one?
all of the pieces for reasoning models were already there
OpenAI only applied this at scale
without them the field would've converged to reasoning models within 1-2 years anyways
DeepSeekMath and AlphaProof existed way before o1 was announced
Only path is forward
@tszzl https://www.youtube.com/watch?v=8WepLivNCbI

@tszzl This:
https://arxiv.org/abs/2203.14465

i dont wanna like discredit o1 or anything, but even if you literally said nothing about it other than the fact it thinks before answering, people probably would have realized what you did pretty quickly just by using the model. DeepSeek-R1-Lite-Preview came out only like 2 months after o1-preview. if you hadnt given as much detail, i suspect it would have been only like 3 months instead, or less.

@tszzl I think it was going to be fairly obvious to everyone soon regardless.
CoT things were happening elsewhere. The public chatter about inference time scaling was widespread.
R1 followed quickly and would have whether or not OAI described much about o1.

@MillionInt Where is it Jerry,
Where’s the berry

@tszzl If theyd just shown ...thinking... with some big GPT-5 needs time to first token excuse, people would’ve been 10x more blown away and competitors mightve chased the wrong path actually
cot summaries stripped away the black magic, to oais detriment and everyone elses benefit

@tszzl wasn’t that in itself inspired by eric zelikman’s 2022 paper?

optimizing for backtracking and self-correction was important but easily derived from first principles and an obvious implication
i think cot today can still be better with some more quirks like introspection, frame switching, edge case anticipation etc
let the gnome bread a bigger 🍓

@tszzl The big labs are basically playing musical chairs with the same 40 autists and increasingly absurd salaries. I don't see anyone keeping a secret like that for very long, and the idea wasn't particularly unknown. I don't see the counterfactual being that different.

@tszzl it’s probably as simple as analyzing patterns in the greatest thinkers, and as complex as alien logic we can’t keep up with

@tszzl to determine this accurately you would need to know the distribution of counterfactual realities. it may have been obvious to other researchers already or been something that would otherwise never happen. likely somewhere in between though.

@hallerite @tszzl yes they did

@pizzacritic999 @tszzl woah that's really cool, but also basically everything has been done on arxiv at some point. ASI is probably a 0 citation arxiv paper written in language indistinguishable from AI psychosis.

@jmbollenbacher @tszzl Everybody had CoT instructions in their custom instructions for months before major companies began focusing on it in training.

@tszzl goat move

@rv32e @tszzl It was stupid and you were right. This won't get you a job but if you want to know how to make it work: stick a layer between the model and the user that plucks nearby thoughts from a vector database and injects them. It's infinitely more effective and it's cheaper on both sides.