around feb / starting with gpt 5.2, model capability stopped being my main constraint (which is why more of _my_ cycles are spent on working with these models effectively these days vs model capability). there are still lots of things that frontier models can't do, but they are not preventing me from working or accomplishing my goals (for the most part). GLM 5.2 is _just_ about there (not quite tho). ~6mos behind frontier feels about right and the real limitation being RL flops right now (instead of data, or traces, or envs, etc) is somewhat expected but also telling
Entropix creator _xjdr says model capability is no longer his main constraint and argues that RL compute (flops), rather than data, traces, or environments, is now the real limitation.
Teortaxes asked what the limits of parallelization with MOPD are.
Replies discuss working with models more effectively, compare Fable with Opus, and endorse smart scaling over brute force as the answer to the RL flops bottleneck.
Most Activity
> the real limitation being RL flops right now

I wonder what the limits of parallelization with MOPD are. Obviously, 1000 "experts" with 20 RL steps each are ≈useless compared to 10 experts with 2000 steps. But what about 40 experts @ 500 steps, merging@4 before OPD?
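
For reference, the three configurations in this reply spend the same total number of RL steps, which is why the question is about how a fixed budget is split rather than how large it is. A minimal sketch of that arithmetic follows. It assumes every step costs the same and that "merging@4" means merging experts in groups of four; neither is stated in the thread, and all names in the code are hypothetical.

```python
# Hypothetical helper: total RL steps spent by a parallel configuration,
# assuming every expert runs the same number of equal-cost steps.
def total_steps(experts: int, steps_each: int) -> int:
    return experts * steps_each


# (experts, steps per expert) for the three setups named in the reply.
configs = {
    "wide": (1000, 20),
    "deep": (10, 2000),
    "middle": (40, 500),
}

budgets = {name: total_steps(e, s) for name, (e, s) in configs.items()}

# Assumption: "merging@4" is read as merging experts in groups of 4,
# which would leave this many merged models before the distillation step.
merged_models = configs["middle"][0] // 4

for name, budget in budgets.items():
    print(f"{name}: {budget} total steps")
print(f"merged models in the middle setup: {merged_models}")
```

Under these assumptions the middle setup ends with as many models as the deep setup has experts, each built from four runs of 500 steps.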

@xandykati98 http://Code.noumena.com and http://noumena.com

@Orwelian84 @jmbollenbacher @_xjdr I spent 2 weeks optimizing Metal kernels with Opus, struggling to get ahead, and 2 days with fable finished the job

@_xjdr the difference between fable and opus 4.8 is way way bigger, qualitatively so, than between Opus 4.6 and 4.8

@_xjdr what are you mostly working on rn?

@Rafa_Schwinger @Orwelian84 @_xjdr bigger. for sure. claude 3.5 -> 4 wasn't nearly as big as opus->mythos.
i think it's more akin to gpt-3.5 -> gpt-4. definitely on that level.

@jmbollenbacher @Rafa_Schwinger @_xjdr That's very interesting to hear. I did not try fable, so I have no point of reference right now. I look forward to getting to see the stepwise change, though

@Rafa_Schwinger @_xjdr Bigger or smaller than the difference between 3.5 and 4?

@Orwelian84 @_xjdr idk, I wasn't really using them seriously at the time

@_xjdr yeah and no, I mean, we are not in a horrible place atm if big labs shut down
still, a better model is just a better model, it often surprises you by picking a better path than you expected, but chances are we are hitting cognitive limits

@Rafa_Schwinger @jmbollenbacher @_xjdr Well damn

@_xjdr what's launching?

@teortaxesTex Smart scaling always beats brute force when the engineering gets tough