the most important problem in ai safety, as well as the biggest unlock for letting RSI fucking rip, is formalizing and automatizing the science of robust model behavior evaluation
Prime Intellect's Will Brown argues that automating robust evaluation is the essential precursor to safe recursive self-improvement
Story Overview
Will Brown, Research Lead at Prime Intellect, positions the formalization and automation of robust model behavior evaluation as both the most important problem in AI safety and the biggest unlock for rapid recursive self-improvement (RSI). He describes optimizers and architectures as appealing distractions and names evals, data, and kernels as the real levers. Because he treats data and kernels as evaluation problems too, the priority reduces to evals, plus bringing more GPUs online.
Evals subsume the rest of the stack
Brown and the builders who agree with him argue that data and kernel problems collapse into evaluation problems, which makes strong evals a higher-leverage investment than architecture tweaks or optimizer changes.
No clear path from internal labs to public standards
Frontier labs spend heavily on internal evals but share little of that work externally. Because the effort-to-reward ratio favors quick vibechecking over rigorous formalization, the automation step remains underspecified.
Supportive replies agree that formalizing robust model evaluation should be a priority for AI safety, citing its value for alignment generalization and for practical work, though some note that few people want to do the work.
Most Activity
optimizers and architectures are wonderful nerdsnipes, and RSI will find some cute tweaks for sure, but the big levers are evals, data, and kernels. but data and kernels are evals problems, so it’s really just evals. that, and bringing the damn GPUs online.
the most important problem in ai safety, as well as the biggest unlock for letting RSI fucking rip, is formalizing and automatizing the science of robust model behavior evaluation

@willccbb Did you see this? Thought it was interesting.
https://alignment.openai.com/beneficial-rl/
evals evals evals
the most important problem in ai safety, as well as the biggest unlock for letting RSI fucking rip, is formalizing and automatizing the science of robust model behavior evaluation

at @primeintellect we are hard at work scaling both evals and GPUs for the masses
come hang http://primeintellect.ai/careers
@willccbb totally agree. was just trying to explain to some folks last week that evals are where they should be spending their efforts. you get value from strong evals in a number of different ways.
optimizers and architectures are wonderful nerdsnipes, and RSI will find some cute tweaks for sure, but the big levers are evals, data, and kernels. but data and kernels are evals problems, so it’s really just evals. that, and bringing the damn GPUs online.

@willccbb This is why I suspect the interp work at ant is what led to their large improvements. Something something steering vectors during RL to improve exploration and desired behavior.

@willccbb i love working on the things that never make anyone's lists of things!

@willccbb Basically

@leerob nice! very cool results + great signs for alignment generalization

@leerob @willccbb Bruh after cursor will you join spacex?

@willccbb I am not a ML person per se but isn’t the ability to effectively automate evals effectively true AGI?
Not sure if you’re in that camp or not.

@willccbb One thing comes my mind is property based testing if we consider `eval : model :: tests : code`, so basically finding properties/invariants manually but automatically generating eval dataset based on them. Something like that?
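
The property-based idea in the reply above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not anyone's actual eval harness: the `Model` interface, `toy_model`, `brittle_model`, and the two properties (correctness and invariance to formatting) are all hypothetical stand-ins chosen for the example. The properties are written by hand, and the eval cases are generated automatically from them.

```python
import random
from typing import Callable, Iterator

# Hypothetical interface for illustration: a model maps a prompt to an answer.
Model = Callable[[str], str]


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: sums the integers found in the prompt."""
    tokens = prompt.replace("?", " ").split()
    numbers = [int(tok) for tok in tokens if tok.lstrip("-").isdigit()]
    return str(sum(numbers))


def brittle_model(prompt: str) -> str:
    """Stand-in that only works when the prompt contains lowercase 'plus'."""
    return toy_model(prompt) if "plus" in prompt else "0"


def generate_cases(n: int, seed: int = 0) -> Iterator[tuple[str, str, int]]:
    """Yield (prompt, perturbed prompt, expected answer) triples."""
    rng = random.Random(seed)  # seeded so the generated eval set is reproducible
    for _ in range(n):
        a, b = rng.randint(-99, 99), rng.randint(-99, 99)
        prompt = f"What is {a} plus {b}?"
        # Perturbation that should not change the answer: casing and spacing.
        perturbed = f"  what is   {a} plus {b} ?  ".upper()
        yield prompt, perturbed, a + b


def check_properties(model: Model, n: int = 200, seed: int = 0) -> list[str]:
    """Return a description of every property violation found."""
    failures = []
    for prompt, perturbed, expected in generate_cases(n, seed):
        answer = model(prompt)
        # Property 1 (correctness): the answer matches the known sum.
        if answer != str(expected):
            failures.append(f"correctness: {prompt!r} -> {answer!r}")
        # Property 2 (invariance): formatting changes leave the answer unchanged.
        if model(perturbed) != answer:
            failures.append(f"invariance: {perturbed!r}")
    return failures
```

Calling `check_properties(toy_model)` returns an empty list, while `check_properties(brittle_model)` reports invariance failures on the uppercased prompts. With a real model the hard part is still choosing properties that capture the behavior you care about, which is the manual step the reply describes.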

@willccbb @PrimeIntellect @willccbb Does remote mean remote from within the US or remote from anywhere?

@willccbb The bottleneck may not be model capability but measurement capability. If we can't reliably evaluate behavior, every capability gain just increases uncertainty faster than confidence. Prime Intellect is betting eval infrastructure scales before intelligence does.

@EdSealing the bitter lesson way to improve exploration in RL is to avoid entropy collapse + sample more

@willccbb Cells Interlinked

@willccbb You mean proof-type formalization? Any interesting work you've seen there?

@willccbb formalize robust eval and half the safety policy arguments dissolve. nobody wants to do the work though

@EdSealing my instinct is that steering vectors function like a more surgical version of prompt conditioning, and are useful for interpreting behavior + example generation, but the big lever is basically pure RL but with really well-calibrated soft rewards

@willccbb i need to adjust this more after a recent Pliny paper but you get the idea
https://app.primeintellect.ai/dashboard/environments/anthone/channel-switching-eval