
Elicit's Andreas Stuhlmüller argues that easy alignment would make separate safety classifiers unnecessary, but creator Tenobrus says models called through stateless APIs would still need external context

A model reached through stateless API calls cannot easily distinguish defensive queries from attacks

Original post
Andreas Stuhlmüller @stuhlmueller

If alignment were easy, would you still need bio/cyber/r&d classifiers on top of your model?

You'd align the model to a principal who doesn't want that work to be done. The model would deploy its full cognition to distinguishing forbidden from valid work

Alas

8:47 AM · Jun 11, 2026 · 946 Views
Posts from X
Tenobrus @tenobrus

@stuhlmueller hm while there is an alignment problem i think there's also a ~fundamental "harness" problem- a model hit via raw stateless api calls to find bugs in a codebase genuinely needs way more affordances + information to differentiate between negative and positive use

4h · Views 148 · Bookmarks 1

@lukestanley Yup - and *if alignment were easy* we would be able to have that trust

It's not, of course! But it's easy to forget that because the models are so helpful in practice

5h · Views 9 · Likes 1
Luke Stanley @lukestanley

@stuhlmueller Nothing wrong with training a model to preferring avoid providing dangerous things, but getting rid of separate safety classifiers would require a lot of trust to justify the independence, with current techniques, surely?

5h · Views 10
Strata @ChainZenit

@stuhlmueller that's a wild way to look at safety constraints.

5h · Views 9
Luke Stanley @lukestanley

@stuhlmueller Yeah, the fact that in order to launch Fable Anthropic had to throw the kitchen sink at it is pretty revealing that robust alignment isn't here yet. With "prompt modification, steering vectors, PEFT-style intervention" even the infrastructure considerations are mind boggling.

5h · Views 7
Rugbist @rugbist_

@stuhlmueller so basically the ideal model becomes the most paranoid regulator

5h