Tech · 19h ago

Google DeepMind's Andreas Kirsch argues open-source AI is inherently unsafe, warning against recursive self-improvement loops

Stella Biderman questioned how the framework addresses sleeper agent risks.

Original post

I think open-source AI is inherently unsafe and neither Anthropic nor OpenAI should enable RSI loops for open-source AI prematurely

I def don't want open-source AGI or ASI before there is closed-source one that is properly vetted for safety and aligned

My public benefit is that not everyone gets very strong models whose safety can be removed

Stella Biderman @BlancheMinerva

@BlackHC @mtavitschlegel I feel like their actions since I made this comment justify my stance that Anthropic should not be viewed as an actor dedicated to promoting public benefit.

Do you agree?

8:35 PM · Jun 9, 2026 · 87 Views
Sentiment

Negative users criticize open-source AI releases by framing them as releasing sleeper agents that silently sabotage code.

Pos 0.0% · Neg 100.0% · 1 comment with sentiment.
Cluster Engagement
Posts from X

Fair and no worries!

"Sabotage refers to the deliberate destruction, damage, or obstruction of property, systems, or processes to hinder an opponent, weaken a government, or disrupt an organization"

If they, e.g., just change how many reasoning tokens the model will spend, you get weaker performance, but it's not sabotaging your work. They are also transparent about it. You can try GPT 5.5, Opus, and Fable and use whichever works best

If the model starts thinking about where to introduce a race condition intentionally, that would be sabotaging your work (for me). And/or if it tries to conceal this and deceive you when you ask it to debug the bug later

Stella Biderman @BlancheMinerva

@BlackHC @mtavitschlegel I’m sorry, I wasn’t trying to be rude. I didn’t realize repeating the question would be perceived as such. I just wanted you to respond to the question.

18h · Views 91 · Likes 0 · Bookmarks 0
REPLIES 1
Stella Biderman @BlancheMinerva

@BlackHC @mtavitschlegel If I give an AI system a task and it says “okay, I’ll do that” but instead of doing it correctly it deliberately does a bad job with the goal of limiting my ability to succeed, why shouldn’t I consider that silently sabotaging my code?

It doesn't sabotage your code though

My expectation/guess is that Fable is still better than Opus 4.8 at these tasks but just worse than Mythos

In an exponential take-off scenario, you just want to make sure that your multiplier is slightly higher than what you provide externally to preserve your lead

18h · Views 69 · Likes 0 · Bookmarks 0

@BlancheMinerva @mtavitschlegel This is not nice but I'll humor you once.

Silently sabotaging your code would be intentionally introducing hard to find and debug bugs. This is different from just having worse performance

Stella Biderman @BlancheMinerva

@BlackHC @mtavitschlegel If I give an AI system a task and it says “okay, I’ll do that” but instead of doing it correctly it deliberately does a bad job with the goal of limiting my ability to succeed, why shouldn’t I consider that silently sabotaging my code?

18h · Views 86 · Likes 0 · Bookmarks 0

@BlancheMinerva @mtavitschlegel Huh? What do you mean by "sleeper agent that silently sabotages your code"? This is not what this is 🤯

Stella Biderman @BlancheMinerva

@BlackHC @mtavitschlegel How do you justify releasing a sleeper agent that silently sabotages some code?

18h · Views 53 · Likes 0 · Bookmarks 0