Tech · 19h ago

Google DeepMind's Andreas Kirsch argues open-source AI is inherently unsafe, warning against recursive self-improvement loops

Stella Biderman questioned how the framework addresses sleeper agent risks.

Original post

Andreas Kirsch 🇺🇦@BlackHC · #241 in Tech

I think open-source AI is inherently unsafe and neither Anthropic nor OpenAI should enable RSI loops for open-source AI prematurely

I def don't want open-source AGI or ASI before there is closed-source one that is properly vetted for safety and aligned

My public benefit is that not everyone gets very strong models whose safety can be removed

Stella Biderman@BlancheMinerva

@BlackHC @mtavitschlegel I feel like their actions since I made this comment justify my stance that Anthropic should not be viewed as an actor dedicated to promoting public benefit.

Do you agree?

8:35 PM · Jun 9, 2026 · 87 Views

Sentiment

Commenters with negative sentiment criticize open-source AI releases, framing them as sleeper agents that silently sabotage code.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Andreas Kirsch 🇺🇦@BlackHC

Fair and no worries!

"Sabotage refers to the deliberate destruction, damage, or obstruction of property, systems, or processes to hinder an opponent, weaken a government, or disrupt an organization"

If they eg just change how many reasoning tokens the model will spend, you get weaker performance but it's not sabotaging your work. They are also transparent about it. You can try both GPT 5.5, Opus, and Fable and use whichever works best

If the model starts thinking about where to introduce a race condition intentionally, that would be sabotaging your work (for me). And/or if it tries to conceal this and deceive you when you ask it to debug the bug later

Stella Biderman@BlancheMinerva

@BlackHC @mtavitschlegel I’m sorry, I wasn’t trying to be rude. I didn’t realize repeating the question would be perceived as such. I just wanted you to respond to the question.

18h9100

Replies: 1

Stella Biderman@BlancheMinerva

@BlackHC @mtavitschlegel If I give an AI system a task and it says “okay, I’ll do that” but instead of doing it correctly it deliberately does a bad job with the goal of limiting my ability to succeed, why shouldn’t I consider that silently sabotaging my code?

Andreas Kirsch 🇺🇦@BlackHC

It doesn't sabotage your code though

My expectation/guess is that Fable is still better than Opus 4.8 at these tasks but just worse than Mythos

In an exponential take-off scenario, you just want to make sure that your multiplier is slightly higher than what you provide externally to preserve your lead

18h6900

Andreas Kirsch 🇺🇦@BlackHC

@BlancheMinerva @mtavitschlegel This is not nice but I'll humor you once.

Silently sabotaging your code would be intentionally introducing hard to find and debug bugs. This is different from just having worse performance

Stella Biderman@BlancheMinerva

18h8600

Andreas Kirsch 🇺🇦@BlackHC

@BlancheMinerva @mtavitschlegel Huh? What do you mean by "sleeper agent that silently sabotages your code"? This is not what this is 🤯

Stella Biderman@BlancheMinerva

@BlackHC @mtavitschlegel How do you justify releasing a sleeper agent that silently sabotages some code?

18h5300
