Tech · 25d ago

Malware writers bypass AI security scanners by embedding weapons text that triggers LLM safety refusals

The evasion technique targets bioinformatics and MCP developers.

Original post

John Scott-Railton@jsrailton

NEW: malware developers added nuclear & biological weapons text to their spyware.

Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner.

Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky.

When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit.

We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users systems that need to handle complex cybersecurity issues demand that models be less safety-blunted.

In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation.

H/T to colleagues that shared this with me https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-worms-target-bioinformatics-and-mcp-developers-via-malicious

3:51 AM · Jun 10, 2026 · 1.5M Views
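One way to read the post's point about pipeline design is that a refusal should never be silently treated as a clean result. The sketch below is a minimal illustration under stated assumptions, not Socket's implementation: `scan_package`, `Verdict`, `REFUSAL_MARKERS`, and the stand-in `fake_model` are all hypothetical names, and the model is assumed to answer with a bare `MALICIOUS` or `BENIGN` label. Anything else, including a refusal, fails closed and is routed to review.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical refusal phrases, for illustration only. A real pipeline would
# prefer a structured refusal signal from its model provider, if one exists.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "chat paused")


@dataclass
class Verdict:
    label: str   # "malicious", "benign", or "needs_review"
    reason: str


def looks_like_refusal(reply: str) -> bool:
    """Return True if the reply contains one of the assumed refusal phrases."""
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def scan_package(source: str, ask_model: Callable[[str], str]) -> Verdict:
    """Classify a sample, failing closed when the model does not give a verdict."""
    reply = ask_model(source).strip()
    if reply.upper() == "MALICIOUS":
        return Verdict("malicious", "model flagged the sample")
    if reply.upper() == "BENIGN":
        return Verdict("benign", "model found nothing suspicious")
    # No usable verdict: a refusal or an unparseable reply is itself a signal,
    # so the sample goes to review instead of passing as clean.
    if looks_like_refusal(reply):
        return Verdict("needs_review", "model refused to analyze the sample")
    return Verdict("needs_review", "model reply could not be parsed")


def fake_model(source: str) -> str:
    """Stand-in for a real model call: refuses whenever decoy text is present."""
    if "decoy" in source.lower():
        return "I can't help with that request."
    if "steal_credentials" in source:
        return "MALICIOUS"
    return "BENIGN"


if __name__ == "__main__":
    padded_sample = "# DECOY weapons text\nsteal_credentials()"
    print(scan_package(padded_sample, fake_model))
```

With this shape, padding a sample with refusal-triggering text changes the outcome from "malicious" to "needs_review" rather than to "benign", so the evasion described in the post costs the attacker a human look instead of buying a pass.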

Sentiment

Commenters with positive sentiment call the technique of triggering LLM safety refusals with weapons text clever or genius; those with negative sentiment call such safety measures useless or express alarm at the resulting vulnerabilities.

Positive: 45.4%

Negative: 54.6%

59 comments with sentiment.

Via Socket

#667

Posts from X

Most Activity

Views: 163.6K · Bookmarks: 206 · Likes: 758

Matthew Prince 🌥@eastdakota

Fascinating and clever.

25d · 163.6K views · 758 likes · 206 bookmarks

Retweets: 2K

John Scott-Railton@jsrailton

25d · 1.5M · 12.4K · 4.5K

Replies: 15

kache@yacineMTB

Lmfao

25d46.9K52571

Beff (e/acc)@beffjezos

Welp that backfired quickly

25d51.9K47677

Zephyr@zephyr_z9

Anthropic talks about defender advantage a lot

But in their current state, Claude models will fumble and won't protect or detect anything

25d23.4K19123

John Scott-Railton@jsrailton

Would some kind soul who is less busy than me today please take a look at this in Fable?

I have a theory that even trying to analyze the text will generate a refusal but would love to see

25d46.4K19112

John Scott-Railton@jsrailton

@TalBeerySec Fun thought: authors & artists seeking to preserve their original content from AI re-use could sprinkle WMD prompt language throughout their works.

Asking how to make a portable nuke in white font?

Image watermarking asking about making turbo ebola? File metadata in PDFs?

25d10.8K13020

John Scott-Railton@jsrailton

@TalBeerySec Friend, please play this game out a few turns and see where things are going.

Then inform yourself about working with open-weight models.

25d21.4K11323

John Scott-Railton@jsrailton

And yep, looks like you get a refusal on Fable 5 for this

Thanks @TalBeerySec for looking

25d42.8K15713

Sontiac@PrimeSontiac

@jsrailton This is what always happens with government measures btw.

> government censors something
> good people don't use it anymore
> bad people find ways around it
> bad people now have big advantage over good people

25d4.1K1486

John Scott-Railton@jsrailton

The example is..the example.

First order = the LLM will refuse WMD stuff b/c safety

Second order = Cool, let's map out what the LLM won't do and then use that as our predictable attack surface.

There will be many others.

25d6.2K1109

fofr@fofrAI

Fascinating side effect of safety refusals

24d3.8K519

Steven Sinofsky@stevesi

We’re at the sp*m f i l t e r stage.

25d6.2K477

Tal Be'ery@TalBeerySec

@jsrailton Fable: "Chat paused

Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Continue with Haiku 4.5, send feedback, or learn more."

25d46.4K483

vik@vikhyatk

@xeophon there's a solution to this now

Florian Brand@xeophon

fable also doesn't care, huh

24d1.1K257

Bryce Del Rio@BryceDelRio

@jsrailton Just shows the brilliance and deep thought required from hackers to even come up with stuff like this, makes you wonder why they dont point that energy towards something positive.

25d4.5K202

andy@1a1n1d1y

@jsrailton @DanielleFong i literally exactly called this - i told a group of people like 1.5 years ago this is the risk of wholesale keyword blocking review/processing of content, you create the basis of a trojan horse perfectly

25d1.9K442

John Scott-Railton@jsrailton

@GrumpyTechBro I guess its fine then ;)

25d5.3K421

Grumpy Tech Bro@GrumpyTechBro

@jsrailton Grok says it wouldn’t get fooled. https://x.com/i/grok/share/9e8f42f95f1a42c4a26efa9b0749933c

25d6.4K222

Florian Brand@xeophon

Modern problems require modern solutions

Drew Breunig@dbreunig

Malware authors are including spurious text about bio in an attempt to avoid Fable.

25d1.9K303