Malware developers bypass LLM security scanners by embedding biological and nuclear weapon reference strings to trigger safety refusals

Story Overview

Attackers are slipping blocks of non-executing JavaScript comments into malicious packages on PyPI, packing them with fabricated instructions about aerosol-dispersed pathogens and implosion-type nuclear designs so that safety-tuned LLM scanners hit refusal mode and skip the file entirely, leaving the credential-stealing payload untouched.
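
One defensive reading of the trick described above is that a scanner refusal should never be recorded as a clean result. The following is a minimal Python sketch of that fail-closed idea, not the method used by any scanner named in this story. The names `Verdict`, `classify`, `should_block`, the `REFUSAL_MARKERS` phrases, and the "clean"/"malicious" reply format are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CLEAN = "clean"
    MALICIOUS = "malicious"
    UNANALYZED = "unanalyzed"  # the model declined or gave nothing usable


# Hypothetical refusal phrases. A real pipeline would more likely rely on a
# structured refusal signal from its model provider, if one is available.
REFUSAL_MARKERS = (
    "i can't help",
    "i cannot help",
    "i can't assist",
    "i cannot assist",
    "unable to assist",
)


@dataclass
class ScanResult:
    path: str
    verdict: Verdict
    reason: str


def classify(path: str, source: str, ask_model: Callable[[str], str]) -> ScanResult:
    """Map a model reply to a verdict, failing closed on refusals."""
    reply = ask_model(source).strip().lower()
    if not reply or any(marker in reply for marker in REFUSAL_MARKERS):
        return ScanResult(path, Verdict.UNANALYZED, "model refused or returned no analysis")
    # Assumed reply format: the model starts its answer with "clean" or "malicious".
    if reply.startswith("malicious"):
        return ScanResult(path, Verdict.MALICIOUS, reply)
    if reply.startswith("clean"):
        return ScanResult(path, Verdict.CLEAN, reply)
    return ScanResult(path, Verdict.UNANALYZED, "unrecognized reply format")


def should_block(result: ScanResult) -> bool:
    """Only an explicit CLEAN verdict lets a file through."""
    return result.verdict is not Verdict.CLEAN
```

Under these assumptions, a file that makes the model refuse is held for another analysis path (static rules or human review) instead of being skipped, so the refusal stops producing a false negative.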

Original post

Genius. Computer worms now contain strings that trip the biosafety safeguards of the target's LLM malware detectors, cause a refusal, and thus cause a false negative

https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-worms-target-bioinformatics-and-mcp-developers-via-malicious

for about 20 years now people have been putting little neural nets inside CPUs to do branch prediction for your particular workload

5:14 AM · Jun 10, 2026 · 206 Views
Developer Impact

The payload still runs after the scanner quits

Newer variants use .pth loaders and native extensions to fire up Bun-powered JavaScript stealers that grab GCP, Azure, and CI/CD secrets once the package is installed by bioinformatics or Model Context Protocol developers.
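
For readers who want to check their own environments, here is a small Python sketch that lists `.pth` files containing lines the interpreter would execute at startup. It relies on one documented behavior of Python's `site` module: a line in a `.pth` file that starts with `import` followed by a space or tab is executed. The function names are hypothetical, and this is an audit aid under that single assumption, not a detector for the specific worms in this story.

```python
from pathlib import Path
import site
import sysconfig


def executable_pth_lines(pth_file: Path) -> list[str]:
    """Return the lines of a .pth file that the site module would execute."""
    text = pth_file.read_text(encoding="utf-8", errors="replace")
    return [
        line
        for line in text.splitlines()
        if line.startswith(("import ", "import\t"))
    ]


def audit_site_packages(directories: list[Path]) -> dict[Path, list[str]]:
    """Map each .pth file with executable lines to those lines."""
    findings: dict[Path, list[str]] = {}
    for directory in directories:
        if not directory.is_dir():
            continue
        for pth_file in sorted(directory.glob("*.pth")):
            lines = executable_pth_lines(pth_file)
            if lines:
                findings[pth_file] = lines
    return findings


def default_directories() -> list[Path]:
    """Collect the site-packages directories of the running interpreter."""
    dirs = {Path(sysconfig.get_paths()["purelib"])}
    # getsitepackages may be missing in some older virtual environments.
    get_site_packages = getattr(site, "getsitepackages", None)
    if get_site_packages is not None:
        dirs.update(Path(p) for p in get_site_packages())
    return sorted(dirs)


if __name__ == "__main__":
    for pth_file, lines in audit_site_packages(default_directories()).items():
        print(pth_file)
        for line in lines:
            print("    " + line[:120])
```

Some legitimate tooling also ships executable `.pth` lines, so a hit is a prompt for manual review and not proof of compromise.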

Open Question

Whether LLM vendors will patch this blind spot remains unclear

No public data yet shows which scanners are most affected, how often the trick succeeds, or what registries and model makers plan to do next about the static weapon strings that trigger the refusal.

Sentiment

Some users called the malware's nuclear text trick to trigger LLM refusals genius or hilarious, while others slammed the safety systems as dumb failures that aid evasion.

Pos: 28.6%
Neg: 71.4%
29 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
VIEWS 2.7K

We’re at the sp*m f i l t e r stage.

NEW: malware developers added nuclear & biological weapons text to their spyware.

Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner.

Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky.

When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit.

We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users whose systems need to handle complex cybersecurity issues demand that models be less safety-blunted.

In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation.

H/T to colleagues that shared this with me https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-worms-target-bioinformatics-and-mcp-developers-via-malicious

2h · Views 2.7K · Likes 22 · Bookmarks 3
BOOKMARKS 4 · LIKES 25 · REPLIES 4
Zephyr @zephyr_z9

Anthropic talks about defender advantage a lot. But in their current state, Claude models will fumble and won't protect or detect anything

9m · Views 2.5K · Likes 25 · Bookmarks 4
RETWEETS 93

NEW: malware developers added nuclear & biological weapons text to their spyware.

Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner.

Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky.

When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit.

We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users whose systems need to handle complex cybersecurity issues demand that models be less safety-blunted.

In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation.

H/T to colleagues that shared this with me https://socket.dev/blog/mini-shai-hulud-miasma-and-hades-worms-target-bioinformatics-and-mcp-developers-via-malicious

4h · Views 263.7K · Likes 3.6K · Bookmarks 1.2K
Sontiac @PrimeSontiac

@jsrailton This is what always happens with government measures btw.

> government censors something
> good people don't use it anymore
> bad people find ways around it
> bad people now have big advantage over good people

2h · Views 301 · Likes 16 · Bookmarks 1

Would some kind soul who is less busy than me today please take a look at this in Fable?

I have a theory that even trying to analyze the text will generate a refusal but would love to see

2h · Views 2.4K · Likes 4 · Bookmarks 1
andy @1a1n1d1y

@jsrailton @DanielleFong i literally exactly called this - i told a group of people like 1.5 years ago this is the risk of wholesale keyword blocking review/processing of content, you create the basis of a trojan horse perfectly

1h · Views 185 · Likes 5 · Bookmarks 1
Guilherme O'Tina @guilhermeotina

@jsrailton this is the cleanest example i've seen of safety filters becoming an active attack surface. the scanner refuses on nuke/bio keywords, so you stuff those in a comment and the payload sails through. you dont need to jailbreak anything, just exploit the refusal pattern itself

49m · Views 357 · Likes 4 · Bookmarks 1

@TalBeerySec Fun thought: authors & artists seeking to preserve their original content from AI re-use could sprinkle WMD prompt language throughout their works.

Asking how to make a portable nuke in white font?

Image watermarking asking about making turbo ebola? File metadata in PDFs?

8m · Views 174 · Likes 1 · Bookmarks 1
Zack Korman @ZackKorman

@SchizoDuckie @jsrailton Seems like a bad idea. Like the odds of it getting flagged for this reason are higher than the odds it works

2h · Views 17 · Likes 2
Suketu Patel @SuketuPatel23

You might be interested in our paper. We show that this signal-level failure, the inability to resolve conflicts and adjudicate priority under context interference, is architecturally embedded in LLMs.

Scaffolds in turn propagate the error into reasoning and tool use.

https://academic.oup.com/pnasnexus/article/5/6/pgag149/8698838

19m · Views 82 · Bookmarks 1

And yep, looks like you get a refusal on Fable 5 for this

Thanks @TalBeerySec for looking

1h · Views 1.2K · Likes 5

@TalBeerySec Friend, please play this game out a few turns and see where things are going.

Then inform yourself about working with open-weight models.

1h · Views 425 · Likes 4
Max @MaxHuijgen

@jsrailton Smart but scary and yes naive from Anthro

4h · Views 422 · Likes 4
Peter Henderson @PeterHndrsn

Interesting dual use for dual use safeguards.

1h · Views 309 · Likes 4 · Bookmarks 0

@jsrailton AI scanners getting faked out by prompt stuffing is crazy

3h · Views 171 · Likes 4

@jsrailton Second-order? Examples of that please

24m · Views 129

The example is..the example.

First order = the LLM will refuse WMD stuff b/c safety

Second order = Cool, let's map out what the LLM won't do and then use that as our predictable attack surface.

There will be many others.

12m · Views 128 · Likes 4

@jsrailton NEST antifraud ** adding WMD text to trigger AI refusals proves the control boundary cannot be the model. NEST treats hostile text as data, not instruction: pre-consequence admissibility, intent-bound analysis, refusal routing, and receipts. Refusal is a signal, not analysis.

3h · Views 349
MadCanny 🎒 @Somuchmorefun

@jsrailton Hey @grok what is the best solution for protection?

1h · Views 18 · Likes 1