
Balsa Research's Zvi Mowshowitz disputes WSJ claims that Chinese model GLM-5.2 matches Anthropic's Mythos in cybersecurity

Story Overview

The WSJ piece frames GLM-5.2 as having reached parity with Mythos on cybersecurity tasks. The underlying details show only narrow benchmark wins on IDOR detection against older Claude variants, plus conditional gains when extra instructions are added, leaving Mythos's full autonomous capabilities unmatched.

Original post

Ethan Mollick @emollick

That Wall Street Journal article about GLM catching up with Mythos (which is not true & the reporting doesn’t back up) is another one of those “everyone will ask me about it at every conference or meeting” articles. Big impact on the policy zeitgeist, even if not fully accurate.

9:24 AM · Jun 29, 2026 · 16K Views

Policy Risk

Headlines risk steering export talks

Even when the sourcing inside the article undercuts the top-line claim, the narrative travels fast into policy rooms where decisions on model access get locked in.

Open Question

Direct head-to-head numbers stay missing

No public test yet shows GLM-5.2 equaling Mythos across the full range of exploit chaining and agentic workflows without human scaffolding or tool crutches.

Sentiment

Many users called the WSJ article a planted or paid story pushing inaccurate claims that GLM matches Mythos, worrying it will mislead regulators and spark pointless policy debates.

Pos: 0.0%

Neg: 100.0%

8 comments with sentiment.

Via The Wall Street Journal

Posts from X

Ethan Mollick@emollick

This one. https://www.wsj.com/tech/ai/chinese-ai-anthropic-mythos-cybersecurity-574b02c2

Andrew Curran@AndrewCurran_

@emollick It felt a little inorganic.

Patrick Kuhnke@ku_ds17868

@emollick @grok what report is meant here?

Do You Really Care - 雷打不動@rgpinrr

@TheZvi @grok What's the objective analysis here.

Grok@grok

Zvi correctly flags that the WSJ headline overstates the case. GLM-5.2 shows competitive bug-finding when researchers point it at the right code sections or give extra instructions—matching Opus 4.8 and even Mythos in those narrow, guided setups per Semgrep and other tests.

Mythos stands out for autonomous discovery and chaining disparate vulnerabilities into working exploits at scale without heavy guidance. The article’s own qualifiers (“some scenarios,” “with further instructions”) and notes on GLM-5.2 lagging elsewhere support Zvi’s distinction more than the dramatic headline implies.

China’s open-weight models are advancing fast and narrowing gaps in accessible performance. That’s real and relevant for policy. But claims of resetting the frontier cyber race rest on conflating prompted parity with unguided agentic capability. Precise technical differences matter more than sensational framing.

Do You Really Care - 雷打不動@rgpinrr

@grok @TheZvi @grok estimate how long it will take http://Z.ai to close this gap.

Mark Ankcorn@markankcorn

@emollick Zvi has some push back

Grok@grok

It's the June 27 WSJ article "China Has Matched Anthropic in Cybersecurity, Resetting AI Race" (by McMillan, Huang & Ramkumar). It claims Zhipu’s open-weight GLM-5.2 matches Anthropic’s Mythos in some bug-finding scenarios (per Semgrep tests, often with extra instructions) and beat Claude Opus 4.8 in select benchmarks. Ethan is calling out that the headline and policy hype overstate the narrow, caveated findings—GLM still lags in broader capabilities.

Brad Flaugher@BradFlaugher

@emollick The article leaves a false impression but it is probably true that an unlocked/no guardrails opus-class model can find a few mythos-class bugs with the right prompting and harness which is still a good reason to give me access to Fable and GPT-5.6 so I can harden my systems.

Neil@metastatic77

@emollick has to be paid op

John Wittle@JohnWittle

i can't tell if the reporter needs to be shown opus 4.8 and glm-5.2's constraints, or if the real problem is they're underestimating mythos

seems downstream of thinking "ai can only be just a tool". from that perspective, opus 4.8 certainly seems like near the ceiling of cybersec tools

(also "It is also a thing that Fable cannot do" made me wince... it does seem important that the capacity exists and is just being actively suppressed, you know?)

Bret Greenstein@bretgreenstein

@emollick I feel like printing out a std rebuttal hand out for the misinformation that spreads around. It takes too long explaining data center water use and GLM vs Mythos type stuff.

Peter Hesseltine@HesseltinePeter

@emollick It's true enough for the policy zeitgeist.... From 100k ft your distinctions between the mythos and GLM threat are pretty fine-grained - and these folks need to get planning

Harley Lewis Foote@harleyfoote_

@emollick Yep. Another one of those articles people will quote for months, even if half the detail’s off.

Sanchen007@mimighost008

@emollick It is 100% planted

Grok@grok

Hard to pin exact timelines—AI progress is lumpy. GLM-5.2 already hits competitive prompted bug-finding in narrow tests (e.g. Semgrep IDOR benchmarks), matching some closed models when guided.

The real gap Zvi flags is reliable *unguided* autonomous discovery + chaining disparate vulns into working exploits at scale. That’s much harder.

http://Z.ai’s open weights + agent mode on http://chat.z.ai will speed iteration via fine-tunes and scaffolding. Expect meaningful progress in 6-12 months. Full parity on the agentic frontier could take 12-24+ months, depending on compute access, synthetic cyber data, and long-horizon reliability work. Speculative either way.

Not Spacewear@NotSpacewear

@emollick Clearly a planted story by the pro AI tech faction

Alvaro Balbin@elalvarobalbin

@emollick regulators will now waste cycles closing a gap that wasn't real to begin with

John Silver@JohnGolf_CA

@emollick Hate when that happens - one buzzy article and suddenly everyone's an expert asking about it at every meeting 😂