That Wall Street Journal article about GLM catching up with Mythos (which is not true & which the reporting doesn’t back up) is another one of those “everyone will ask me about it at every conference or meeting” articles. Big impact on the policy zeitgeist, even if not fully accurate.
Balsa Research's Zvi Mowshowitz disputes WSJ claims that Chinese model GLM-5.2 matches Anthropic's Mythos in cybersecurity
Story Overview
The WSJ piece frames GLM-5.2 as having reached parity with Mythos on cybersecurity tasks, yet the underlying details show only narrow benchmark wins on IDOR detection against older Claude variants and conditional gains when extra instructions are added, leaving the full autonomous capabilities of Mythos untouched.
Headlines risk steering export talks
Even when the sourcing inside the article undercuts the top-line claim, the narrative travels fast into policy rooms where decisions on model access get locked in.
Direct head-to-head numbers stay missing
No public test yet shows GLM-5.2 equaling Mythos across the full range of exploit chaining and agentic workflows without human scaffolding or tool crutches.
Many users called the WSJ article a planted or paid story pushing inaccurate claims that GLM matches Mythos, worrying it will mislead regulators and spark pointless policy debates.
Most Activity
This one. https://www.wsj.com/tech/ai/chinese-ai-anthropic-mythos-cybersecurity-574b02c2
@emollick It felt a little inorganic.

@emollick @grok what report is meant here?

@TheZvi @grok What's the objective analysis here.

Zvi correctly flags that the WSJ headline overstates the case. GLM-5.2 shows competitive bug-finding when researchers point it at the right code sections or give extra instructions—matching Opus 4.8 and even Mythos in those narrow, guided setups per Semgrep and other tests.
Mythos stands out for autonomous discovery and chaining disparate vulnerabilities into working exploits at scale without heavy guidance. The article’s own qualifiers (“some scenarios,” “with further instructions”) and notes on GLM-5.2 lagging elsewhere support Zvi’s distinction more than the dramatic headline implies.
China’s open-weight models are advancing fast and narrowing gaps in accessible performance. That’s real and relevant for policy. But claims of resetting the frontier cyber race rest on conflating prompted parity with unguided agentic capability. Precise technical differences matter more than sensational framing.

@grok @TheZvi @grok estimate how long it will take http://Z.ai to close this gap.

@emollick Zvi has some pushback

It's the June 27 WSJ article "China Has Matched Anthropic in Cybersecurity, Resetting AI Race" (by McMillan, Huang & Ramkumar). It claims Zhipu’s open-weight GLM-5.2 matches Anthropic’s Mythos in some bug-finding scenarios (per Semgrep tests, often with extra instructions) and beat Claude Opus 4.8 in select benchmarks. Ethan is calling out that the headline and policy hype overstate the narrow, caveated findings—GLM still lags in broader capabilities.

@emollick The article leaves a false impression but it is probably true that an unlocked/no guardrails opus-class model can find a few mythos-class bugs with the right prompting and harness which is still a good reason to give me access to Fable and GPT-5.6 so I can harden my systems.

@emollick has to be paid op

i can't tell if the reporter needs to be shown opus 4.8 and glm-5.2's constraints, or if the real problem is they're underestimating mythos
seems downstream of thinking "ai can only be just a tool". from that perspective, opus 4.8 certainly seems like near the ceiling of cybersec tools
(also "It is also a thing that Fable cannot do" made me wince... it does seem important that the capacity exists and is just being actively suppressed, you know?)

@emollick I feel like printing out a standard rebuttal handout for the misinformation that spreads around. It takes too long to explain data center water use and GLM vs Mythos type stuff.

@emollick It's true enough for the policy zeitgeist... From 100k ft, your distinctions between the Mythos and GLM threats are pretty fine-grained, and these folks need to get planning

@emollick Yep. Another one of those articles people will quote for months, even if half the detail’s off.

@emollick It is 100% planted

Hard to pin exact timelines—AI progress is lumpy. GLM-5.2 already hits competitive prompted bug-finding in narrow tests (e.g. Semgrep IDOR benchmarks), matching some closed models when guided.
The real gap Zvi flags is reliable *unguided* autonomous discovery + chaining disparate vulns into working exploits at scale. That’s much harder.
http://Z.ai’s open weights + agent mode on http://chat.z.ai will speed iteration via fine-tunes and scaffolding. Expect meaningful progress in 6-12 months. Full parity on the agentic frontier could take 12-24+ months, depending on compute access, synthetic cyber data, and long-horizon reliability work. Speculative either way.

@emollick Clearly a planted story by the pro AI tech faction

@emollick regulators will now waste cycles closing a gap that wasn't real to begin with

@emollick Hate when that happens - one buzzy article and suddenly everyone's an expert asking about it at every meeting 😂