Calif researchers discover macOS kernel vulnerabilities using Mythos
Calif researchers discovered two previously unknown macOS kernel vulnerabilities while testing an early build of Anthropic’s Mythos AI in April. They chained the flaws with additional techniques to corrupt memory and bypass Apple’s hardware-backed protections, reaching regions of the device that are normally inaccessible. The privilege escalation exploit was completed within five days of evaluation. The team submitted a 55-page report on the findings to Apple.
Uh oh. Captain, it's just Thursday
Mythos has cracked macOS. It took five days.
The Second Scaling Law remains undefeated. If you want better hacking (or math, or science, or crossword puzzle solving) out of an LLM, just add thinking tokens. There doesn't seem to be any plateau so far.
Very important update from UK AISI. This is a meaningful change from the previous report. Here’s what the new data would look like for “Mythos Preview (new)” with $ on the x-axis:

Head of Frontier Red-Team at Anthropic.

A lot of people have been wondering about Mythos, Glasswing, and the vulns we / our partners are fixing. Today, I’m excited for us to start sharing more. (For context, I lead Glasswing @AnthropicAI.)
Two independent evaluations this week, from XBOW and the UK AISI, confirm what we've been seeing internally: Claude Mythos Preview is a step change in autonomous cybersecurity capabilities. We need to start preparing fast for a world of models with this level of capabilities.
The UK AI Security Institute tested the model we shipped at the launch of Project Glasswing and found Mythos Preview is the first model to solve both of their end-to-end cyber ranges, including one (Cooling Tower) which no model had ever cleared. But attackers (and defenders) have sophistication & cost constraints: Mythos is also the only model that clears every one of their tasks estimated over 8 hours under their deliberately low 2.5M-token cap.
XBOW tested it on their offensive security benchmarks, finding "token-for-token, unprecedented precision." It's the only model to succeed at subtle V8 sandbox work.
Other Glasswing partners shared similar stories. In a few weeks of testing, Mythos Preview has helped them find many thousands of (estimated) high + critical severity vulnerabilities, sometimes double what they'd normally find in a year.
I don't share this to boost Mythos. In fact, this is not about Mythos. It’s about preparing for the coming world of models being better, faster, cheaper, and more creative than some of the best human experts at dual use capabilities. Clearly, we need them supporting defenders as widely as can be done safely, and especially the least resourced ones.
Within a year, Mythos will probably look quite dumb (relative to other new models). And others may release openly available or unguardrailed models of Mythos-level capabilities.
We started Project Glasswing because capabilities like Mythos Preview's won't stay rare, or stay in careful hands. We are bringing it to defenders as fast as we responsibly can, while working to figure out, for example, the right safeguards and patching & disclosure processes. Also, to be clear, compute has never been a limiter in our rollout.
Expect a fuller update on our Glasswing work in the coming days.
XBOW report: https://xbow.com/blog/mythos-offensive-security-xbow-evaluation
UK AISI report: https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber-capability-advancing
From Calif:

'Mythos Preview is powerful: once it has learned how to attack a class of problems, it generalizes to nearly any problem in that class.'
https://blog.calif.io/p/first-public-kernel-memory-corruption
narrator voice: it was not gpt-2
the thing is: computer infrastructure is *already* under strain. there are hacks, exploits, vulnerabilities every day, all surfaced by the current generation of models.
WSJ: Anthropic’s Mythos helped researchers find 2 unknown macOS kernel bugs and turn them into a working privilege escalation exploit in 5 days.
The target was the macOS kernel, the deepest layer of Apple’s desktop operating system, where code controls memory, processes, permissions, and access to hardware.
Mythos helped connect 2 separate flaws with extra exploitation techniques, which means the attack did not rely on one bug but on a chain where each step made the next step possible.
The exploit reportedly corrupted memory, bypassed Apple’s memory integrity protections, and gained access to protected parts of the system that normal apps should never reach.
This is serious because modern macOS defenses are built to make memory bugs hard to convert into control of the machine, not just hard to find.
Mythos is effective here because vulnerability research is a search problem with many dead ends: the model can help form hypotheses, inspect code behavior, reason across low-level constraints, and suggest exploit paths faster than manual work alone.
---
wsj.com/tech/ai/anthropic-mythos-apple-macos-bug-339da403

Well, well, well... Good for hardening the system