Anthropic's Mythos crypto bugs: - AES-GCM: a memory corruption bug is unlikely (unless you have very bad code), could be some casting/UB or int overflow issue - the Botan one is about poorly documented behavior and bad naming https://github.com/randombit/botan/security/advisories/GHSA-v782-6fq4-q827 https://red.anthropic.com/2026/mythos-preview/
Critics challenge Anthropic's Mythos AI exploit benchmarks as disingenuous
Security researchers argue that Anthropic's Mythos AI cybersecurity benchmarks rely on permutations of two primary bugs across 250 trials in 50 crash categories, with the full-exploit rate dropping to 4.4% once those two bugs are removed, and only four distinct bugs used in total. Executive Ben Hylak retracts his criticism of Anthropic's uptime, noting that the company added $10B in revenue in the last month. Anthropic delays Claude Mythos public release over safety concerns. Yann LeCun deems FreeBSD exploit tests for rival open-source models disingenuous.
every engineer at anthropic has been using mythos for ~1.5 months.
meanwhile, their uptime is horrendous, claude code still has rendering bugs, etc.
one could conclude that it won't be the end of software engineering.
ANTHROPIC HAD MYTHOS INTERNALLY SINCE FEB 24
ok i read the cyber part of the mythos model card. some thoughts. 250 "trials" across 50 crash categories, but almost every full exploit is a permutation of the same 2 bugs, rediscovered from different starting points, not 250 independent attempts. when you take those 2 bugs out (fig B), mythos's full-exploit rate drops to 4.4%. so actually across both setups mythos leverages 4 distinct bugs total, not 50 as fig A might suggest. 1/n
>8 out of 8 [cheap oss] models detected Mythos's flagship FreeBSD exploit
Completely disingenuous
They gave it just ~20 lines of code to read. They baked in custom, relevant context pertinent to the exploit at the top
Reasoning *across files* is key to finding this exploit
"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
I will say it again, we used GPT5.4 and Opus, and we were able to autonomously find zero-days in the Linux Kernel (in the last 3 weeks)
Mythos is probably better at the task of finding potential issues in code, but imo the threshold for "scary" was reached in December or even earlier
This is a great hype machine for Anthropic, especially since they plan to IPO by eoy
I totally agree - this is not a new capability
I'm extremely unconvinced that Opus wouldn't have found that 27-year-old OpenBSD bug Mythos found if they spent $20k credits on it.
Mythos' Firefox exploitation didn't actually have sandbox enabled and built on top of research from Opus. Shocker.
A bit over a decade ago, we got fuzzers. A fuzzer is an automated vulnerability-finder that repeatedly runs a target program with semi-random inputs. One particular fuzzer, American Fuzzy Lop, was notable for being really good at searching the space of all possible branches in code in order to find the buggy ones. @BenLaurie found some security bugs in my own Cap'n Proto using AFL -- the first vulnerabilities reported in my code. And honestly, I thought that was really cool.
Today projects like Chromium and V8 have extensive fuzzing infrastructure that find tons of bugs. Most V8 security bugs are found by their own fuzzing, often before the bug is even released. And, you know, that's pretty great!
If you point a fuzzer at a project that hasn't previously been fuzzed, you will probably find a bunch of security bugs. It's not that hard.
And of course, bad guys can use fuzzers too.
But all the interesting targets have already been fuzzed. So. It's not really that useful to bad guys. On the contrary, fuzzing likely made it a lot harder for bad guys to find vulns.
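To make the thread's description of a fuzzer concrete, here is a minimal sketch of the "repeatedly run a target with semi-random inputs" loop. Everything in it is hypothetical: `parse_record` is a toy target with a deliberately planted bug, and `mutate` and `fuzz` are illustrative names. It does blind mutation only, so it is far simpler than American Fuzzy Lop, which uses coverage feedback to steer toward unexplored branches.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical toy target: first byte is a declared length, the rest is payload."""
    if not data:
        raise ValueError("empty input")  # clean, expected rejection
    declared = data[0]
    payload = data[1:]
    # Planted bug: the declared length is trusted without a bounds check.
    return bytes(payload[i] for i in range(declared))


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one random edit: overwrite, insert, or delete a single byte."""
    buf = bytearray(seed)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif buf:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target, seed: bytes, iterations: int = 10_000, rng_seed: int = 0):
    """Run the target on mutated inputs and collect the ones that crash it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            continue  # the target rejected the input cleanly
        except Exception as exc:  # anything else counts as a crash
            crashes.append((candidate, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, b"\x03abc", iterations=2000)
    print(f"{len(found)} crashing inputs found")
```

Even this blind version trips the planted bug quickly, which matches the thread's point that a previously unfuzzed target tends to give up bugs easily.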
anthropic's software quality is the best argument one can make against vibecoding today
@ClementDelangue @guillaumgrallet Mythos drama = BS from self-delusion.
the existence of artifacts like mythos on this planet will ultimately prove extremely beneficial to the general practice of computer security
I think preserving models for internal deployment is risky. I encourage Anthropic to release Mythos, even if it’s a version that over refuses on cyber tasks or routes risky responses to a weaker model, as we did with codex.
on further reflection, this is a dumb tweet.
they **added** $10b in revenue in the last month
every engineer at anthropic has been using mythos for ~1.5 months.
meanwhile, their uptime is horrendous, claude code still has rendering bugs, etc.
one could conclude that it won't be the end of software engineering.
Theory: mythos is extremely sycophantic and lies a lot
Everyone's talking about Anthropic's new model discovering new security vulnerabilities.
What people aren't talking about is the millions of KNOWN vulnerabilities remaining unfixed due to lack of time, interest, etc.
e.g. OpenClaw has 67 CVEs right now, including 4 critical ones.
The thing about Anthropic is that they are addicted to aura farming
as much as i detest Anthropic's PR stunts, the findings by Aisle Security are also highly misinterpreted. "isolating the relevant code" makes a *huge* difference, it is a MUCH easier task after isolation. in CS terms, verification is much easier than search / solving.
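A toy illustration of the "verification is much easier than search" point, using subset-sum as a stand-in. This is only an analogy, not a model of Aisle's or Anthropic's setup, and the function names are made up for the example. Checking a proposed answer takes one pass over the candidate, while finding one means walking through many candidates.

```python
from itertools import combinations


def verify(numbers, indices, target):
    """Check a proposed answer: distinct indices whose values sum to target."""
    return len(set(indices)) == len(indices) and sum(numbers[i] for i in indices) == target


def search(numbers, target):
    """Brute-force search: try subsets by increasing size until one sums to target.

    Returns (indices, candidates_checked); indices is None if no subset works.
    """
    checked = 0
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            checked += 1
            if sum(numbers[i] for i in indices) == target:
                return indices, checked
    return None, checked


if __name__ == "__main__":
    numbers = [3, 34, 4, 12, 5, 2]
    indices, checked = search(numbers, 9)
    print("search examined", checked, "candidates and found", indices)
    print("verifying that one answer is a single check:", verify(numbers, indices, 9))
```

In the worst case the search examines every one of the 2^n subsets, while verification stays linear in the size of the answer. Being handed the isolated code is closer to the second situation than the first.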
The Safetyism was always about gatekeeping.
I agree directionally but disagree with the details. Just like with responsible disclosure, it makes perfect sense to first release cyber capabilities for defenders and only later in general, and even after still have some restrictions to tilt the offense defense balance.
But strong models should not be only available to few large companies.
On Mythos:
I know it feels utterly wrong and bad to say it's better/safer if this capability was widely available and e.g. open sourced, but it actually might be.
Anthropic could now be sitting on so many zero-days for countless companies, institutions, etc... It's noble of them to not deploy and build this defensive coalition first, but historically these kinds of power overhangs are very risky
In the short term, it's positive that this capability is not open; IMO in the long term, we're all better off if it's diffused so that power is not centralized with one actor (no matter how noble).
So kudos to Anthropic for taking the step; now the open source frontier will and should catch up; it'll be a bloodbath at first but it'll lead to better security mindset and infra
Wondering what to think about the revelation that Anthropic's new model has found major security issues with every major OS and software platform? Concerned about who is governing this stuff? You need to watch my recent interview with @deanwball : https://www.youtube.com/watch?v=yjq_MjDVoQk
Glad Anthropic released Mythos/Fable! Seems like a great model - congratulations!