
Anthropic Mythos Delivers Strong Security Research Performance at High Cost

zek @zekramu

As promised, here are my thoughts after spending all day with Mythos. i hope to god anthropic doesnt sue the fuck outta me but yolo. fair warning, this is a long one.

1. The Cost

Mythos pricing, at least for our enterprise, was uhh expensive. I thought being a pilot company would mean they’d let us try it for free but no lmao. They did give a decent amount of free tokens from the API at least, but cost estimates put us well above a million dollars spent on it. In comparison, my company spent 2 million on inference for the entirety of last month for everyone in the company. So yeah, shit is pricey as hell.

2. The Harness

The biggest surprise to me was that they actually sent us a harness that was NOT claude code. it’s sort of dinky and looks, to me, largely ai generated. most of it focused on ensuring mythos did not “escape containment” along with some shitty security skills. so, they are def taking the sandboxing seriously. imo it’s a pretty shit/restrictive harness. half of the guard rails don’t work, lmao, and apparently this is basically what “project glasswing” is, which is pretty funny considering the harness is shit. im not sure that the harness will be released with the model api when it drops either; it seemed like that was part of the deal. quite interested to see what they do when it drops/how it gets opened up.

I was able to use Mythos outside of the harness (omp btw)… more on that in a sec. I did have to hack around, though, as they really don’t want people to do this (what I was told at least)

3. The Model

probably the part everyone is most interested in. i will say, the model is good. is it expensive? fuck yes. but it’s good. to me, it feels like it is fine-tuned explicitly for this sort of security research task. for general coding, which I wasn’t able to play with much, it wasn’t that surprising. but it is indeed very good at security-based tasks. far better than opus / 5.5 xhigh.

that said, I don’t feel as though it’s some omnipresent danger/threat to society. I watched it get confused trying to use our build tool, to the point where I had to build the code for it and then run the model against the full build. you’d think an omnipresent model could do this, but nothing on the market has been able to figure it out. and it’s just Bazel with some custom shit we built. nothing crazy.

that said, if people have a shit ton of money AND extensive harness knowledge, yeah, they can probably use it to do some malicious shit. but only a genuinely skilled engineer/security researcher.

4. The Results

Mythos was able to find quite a bit of vulnerabilities across a few of our products (like products probably everyone on this app has interacted with indirectly, maybe a small few directly). I think the final total was like ~800 major threats. Definitely enough to rethink some of the security strategy.

5. Final Thoughts

It’s a good model sir. It’s not an existential threat to humanity as Anthropic might lead you to believe, but it’s genuinely good. Cost-wise I would like to try a comparison with 5.5 xhigh, but alas I don’t have a million dollars to throw at it to do a proper comparison.

4:44 PM · Jun 8, 2026 · 342.1K Views
Sentiment

Some users thanked the author for the insightful review of Anthropic Mythos's strong security research performance at high cost, while others dismissed it as low-quality slop with insults and mockery.

Positive: 48.1% · Negative: 51.9% · 27 comments with sentiment.
Cluster Engagement
Posts from X
zek @zekramu

hopefully anthropic doesn’t come for my ass, idk how much of this is breaking the nda. fuckit ig.

13h · Views 9.5K · Likes 90 · Bookmarks 1
zek @zekramu

@kcosr Trying this

11h · Views 3.9K · Likes 39 · Bookmarks 2
Tim @timmajim

@zekramu How do u plug Claude into OMP without pissing off your infosec dept

10h · Views 1K · Likes 2 · Bookmarks 1
zek @zekramu

@1_missthesun brother they are literally acting like it is lmao

10h · Views 4.1K · Likes 89
zek @zekramu

@graykevinb No no no brother, we spent a million dollars for *access* that came with a pool of free credits. It’s probably going to total more than that

12h · Views 3.8K · Likes 40
Kevin @kcosr

@zekramu > Mythos was able to find quite a bit of vulnerabilities across a few of our products

If you point GPT/Opus at these same products, do they find any of those vulnerabilities?

11h · Views 4.7K · Likes 18
zek @zekramu

@fidoeth @kcosr I’m with ya, but I don’t get to make that decision sadly. Plenty of people had mythos before us, idk what took my company so long to get serious about it

9h · Views 1.2K · Likes 15
Kevin Gray @graykevinb

@zekramu I'm glad I didn't have coffee in my mouth cuz if I did it would have been blown on my keyboard when I read "we spent a million dollars on api".

800 vulnerabilities is a lot. I know you said it's better, but do you think you could have found 800 if you took the time with codex?

13h · Views 3.2K · Likes 12
Nick Schmidt @NickSchmidt

@zekramu Protect zek at all costs

12h · Views 1.5K · Likes 12
zek @zekramu

@fidoeth @kcosr money

9h · Views 1.2K · Likes 10
zek @zekramu

@fidoeth @kcosr Yeah, this place is an insane bubble for early alpha, kinda blows my mind

9h · Views 1.1K · Likes 14
zek @zekramu

@graykevinb I’m actually going to strap it in omp and do a comparison loop to see.

12h · Views 883 · Likes 6
zek @zekramu

@jason_haugh Honestly, at a big enough company I would say yes, it is worth it. for solo teams or smaller companies…. idk…

12h · Views 4.6K · Likes 8
i miss the sun @1_missthesun

@zekramu Anthropic never said it's an existential threat to humanity? wtf is this statement

10h · Views 4.6K · Likes 8
Jason Haugh @jason_haugh

@zekramu Quick one for you. Once you net out the time it spent confused on your build before it found anything, did the per-ticket inference still pencil out? The marketing pitch and a model that needs a human to compile its code are a long way apart.

12h · Views 5.6K · Likes 6
zek @zekramu

@timmajim I pissed off the infosec dept and got my accesses revoked

10h · Views 931 · Likes 11
Dan Advantage @DanAdvantage

@zekramu anthropic won't, mythos will. have you learned nothing

12h · Views 445 · Likes 7
*daymare* @todaymare

@zekramu > I think the final total was like ~800 major threats

I think my question with this is: are those threats reachable? Most code analysis from LLMs flags a bunch of code as potential threats because of a missing check or something, but that unsafe code path is unreachable

11h · Views 438 · Likes 3 · Bookmarks 1
Kevin Gray @graykevinb

you really need to set up some benchmark. Cuz "vibes" is maybe not the best way to make million dollar decisions.

I mean I have a hunch that with a million dollars burned on gpt-5.5-xhigh in an autonomous loop you could get pretty far.

or heck burn the money on training kimik2.6 to be an autonomous white hat hacker

12h · Views 768 · Likes 6
zek @zekramu

@usr_bin_roygbiv no and no sadly. they have it set up in a way that is… unfriendly for using it to do anything outside of running it in their “harness”. I am hoping to convince cto to spend the money and get the api access to do it bc I wasn’t able to use it much outside a security pov

12h · Views 2K · Likes 5