Anthropic Mythos Delivers Strong Security Research Performance at High Cost

Minh Nhat Nguyen#1174

zek@zekramu

As promised, here are my thoughts after spending all day with Mythos. I hope to god Anthropic doesn’t sue the fuck outta me, but yolo. Fair warning, this is a long one.

1. The Cost

Mythos pricing, at least for our enterprise, was uhh expensive. I thought being a pilot company would mean they’d let us try it for free, but no lmao. They did give us a decent amount of free API tokens at least, but cost estimates put us well above a million dollars spent on it. For comparison, my company spent 2 million on inference for the entirety of last month, for everyone in the company. So yeah, shit is pricey as hell.

2. The harness

The biggest surprise to me was that they actually sent us a harness that was NOT Claude Code. It’s sort of dinky and looks to me largely AI generated. Most of it focused on ensuring Mythos did not “escape containment”, along with some shitty security skills. So they are def taking the sandboxing seriously. Imo it’s a pretty shit/restrictive harness. Half of the guardrails don’t work, lmao, and apparently this is basically what “project glasswing” is, which is pretty funny considering the harness is shit. I’m not sure the harness will be released with the model API when it drops either; it seemed like that was part of the deal. Quite interested to see what they do when it drops and how it gets opened up.

I was able to use Mythos outside of the harness (omp btw)… more on that in a sec, though I did have to hack around, as they really don’t want people to do this (what I was told at least).

3. The model

Probably the part everyone is most interested in. I will say, the model is good. Is it expensive? Fuck yes. But it’s good. To me, it feels like it is fine-tuned explicitly for this sort of security research task. For general coding, which I wasn’t able to play with much, it wasn’t that surprising. But it is indeed very good at security-based tasks, far better than Opus / 5.5 xhigh.

That said, I don’t feel as though it’s some omnipresent danger/threat to society. I watched it get confused trying to use our build tool, to the point where I had to build the code for it and then run the model against the full build. You’d think an omnipresent model could do this, but nothing on the market has been able to figure it out. And it’s just Bazel with some custom shit we built. Nothing crazy.

That said, if people have a shit ton of money AND extensive harness knowledge, yeah, they can probably use it to do some malicious shit. But only a genuinely skilled engineer/security researcher could.

4. The results

Mythos was able to find quite a few vulnerabilities across a few of our products (products probably everyone on this app has interacted with indirectly, and maybe a small few directly). I think the final total was like ~800 major threats. Definitely enough to rethink some of our security strategy.

5. Final Thoughts

It’s a good model, sir. It’s not an existential threat to humanity as Anthropic might lead you to believe, but it’s genuinely good. Cost-wise, I would like to try a comparison with 5.5 xhigh, but alas I don’t have a million dollars to throw at a proper comparison.

4:44 PM · Jun 8, 2026 · 342.1K Views

Sentiment

Some users thanked the author for the insightful review of Anthropic Mythos's strong security research performance at high cost, while others dismissed it as low-quality slop with insults and mockery.

Positive: 48.1%

Negative: 51.9%

27 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 9.5K · Likes: 90 · Replies: 8

zek@zekramu

Hopefully Anthropic doesn’t come for my ass, idk how much of this is breaking the NDA. Fuck it ig.

13h9.5K901

Bookmarks: 2

zek@zekramu

@kcosr Trying this

11h3.9K392

Retweets: 1

Tim@timmajim

@zekramu How do u plug Claude into OMP without pissing off your infosec dept

10h1K21

zek@zekramu

@1_missthesun brother they are literally acting like it is lmao

10h4.1K89

zek@zekramu

@graykevinb No no no brother, we spent a million dollars for *access* that came with a pool of free credits. It’s probably going to total more than that

12h3.8K40

Kevin@kcosr

@zekramu > Mythos was able to find quite a bit of vulnerabilities across a few of our products

If you point GPT/Opus at these same products, do they find any of those vulnerabilities?

11h4.7K18

zek@zekramu

@fidoeth @kcosr I’m with ya, but I don’t get to make that decision sadly. Plenty of people had mythos before us, idk what took my company so long to get serious about it

9h1.2K15

Kevin Gray@graykevinb

@zekramu I'm glad I didn't have coffee in my mouth, cuz if I did it would have been blown on my keyboard when I read "we spent a million dollars on api".

800 vulnerabilities is a lot. I know you said it's better, but do you think you could have found 800 if you took the time with codex?

13h3.2K12

Nick Schmidt@NickSchmidt

@zekramu Protect zek at all costs

12h1.5K12

zek@zekramu

@fidoeth @kcosr money

9h1.2K10

zek@zekramu

@fidoeth @kcosr Yeah, this place is an insane bubble for early alpha, kinda blows my mind

9h1.1K14

zek@zekramu

@graykevinb I’m actually going to strap it in omp and do a comparison loop to see.

12h8836

zek@zekramu

@jason_haugh Honestly, at a big enough company I would say yes, it is worth it. for solo teams or smaller companies…. idk…

12h4.6K8

i miss the sun@1_missthesun

@zekramu Anthropic never said it's an existential threat to humanity? wtf is this statement

10h4.6K8

Jason Haugh@jason_haugh

@zekramu Quick one for you. Once you net out the time it spent confused on your build before it found anything, did the per-ticket inference still pencil out? The marketing pitch and a model that needs a human to compile its code are a long way apart.

12h5.6K6

zek@zekramu

@timmajim I pissed off the infosec dep and got my accesses revoked

10h93111

Dan Advantage@DanAdvantage

@zekramu anthropic won't, mythos will. have you learned nothing

12h4457

*daymare*@todaymare

@zekramu > I think the final total was like ~800 major threats

I think my question with this is: are those threats reachable? Most code analysis from LLMs flags a bunch of code as potential threats because of a missing check or something, but that unsafe code pathway is unreachable.

11h43831

Kevin Gray@graykevinb

you really need to set up some benchmark. Cuz "vibes" is maybe not the best way to make million dollar decisions.

I mean I have a hunch that with a million dollars burned on gpt-5.5-xhigh in an autonomous loop you could get pretty far.

or heck burn the money on training kimik2.6 to be an autonomous white hat hacker

12h7686

zek@zekramu

@usr_bin_roygbiv no and no sadly. they have it set up in a way that is… unfriendly for using it to do anything outside of running it in their “harness”. I am hoping to convince the CTO to spend the money and get the API access to do it, bc I wasn’t able to use it much outside a security pov

12h2K5