As promised, here are my thoughts after spending all day with Mythos. I hope to god Anthropic doesn’t sue the fuck outta me, but yolo. Fair warning, this is a long one.
1. The Cost
Mythos pricing, at least for our enterprise, was uhh expensive. I thought being a pilot company would mean they’d let us try it for free, but no lmao. They did give us a decent amount of free API tokens at least, but cost estimates put us well above a million dollars spent on it. In comparison, my company spent 2 million on inference for the entirety of last month, for everyone in the company. So yeah, shit is pricey as hell.
2. The Harness
The biggest surprise to me was that they actually sent us a harness that was NOT Claude Code. It’s sort of dinky and looks to me largely AI-generated. Most of it is focused on ensuring Mythos does not “escape containment”, along with some shitty security skills. So they are def taking the sandboxing seriously. Imo it’s a pretty shit/restrictive harness. Half of the guardrails don’t work lmao, and apparently this is basically what “project glasswing” is, which is pretty funny considering the harness is shit. I’m not sure the harness will be released with the model API when it drops either; it seemed like that was part of the deal. Quite interested to see what they do when it drops and how it gets opened up.
I was able to use Mythos outside of the harness (omp btw)… more on that in a sec. I did have to hack around, though, as they really don’t want people to do this (what I was told, at least).
3. The Model
Probably the part everyone is most interested in. I will say, the model is good. Is it expensive? Fuck yes. But it’s good. To me, it feels like it is fine-tuned explicitly for this sort of security research task. For general coding, which I wasn’t able to play with much, it wasn’t that surprising. But it is indeed very good at security-based tasks. Far better than Opus / 5.5 xhigh.
That said, I don’t feel as though it’s some omnipresent danger/threat to society. I watched it get confused trying to use our build tool, to the point where I had to build the code for it and then run the model against the full build. You’d think an omnipotent model could do this, but nothing on the market has been able to figure it out. And it’s just Bazel with some custom shit we built. Nothing crazy.
Then again, if people have a shit ton of money AND extensive harness knowledge, yeah, they can probably use it to do some malicious shit. But only if they’re a genuinely skilled engineer/security researcher.
4. The results
Mythos was able to find quite a few vulnerabilities across a few of our products (like, products probably everyone on this app has interacted with indirectly, maybe a small few directly). I think the final total was like ~800 major threats. Definitely enough to rethink some of the security strategy.
5. Final Thoughts
It’s a good model, sir. It’s not an existential threat to humanity as Anthropic might lead you to believe, but it’s genuinely good. Cost-wise I would like to compare it with 5.5 xhigh, but alas I don’t have a million dollars to throw at a proper comparison.








