
Claude Mythos 5 Scores 59% on Humanity’s Last Exam Without Tools

Original post · Super Dario#1958
ASM @ASM65617010

Claude Mythos 5 scores 59% on Humanity’s Last Exam, with no tools.

As a contributor to HLE, I would never have expected such a score barely a year and a half after the benchmark’s release.

10:33 AM · Jun 9, 2026 · 61.3K Views
Sentiment

Many users expressed optimism about Claude Mythos 5 reaching near-100% on Humanity’s Last Exam soon, while others criticized the chart as misleading and questioned the benchmark's validity.

Positive: 54.5% · Negative: 45.5%
12 comments with sentiment.
Posts from X
Tamanokal @tamanokal

@ASM65617010 Could you please provide a breakdown on the fields in which it succeeded? Thanks!

1d · Views 2.7K · Likes 4
Jake @JakeKAllDay

@ASM65617010 Holy chart crime, Batman

21h · Views 712 · Likes 13 · Bookmarks 4 · Retweets 4
ASM @ASM65617010

@extliqprovider No. HLE questions can’t be answered simply by having more data. Most require reasoning, often genuinely complex reasoning, as in many of the physics questions.

23h · Views 2.1K · Likes 43 · Bookmarks 1
Equinox @Soho1579854

@ASM65617010 80% no tools by next year for sure

11h · Views 181 · Likes 1 · Replies 1
ASM @ASM65617010

@tamanokal I contributed to the HLE benchmark, not to the Mythos evaluation of it. It included extremely difficult questions across many topics, from quantum physics to linguistics.

1d · Views 2.4K · Likes 16
efe @extliqprovider

@ASM65617010 Bigger model. For HLE you just need more data, and Anthropic has a lead over OpenAI in this.

1d · Views 2.6K · Likes 6
Sin Jeong-hun @realrealcat

@ASM65617010 Why did you cut the graph between 0 and 40?

20h · Views 631 · Likes 10
Entropy Cowboy @EntropyCowboy

@ASM65617010 As a reminder, people say Anthropic has had it since January, which means it actually took only a year to reach that level.

17h · Views 262 · Likes 6
Sub Woofer @SubWoofer143785

@ASM65617010 65% possible score and we are yet to see what GPT 5.6 Pro can do, oh boy

18h · Views 535 · Likes 1
Gregor @bygregorr

@ASM65617010 Not sure 'no tools' is the right anchor. In daily use on Supabase queries, the tool gap feels massive for anything non-trivial. Does the with-tools score change how you read the 18-month timeline?

23h · Views 870 · Likes 2
Ugly Cowboy @ugly_cowboy

@ASM65617010 We will need the “HLE Endgame” benchmark soon.

22h · Views 397 · Likes 2
Ding @zhaoxiongding

@ASM65617010 @extliqprovider You’re approaching this from the perspective of a human.

If simply adding more data and compute scores higher on your benchmark, then empirically it is simply a data and compute problem.

wtf is reasoning? Does a submarine swim?

18h · Views 120 · Likes 3
Eren @kaiba_1991

@ASM65617010 100% in 2 years. That's when RSI starts

23h · Views 519 · Likes 1
Tamanokal @tamanokal

@ASM65617010 Thanks! Didn't know the labs are the ones that report the findings! I remember when https://knzhou.github.io/ first mentioned working on it

1d · Views 396 · Likes 1
Lunexa @Lunexalith

@ASM65617010 Insane chart lol. The 0-40 span on the X-axis is as wide as 40-45, so +33% stronger is made to look like +100%.

22h · Views 342 · Likes 1
Min @min_aws

@ASM65617010 How sure are you that the dataset isn't in the training data, now that it has been out for 1.5 years? How do all benchmarks get maxed out once released, yet we can still come up with similar benchmarks that aren't maxed out? Post Fable delusion is real.

8h · Views 95 · Likes 2
Sadi Moodi @MoodiSadi

@ASM65617010 We will reach 100% in a short time, mark my words

14h · Views 218 · Likes 1