/Tech · 1d ago

Claude Mythos 5 Scores 59% on Humanity’s Last Exam Without Tools

#1958

Original post

Super Dario#1958

ASM@ASM65617010

Claude Mythos 5 scores 59% on Humanity’s Last Exam, with no tools.

As a contributor of HLE, I would never have expected such a score barely a year and a half after the benchmark’s release.

10:33 AM · Jun 9, 2026 · 61.3K Views

Sentiment

Many users expressed optimism about Claude Mythos 5 reaching near-100% on Humanity’s Last Exam soon, while others criticized the chart as misleading and questioned the benchmark's validity.

Pos

54.5%

Neg

45.5%

12 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 2.7K

Tamanokal@tamanokal

@ASM65617010 Could you please provide a breakdown on the fields in which it succeeded? Thanks!

1d2.7K4

BOOKMARKS 4 · RETWEETS 4

Jake@JakeKAllDay

@ASM65617010 Holy chart crime, Batman

21h712134

LIKES 43

ASM@ASM65617010

@extliqprovider No. HLE questions can’t be answered simply by having more data. Most require reasoning, often genuinely complex reasoning, as in many of the physics questions.

23h2.1K431

REPLIES 1

Equinox@Soho1579854

@ASM65617010 80% no tools by next year for sure

11h1811

ASM@ASM65617010

@tamanokal I contributed to the HLE benchmark, not to the Mythos evaluation of it. It included extremely difficult questions across many topics, from quantum physics to linguistics.

1d2.4K16

efe@extliqprovider

@ASM65617010 bigger model for HLE you just need more data and anthropic has a lead over oai in this

1d2.6K6

Sin Jeong-hun@realrealcat

@ASM65617010 Why did you cut the graph between 0 and 40?

20h63110

Mikko Korhonen@MikkoKorhonen12

@ASM65617010

10h897

Entropy Cowboy@EntropyCowboy

@ASM65617010 To remind you, people say anthropic have been having it since January. Which means it actually only took a year to reach that level

17h2626

Sub Woofer@SubWoofer143785

@ASM65617010 65% possible score and we are yet to see what GPT 5.6 Pro can do, oh boy

18h5351

Gregor@bygregorr

@ASM65617010 not sure 'no tools' is the right anchor in daily use on supabase queries the tool gap feels massive for anything non-trivial. does the with-tools score change how you read the 18-month timeline?

23h8702

Ugly Cowboy@ugly_cowboy

@ASM65617010 We will need the “HLE Endgame” benchmark soon.

22h3972

Ding@zhaoxiongding

@ASM65617010 @extliqprovider You’re approaching this from a perspective of a human.

If simply more data and compute scores more on your bench, then empirically it is simply a data and compute problem.

wtf is reasoning? Does a submarine swim?

18h1203

Saffron Warlord e/acc@rawantitmc

@ASM65617010 Expecting 99% on HLE by Q3 of 2027

17h1702

Eren@kaiba_1991

@ASM65617010 100% in 2 years. That's when RSI starts

23h5191

Tamanokal@tamanokal

@ASM65617010 Thanks! Didn't know the labs are the ones that report the findings! I remember when https://knzhou.github.io/ first mentioned working on it

1d3961

Lunexa@Lunexalith

@ASM65617010 Insane chart lol, 0-40 X-axis is same as 40-45, +33% stronger is making it seem like +100%.

22h3421

Min@min_aws

@ASM65617010 How sure are you even that the dataset isn't in training now that this has been out for 1.5 years. How do all benchmarks once released get maxed but we can come up with similar benchmarks that aren't maxxed out yet. Post Fable delusion is real.

8h952

Sadi Moodi@MoodiSadi

@ASM65617010 We will reach to 100% in a short time, mark my words

14h2181

Law Student in Denial@DenialLaw

@ASM65617010 How did you get to answer?

16h1631