Lisan al Gaib says Chinese AI models trail Western rivals by four to six months, but Florian Brand argues benchmarks are flawed

The capability gap is narrowest in coding tasks.

Original post

Lisan al Gaib @scaling01

that's the backward looking gap which I think is ~4-6 months

and all of these tasks are primarily coding; outside of coding the gap is larger

and the forward looking gap with Mythos is probably ~8-12 months, considering china bros will only get access to the compute and data necessary at the end of the year or early next year

6:42 AM · Jun 7, 2026 · 84 Views

Posts from X

Lisan al Gaib @scaling01

@xeophon @xpasky a lot of the leaderboards I shared are also very recent ones that don't have scores for Opus 4.5 or GPT-5.2

so an open-model being "right behind Sonnet 4.6 or Opus 4.6" doesn't mean much

Lisan al Gaib @scaling01

@xeophon @xpasky there's really too few live leaderboards outside of coding

Florian Brand @xeophon

@scaling01 @xpasky Well and the same leaderboards use broken open model deployments to get their scores, which should be discarded. They are comparing closed models at their best vs open models at their worst / at mediocre setups at best

Florian Brand @xeophon

@scaling01 @xpasky I can prob engineer a leaderboard the same way. I use Opus 4.8, reasoning low, mini-swe-agent with its old settings (tool calling text based and no parallel tool calling allowed, 25 max turns or something) running on Bedrock vs. Kimi K2.6 @ high in Kimi CLI running on Kimi API
