A thread with a good collection of hard/private/OOD evals where the Western frontier is comprehensively dunking on Chinese/open source models and it's not remotely close.
the "narrow capability gap" in question
let's put this to rest, please. I can't hear the coping anymore




