i agree that the frontier american models are clearly better, but it doesn't help that the evals being used are such bs that the only compelling way to actually assess them as a user is to just try em and decide based on vibes.
e.g. many of these evals put opus 4.7 and 4.8 *way* higher than 4.6, which is nonsense to anyone who has used them.
pair that with the reality that most people just aren't yet using these for anything all that sophisticated (even among the power-ish users) and it makes sense that the chinese OSS models seem compelling.
You’d be shocked by how many people in think tanks/academia/government/“strategic classes,” including in the U.S., are convinced that Chinese models are now “good enough” and leading the world in adoption. Meanwhile, the reality I see is a fairly wide, and still widening, gap.