@teortaxesTex says newer reasoning and engineering benchmarks will prove fragile despite Chinese models closing the coding gap · Digg