
ErdosBench math benchmark rerun ranks Kimi 2.7 second overall, ahead of GPT-5 and behind Fable 5

It also evaluated Qwen 3.7 Max and Grok 4.3.


Commentary on X


6) Overall, Kimi's new model is an amazing entrant to the top lists and is worth more extensive testing. As a reminder, the above comparison is based on a 14-problem smoke test you can find here: https://github.com/ulamai/erdosbench together with all the problems. The full benchmark has 226 problems; except for these 14, they are private so that all models start from the same place. If you want to run the full benchmark on your model and get a report on how to boost its reasoning, DM me! Models used: @claudeai @Kimi_Moonshot @OpenAIDevs @Alibaba_Qwen @grok

Kyle Choi 崔凯尔@KCDN19


⿻ Andrew Trask @iamtrask

BREAKING: American in-fighting vs Chinese focus https://twitter.com/prz_chojecki/status/2065741640635990128

Andrew Carr 🤸