Gemini 3.5 Flash ranks first on the CAIS Vision Index but fourth overall in text capabilities.
DeepSeek-V4-Pro ranked fourteenth overall in text capabilities.
Opus 4.5 came out on Nov 24, 2025 and, according to the CAIS capabilities index, has roughly the same ARC-AGI-2 score as DeepSeek-V4-Pro. V3.2 was 5.0%. Kimi 2.5 is 10.8% here and 11.8% on the official leaderboard (V3.2: 4.0%), so it's aligned. This is a gap of 6 months.
Gemini 3.5 Flash ranks 4th on CAIS Text Capabilities and 1st on Vision; DeepSeek-V4-Pro is slightly ahead of Kimi-K2.6 on Text.
Opus 4.5 was the first of its line, and 4.6 got significantly stronger. I expect that V4.1 or V4-non-preview will likewise improve by >10%. In the 40s of ARC-AGI-2, interesting things begin happening. I've said many times, GPT 5.2 was the true RSI inflection point.
That said, they have Grok 4.2 at 55% and Grok 4.3 at 13%, despite reasonable alignment on other benches. Aren't both on the v8-small base? I am not convinced their eval is accurate.
@theo It seems to be vastly easier to close the gap in "day to day work" than in ARC-AGI
@teortaxesTex Yeah but Opus 4.5 is actually useful for day to day work and Deepseek v4 Pro isn't 🙃

