
Gemini 3.5 Flash ranks first on the CAIS Vision Index but fourth overall in text capabilities

DeepSeek-V4-Pro ranked fourteenth overall in text capabilities.

Original post

Gemini 3.5 Flash ranking 4th on CAIS Text Capabilities and 1st on Vision and DeepSeek-V4-Pro is slightly ahead of Kimi-K2.6 on Text

1:48 PM · May 26, 2026

Opus 4.5 came out on Nov 24, 2025 and, according to CAIS capabilities index, has ≈same-ish ARC-AGI-2 as DeepSeek-V4-Pro. V3.2 was 5.0%. Kimi 2.5 is 10.8% here and 11.8% on official leaderboard (V3.2 4.0%), so it's aligned. This is a gap of 6 months.

Referenced post: Lisan al Gaib (@scaling01), 8:48 PM · May 26, 2026 · 23.6K Views
10:45 PM · May 26, 2026 · 18.5K Views

Opus 4.5 was the first of its line, and 4.6 got significantly stronger. I expect that V4.1 or V4-non-preview will likewise improve by >10%. In the 40s of ARC-AGI-2, interesting things begin happening. I've said many times, GPT 5.2 was the true RSI inflection point.

Referenced post: Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex), 10:45 PM · May 26, 2026 · 18.5K Views
10:47 PM · May 26, 2026 · 2.6K Views

That said, they have Grok 4.2 at 55% and Grok 4.3 at 13%, despite reasonable alignment on other benches. Isn't that both v8-small base? I am not convinced their eval is accurate.

Referenced post: Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex), 10:47 PM · May 26, 2026 · 2.6K Views
10:54 PM · May 26, 2026 · 2.1K Views

@teortaxesTex Yeah, but Opus 4.5 is actually useful for day-to-day work and DeepSeek-V4-Pro isn't 🙃

Referenced post: Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex), 10:45 PM · May 26, 2026 · 18.5K Views
11:10 PM · May 26, 2026 · 7.6K Views