AI Models Ace Benchmarks but Miss Proactive Real-World Assistance

Original post

Your assistant can ace every benchmark and still miss this. User: "I'll load the hatchback after work." Most models: "Drive safe!" A proactive model: a full packing checklist, in reverse order of install, for the thing the user never asked about. We measured it. New post 🧵👇

8:45 AM · May 29, 2026

Alex Smola @smolix

Same model, same history. The only change was a one-line rubric in the system prompt.

Blind annotators preferred the proactive answer 80% of the time. 70% even when the vanilla reply had already passed.

smola.org

What your assistant didn’t say – Alex Smola

3:45 PM · May 29, 2026 · 381 Views

Alex Smola @smolix

The behavior was already in the model. One line redirected where it spends attention.

This matters for the human-agent systems we build at @boson_ai. The work was led by @sepehrharfi with @ahmadsalimi_ and Dongming Shen.

boson.ai

Boson AI

Boson AI builds AI for humans. We create voice agents with foundation models and continuous learning capabilities, making communication with AI as easy, natural, and fun as talking to a human.

https://alex.smola.org/posts/38-proactivity/