
Instructed-Retriever-1 Matches Claude Sonnet 4.5 Retrieval Quality With Lower Latency

Original post: Andrew Drozdov (#954)

Most agentic search systems get better by thinking longer: more tool calls, more reason-act loops, each step waiting on the last. Quality goes up, but so does latency.

Instructed-Retriever-1 takes a different route. Instead of scaling test-time compute sequentially, it scales it in parallel. One retrieval-specialized model fans the work out: it generates multiple query and filter formulations to widen recall, then reranks the merged evidence with a multi-pivot reranker to sharpen precision. Both stages run at once, so searching more broadly no longer means searching more slowly.
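To make the fan-out concrete, here is a minimal sketch of the general pattern described above: issue several query formulations concurrently, merge the hits, then rerank. It is not the Instructed-Retriever-1 implementation. The toy `CORPUS`, the `search` and `rerank` helpers, and the simulated delay are all hypothetical, and the word-overlap reranker is only a placeholder for the multi-pivot reranker, whose internals the post does not describe.

```python
import asyncio
import time

# Hypothetical toy corpus standing in for a real index.
CORPUS = {
    "d1": "parallel test-time scaling keeps latency low",
    "d2": "sequential reason-act loops add latency per step",
    "d3": "a reranker sharpens precision over merged evidence",
    "d4": "quarterly sales report for the retail team",
}


async def search(query: str, delay: float = 0.1) -> list[str]:
    """Hypothetical retriever: ids of documents sharing a word with the query."""
    await asyncio.sleep(delay)  # simulated index / network latency
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items() if terms & set(text.split())]


def rerank(question: str, doc_ids: list[str]) -> list[str]:
    """Placeholder reranker (word overlap), NOT the multi-pivot reranker."""
    q_terms = set(question.lower().split())
    # Higher overlap first; ties broken by document id for determinism.
    return sorted(doc_ids, key=lambda d: (-len(q_terms & set(CORPUS[d].split())), d))


async def parallel_search(question: str, formulations: list[str]) -> list[str]:
    # Fan out: every formulation is searched concurrently, so wall-clock time
    # tracks the slowest single search rather than the sum of all searches.
    results = await asyncio.gather(*(search(f) for f in formulations))
    # Merge: de-duplicate hits across formulations to widen recall.
    merged = sorted({doc_id for hits in results for doc_id in hits})
    # Rerank the merged evidence to sharpen precision.
    return rerank(question, merged)


def run(question: str, formulations: list[str]) -> tuple[list[str], float]:
    """Run the pipeline once and report (ranked ids, elapsed seconds)."""
    start = time.perf_counter()
    ranked = asyncio.run(parallel_search(question, formulations))
    return ranked, time.perf_counter() - start


if __name__ == "__main__":
    ranked, elapsed = run(
        "how does parallel scaling affect latency",
        ["parallel scaling", "latency", "reranker precision"],
    )
    print(ranked, f"{elapsed:.2f}s")
```

In this sketch, three searches of roughly 0.1 s each finish in about 0.1 s of wall-clock time instead of 0.3 s, which is the sense in which adding formulations widens recall without adding sequential latency.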

The result inside Knowledge Assistant: search time drops by more than 3x and answer time by 2x, with time to first token around two seconds and no drop in quality (it matches Claude Sonnet 4.5 retrieval quality on KARLBench). For the people using it, that means far less waiting between question and answer, the freedom to ask more follow-ups, and more of the knowledge base actually surfaced. It is rolling out to all customers now, with no reconfiguration required.

Read how we did it: https://www.databricks.com/blog/3x-faster-search-parallel-test-time-scaling-instructed-retriever-1

9:46 AM · Jun 4, 2026 · 1.3K Views
Posts from X
Andrew Drozdov@mrdrozdov

New Product Update: We trained a retrieval-specialized model for Knowledge Assistant. It matches Claude Sonnet 4.5 retrieval quality at substantially lower latency.

Introducing Instructed-Retriever-1.

3h · Views 710 · Likes 23 · Bookmarks 2
Replies (2)
Andrew Drozdov@mrdrozdov

I’m particularly excited about our multi-pivot groupwise reranker. Thoughtfully adding more context to each reranking call can really pay off.

Andrew Drozdov@mrdrozdov

The search harness for Instructed-Retriever-1 makes heavy use of Parallel Test-Time Scaling. By spending more compute in parallel, we expose several knobs for improving quality while keeping latency low.

3h · Views 72 · Likes 2 · Bookmarks 0