Most agentic search systems get better by thinking longer: more tool calls, more reason-act loops, each step waiting on the last. Quality goes up, but so does latency.
Instructed-Retriever-1 takes a different route. Instead of scaling test-time compute sequentially, it scales it in parallel. One retrieval-specialized model fans the work out: it generates multiple query and filter formulations to widen recall, then reranks the merged evidence with a multi-pivot reranker to sharpen precision. Each stage does its work in parallel rather than step by step, so searching more broadly no longer means searching more slowly.
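To make the fan-out idea concrete, here is a minimal Python sketch of the pattern described above: several query formulations are searched concurrently, the hits are merged, and the merged set is reranked. Everything in it is an illustrative assumption rather than the Instructed-Retriever-1 implementation: the toy `CORPUS`, the helpers `formulate_queries`, `search`, and `rerank`, and the word-overlap scoring are all hypothetical, and the reranker is a simple single-score placeholder, not a multi-pivot reranker.

```python
import asyncio

# Hypothetical toy knowledge base; a real system would query a search index.
CORPUS = {
    "d1": "reset your password from the account settings page",
    "d2": "billing invoices are emailed on the first of each month",
    "d3": "password rules require twelve characters and one symbol",
    "d4": "contact support to change the billing address",
}


def overlap(text_a: str, text_b: str) -> int:
    """Count distinct lowercase words shared by two strings."""
    return len(set(text_a.lower().split()) & set(text_b.lower().split()))


def formulate_queries(question: str) -> list[str]:
    """Placeholder for model-generated formulations: the question plus its longer words."""
    return [question] + [w for w in question.lower().split() if len(w) > 4]


async def search(query: str, k: int = 2) -> list[str]:
    """Simulated search call: sleep to mimic latency, then return the top-k doc ids."""
    await asyncio.sleep(0.01)
    scored = [(overlap(query, text), doc_id) for doc_id, text in CORPUS.items()]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [doc_id for _, doc_id in hits[:k]]


def rerank(question: str, doc_ids: list[str]) -> list[str]:
    """Placeholder reranker: order merged candidates by overlap with the question."""
    return sorted(doc_ids, key=lambda d: -overlap(question, CORPUS[d]))


async def retrieve_and_rerank(question: str) -> list[str]:
    queries = formulate_queries(question)
    # Fan out: all formulations are searched concurrently, not one after another.
    results = await asyncio.gather(*(search(q) for q in queries))
    # Merge: deduplicate while keeping first-seen order.
    merged = list(dict.fromkeys(doc_id for hits in results for doc_id in hits))
    return rerank(question, merged)


ranked = asyncio.run(retrieve_and_rerank("how do I reset my password"))
print(ranked)
```

Because the searches are awaited together with `asyncio.gather`, adding more formulations widens recall while the wall-clock cost of the search step stays close to that of the slowest single search, which is the trade the paragraph above describes.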
The result inside Knowledge Assistant: search is more than 3x faster and answers arrive 2x faster, with time to first token around two seconds and no drop in quality (it matches Claude Sonnet 4.5 retrieval quality on KARLBench). For the people using it, that means far less waiting between question and answer, the freedom to ask more follow-ups, and more of the knowledge base actually surfaced. It is rolling out to all customers now, with no reconfiguration required.
Read how we did it: https://www.databricks.com/blog/3x-faster-search-parallel-test-time-scaling-instructed-retriever-1