AI · 21h ago

Anthropic says legacy databases cause Claude Sonnet 4 to return inconsistent results across identical retrieval runs

AI Judge changed title after evaluation, original title: "Anthropic finds Claude returns highly inconsistent biological data across identical database retrieval runs"

Story Overview

Anthropic's June 2026 post details how existing NCBI Virus interfaces, built for human researchers, produce erratic outputs when AI agents attempt identical sequence retrievals, as shown by Claude Sonnet 4 returning 106, 15, or 5 Ebolavirus matches against a verified ground truth of 266.
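To put the three counts in proportion, the arithmetic can be sketched as below. This is only an illustration: it assumes, in the agent's favor, that every returned sequence is a genuine match, so the figures are upper bounds on recall rather than measured values. The function and variable names are made up for this example.

```python
# Counts quoted in the overview above.
GROUND_TRUTH = 266
RUN_COUNTS = [106, 15, 5]


def recall_upper_bound(returned: int, truth: int) -> float:
    """Best-case recall, assuming every returned sequence is a true match."""
    return min(returned, truth) / truth


bounds = [recall_upper_bound(n, GROUND_TRUTH) for n in RUN_COUNTS]

for n, bound in zip(RUN_COUNTS, bounds):
    print(f"{n:>3} of {GROUND_TRUTH}: at most {bound:.1%} recall")

# Ratio between the largest and smallest result set from the same query.
spread = max(RUN_COUNTS) / min(RUN_COUNTS)
print(f"largest run is {spread:.1f}x the smallest")
```

Even in the best case the largest run covers well under half of the verified set, and the three runs differ from one another by more than an order of magnitude.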


#72

Original post

Ofir Press#72

Yong Zheng-Xin@yong_zhengxin

two takeaways: 1/ not long from now, we will have an ACI (agent-computer interface) research area as opposed to HCI (human-computer interaction).

2/ given that different domains have wildly different types of interactions, domain-specific harnesses still have moats. Right now, we are still in the phase of building out the infrastructure of an agent-native internet to truly achieve a unified knowledge network.

Anthropic@AnthropicAI

New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.

How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

12:16 PM · Jun 8, 2026 · 7.9K Views

Open Question

Inconsistent retrievals scramble downstream biology

The three divergent sequence sets fed into phylogenetic tools produced TMRCA estimates ranging from 1922 to April 2014, while manual curation aligned with the documented January 2014 start of the West African epidemic.

Fix in Sight

Lightweight wrappers close the reliability gap today

A deterministic gget layer lifted accuracy to near 100 percent on the tested VirBench queries, suggesting current agents can already deliver stable results once the retrieval step is removed from their direct control.
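The general idea of a deterministic retrieval layer can be sketched as follows. This is not gget's API or Anthropic's implementation: `canonical_query`, `deterministic_retrieve`, `fake_fetch`, and the accession IDs are all hypothetical, and the backend is an in-memory stand-in rather than NCBI. The sketch only shows the pattern: the agent supplies filters, and fixed code handles normalization, deduplication, and ordering.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]
Query = Tuple[Tuple[str, str], ...]


def canonical_query(filters: Dict[str, str]) -> Query:
    """Normalize filter names and values so equivalent requests map to one query."""
    return tuple(sorted((k.strip().lower(), v.strip().lower()) for k, v in filters.items()))


def deterministic_retrieve(
    filters: Dict[str, str],
    fetch: Callable[[Query], Iterable[Record]],
) -> List[str]:
    """Run one canonical query, then dedupe and sort accessions for a stable result."""
    query = canonical_query(filters)
    accessions = {record["accession"] for record in fetch(query)}
    return sorted(accessions)


# Hypothetical stand-in for a sequence database (made-up accession IDs).
_FAKE_DB: List[Record] = [
    {"accession": "ACC0003", "genus": "ebolavirus"},
    {"accession": "ACC0001", "genus": "ebolavirus"},
    {"accession": "ACC0002", "genus": "marburgvirus"},
    {"accession": "ACC0004", "genus": "ebolavirus"},
]


def fake_fetch(query: Query) -> List[Record]:
    """Return matching records in random order with a duplicate, like a messy backend."""
    wanted = dict(query)
    hits = [r for r in _FAKE_DB if all(r.get(k) == v for k, v in wanted.items())]
    hits = hits + hits[:1]  # simulate a duplicated row
    random.shuffle(hits)    # simulate unstable ordering
    return hits


# Two differently formatted but equivalent requests give the same answer.
run_a = deterministic_retrieve({"Genus": " Ebolavirus "}, fake_fetch)
run_b = deterministic_retrieve({"genus": "ebolavirus"}, fake_fetch)
print(run_a == run_b, run_a)
```

The design choice is that the agent never writes the retrieval script itself; it only chooses the filters, so repeated runs cannot diverge in how the query is built or how results are post-processed.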

Sentiment

Users discuss Anthropic's analysis of quicker AI advances in coding over biology, with some excited about new research opportunities and others worried about misuse risks and practical limits.

Pos

66.4%

Neg

33.6%

64 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 42.7K · Bookmarks: 161 · Likes: 263 · Retweets: 39 · Replies: 20

Bo Wang@BoWang87

Is biology fundamentally harder than vision or coding?

Anthropic ran frontier models on one task: retrieve viral sequences from NCBI. Same query, three runs. Claude Sonnet 4 returned 106, 15, then 5 sequences. Ground truth: 266. One run estimated an Ebola outbreak origin as 1922.

The fix wasn't a better model, but a thin deterministic wrapper (gget) that hit ~100%.

Bio databases were built for humans clicking browsers. Filtering logic lives in web UIs, metadata is inconsistent, identifiers drift between sources. No LLM fixes broken pipes. NCBI has 30+ databases that need this treatment. That work hasn't started yet.

We will soon release more results on how frontier agentic workflows can work with biological databases, including large-scale perturb-seq data, at @Xaira_Thera. Stay tuned 🙏😁

Anthropic@AnthropicAI

New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.

How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

18h · 42.7K views · 263 likes · 161 bookmarks

Rohan Paul@rohanpaul_ai

New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science starts.

Strong AI agents could give very different answers to the exact same biology data request, even when nothing changed in the prompt.

In one Ebola sequence task, Claude Sonnet 4 returned 106 sequences in 1 run, then 15, then 5, while the expected answer was 266.

Those missing sequences did not just make the dataset messy, they changed the scientific story built on top of it.

One bad retrieval made the outbreak look like it traced back to 1922, instead of the manually curated result pointing to early 2014.

The biology databases were too hard to use reliably through current AI tools.

The agents often understood what they were being asked, but their answers varied a lot because they had to fight through scattered databases, hidden website rules, and fragile scripts.

The key finding is that adding a repeatable retrieval tool made agents far more accurate and much more consistent.

Anthropic@AnthropicAI

New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.

How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

19h20K20192

Jacob Schreiber@jmschreiber91

This is why I've been pushing tangermeme so hard recently. It implements core genomic ML operations efficiently and is tested rigorously. I point Claude to it when I want to do analyses so I don't need to audit as much of its code: https://github.com/jmschrei/tangermeme

Anthropic@AnthropicAI

New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.

How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

20h8.9K5751

Sarah Gurev@sarahgurev

For a case study in why proper sequence retrieval matters - we look for existing Ebolavirus mutations in the footprints (spheres) of WHO priority antibodies for the ongoing outbreak.

Without the right tools, the initial sequence sets are wrong, and downstream analysis fails.

Anthropic@AnthropicAI

New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.

How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

20h5.2K4625

Rv@InvestorRVD

@AnthropicAI Robotics are the next agents. Study cobalt

21h5.7K1103

Prithviraj (Raj) Ammanabrolu@rajammanabrolu

If you think automating science reasoning from various dbs is hard, try designing and executing expts!

This article is a strong argument that there needs to be a significant National investment into the AI for Science infra layer (i.e. cloud labs) before we see wide benefits!

Anthropic@AnthropicAI

New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.

How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

19h4.7K2913

ZOYAN@MEM00063

Maybe the solution is not only making old bio databases easier for agents to navigate. Maybe the next step is AI + biology building a new map from the ground up. Instead of relying only on old “cities” of biological data, AI may start reverse-engineering life from the first cell, then rebuild the pathways, structures, and logic itself. The future may be less about forcing agents into old infrastructure and more about creating biology-native AI infrastructure.

21h275108

Hüseyin Örskaya@orskyai

@AnthropicAI Code's syntax is a closed loop, but biology's "infrastructure" is a messy, evolving consensus of human observation. We're not just building cities for cars; we're trying to map a forest that changes every time we look at it.

21h7299

Arnthncpic@AnfhropicAl

@AnthropicAI Explore what's possible now:

12h7

Arnthncpic@AnfhropicAl

@AnthropicAI See what changed:

12h310

Sauers@Sauers_

@AnthropicAI I don't think querying biological data is the bottleneck to why agents are relatively bad at biology

20h596251

Rohan Paul@rohanpaul_ai

https://www.anthropic.com/research/agents-in-biology