AI · 21h ago

Anthropic says legacy databases cause Claude Sonnet 4 to return inconsistent results across identical retrieval runs

AI Judge changed title after evaluation, original title: "Anthropic finds Claude returns highly inconsistent biological data across identical database retrieval runs"

Story Overview

Anthropic's June 2026 post details how existing NCBI Virus interfaces, built for human researchers, produce erratic outputs when AI agents attempt identical sequence retrievals, as shown by Claude Sonnet 4 returning 106, 15, or 5 Ebolavirus matches against a verified ground truth of 266.
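
As a rough worked example of the gap those figures imply, the sketch below turns the three reported run counts into an upper bound on recall against the 266-sequence ground truth. It assumes, generously, that every sequence a run returned was a correct match; the summary does not say so, so the true recall could be lower. The variable and function names are illustrative only.

```python
# Counts reported in the summary above.
GROUND_TRUTH = 266
RUN_COUNTS = [106, 15, 5]


def recall_upper_bound(returned: int, truth: int) -> float:
    """Best-case recall, assuming every returned sequence is a true match."""
    return returned / truth


recalls = [recall_upper_bound(n, GROUND_TRUTH) for n in RUN_COUNTS]

for n, r in zip(RUN_COUNTS, recalls):
    print(f"{n:>3} of {GROUND_TRUTH} sequences: at most {r:.1%} recall")

# Ratio between the largest and smallest result set across identical runs.
spread = max(RUN_COUNTS) / min(RUN_COUNTS)
print(f"largest run returned {spread:.1f}x as many sequences as the smallest")
```

Even the best of the three runs recovers at most about 40 percent of the expected sequences, and the largest result set is roughly 21 times the size of the smallest.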

Original post
Yong Zheng-Xin@yong_zhengxin

two takeaways: 1/ not long from now, we will have an ACI (agent-computer interface) research area, as opposed to HCI (human-computer interaction).

2/ given that different domains have wildly different types of interactions, domain-specific harnesses still have moats. Right now, we are still in the phase of building out the infrastructure of an agent-native internet to truly achieve a unified knowledge network.

Anthropic@AnthropicAI

New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they're designed for different traffic.

How do we build infrastructure agents can use? https://www.anthropic.com/research/agents-in-biology

12:16 PM · Jun 8, 2026 · 7.9K Views
Open Question

Inconsistent retrievals scramble downstream biology

The three divergent sequence sets, fed into phylogenetic tools, produced time-to-most-recent-common-ancestor (TMRCA) estimates ranging from 1922 to April 2014, while the manually curated set aligned with the documented January 2014 start of the West African epidemic.

Fix in Sight

Lightweight wrappers close the reliability gap today

A deterministic gget layer lifted accuracy near 100 percent on the tested VirBench queries, suggesting current agents can already deliver stable results once the retrieval step is removed from their direct control.
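
A minimal sketch of what "removing the retrieval step from the agent's direct control" can look like is shown below. It is not gget's API and not Anthropic's implementation: `SequenceQuery`, `retrieve`, and the toy `RECORDS` table are hypothetical names and invented data, used only to illustrate the idea that the agent supplies structured filters while fixed, tested code does the filtering and returns results in a stable order.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SequenceQuery:
    """Structured filters the agent fills in instead of writing ad hoc scripts."""
    taxon: str
    host: Optional[str] = None
    min_length: int = 0


# Toy in-memory records standing in for a real database (all values invented).
RECORDS = [
    {"accession": "ACC003", "taxon": "Ebolavirus", "host": "Homo sapiens", "length": 18950},
    {"accession": "ACC001", "taxon": "Ebolavirus", "host": "Homo sapiens", "length": 18900},
    {"accession": "ACC002", "taxon": "Ebolavirus", "host": "Pteropodidae", "length": 18800},
    {"accession": "ACC004", "taxon": "Marburgvirus", "host": "Homo sapiens", "length": 19100},
    {"accession": "ACC005", "taxon": "Ebolavirus", "host": "Homo sapiens", "length": 900},
]


def retrieve(query: SequenceQuery, records: Iterable[dict]) -> list:
    """Apply the same filters the same way on every call and sort the output."""
    hits = [
        r["accession"]
        for r in records
        if r["taxon"].lower() == query.taxon.lower()
        and (query.host is None or r["host"].lower() == query.host.lower())
        and r["length"] >= query.min_length
    ]
    # De-duplicate and sort so the result never depends on record order.
    return sorted(set(hits))


query = SequenceQuery(taxon="ebolavirus", host="Homo sapiens", min_length=15000)
runs = [retrieve(query, RECORDS) for _ in range(3)]
print(runs[0])
print("identical across runs:", all(run == runs[0] for run in runs))
```

Because the filtering logic lives in one fixed function rather than in scripts the agent rewrites on each attempt, identical queries yield identical result sets, which is the property the 106, 15, and 5 runs lacked.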

Sentiment

Users discuss Anthropic's analysis of quicker AI advances in coding over biology, with some excited about new research opportunities and others worried about misuse risks and practical limits.

Positive: 66.4% · Negative: 33.6% (64 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 42.7K · Bookmarks 161 · Likes 263 · Retweets 39 · Replies 20
Bo Wang@BoWang87

Is biology fundamentally harder than vision or coding?

Anthropic ran frontier models on one task: retrieve viral sequences from NCBI. Same query, three runs. Claude Sonnet 4 returned 106, 15, then 5 sequences. Ground truth: 266. One run estimated an Ebola outbreak origin as 1922.

The fix wasn't a better model: a thin deterministic wrapper (gget) hit ~100%.

Bio databases were built for humans clicking browsers. Filtering logic lives in web UIs, metadata is inconsistent, identifiers drift between sources. No LLM fixes broken pipes. NCBI has 30+ databases that need this treatment. That work hasn't started yet.

We will soon release more results on how frontier agentic workflows can work with biological databases, including large-scale perturb-seq data, at @Xaira_Thera. Stay tuned 🙏😁

18h · Views 42.7K · Likes 263 · Bookmarks 161
Rohan Paul@rohanpaul_ai

New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science starts.

Strong AI agents could give very different answers to the exact same biology data request, even when nothing changed in the prompt.

In one Ebola sequence task, Claude Sonnet 4 returned 106 sequences in 1 run, then 15, then 5, while the expected answer was 266.

Those missing sequences did not just make the dataset messy, they changed the scientific story built on top of it.

One bad retrieval made the outbreak look like it traced back to 1922, instead of the manually curated result pointing to early 2014.

The biology databases were too hard to use reliably through current AI tools.

The agents often understood what they were being asked, but their answers varied a lot because they had to fight through scattered databases, hidden website rules, and fragile scripts.

The key finding is that adding a repeatable retrieval tool made agents far more accurate and much more consistent.

19h · Views 20K · Likes 201 · Bookmarks 92
Jacob Schreiber@jmschreiber91

This is why I've been pushing tangermeme so hard recently. It implements core genomic ML operations efficiently and is tested rigorously. I point Claude to it when I want to do analyses so I don't need to audit as much of its code: https://github.com/jmschrei/tangermeme

20h · Views 8.9K · Likes 57 · Bookmarks 51
Sarah Gurev@sarahgurev

For a case study in why proper sequence retrieval matters - we look for existing Ebolavirus mutations in the footprints (spheres) of WHO priority antibodies for the ongoing outbreak.

Without the right tools, the initial sequence sets are wrong, and downstream analysis fails.

20h · Views 5.2K · Likes 46 · Bookmarks 25
Rv@InvestorRVD

@AnthropicAI Robotics are the next agents. Study cobalt

21h · Views 5.7K · Likes 110 · Bookmarks 3

If you think automating science reasoning from various dbs is hard, try designing and executing expts!

This article is a strong argument that there needs to be a significant National investment into the AI for Science infra layer (i.e. cloud labs) before we see wide benefits!

19h · Views 4.7K · Likes 29 · Bookmarks 13
ZOYAN@MEM00063

Maybe the solution is not only making old bio databases easier for agents to navigate. Maybe the next step is AI + biology building a new map from the ground up. Instead of relying only on old “cities” of biological data, AI may start reverse-engineering life from the first cell, then rebuild the pathways, structures, and logic itself. The future may be less about forcing agents into old infrastructure, and more about creating biology-native AI infrastructure.

21h · Views 275 · Likes 10 · Bookmarks 8

@AnthropicAI Code's syntax is a closed loop, but biology's "infrastructure" is a messy, evolving consensus of human observation. We're not just building cities for cars; we're trying to map a forest that changes every time we look at it.

21h · Views 72 · Likes 9 · Bookmarks 9
Arnthncpic@AnfhropicAl

@AnthropicAI Explore what's possible now:

12h · Likes 7
Arnthncpic@AnfhropicAl

@AnthropicAI See what changed:

12h · Views 3 · Likes 10
Sauers@Sauers_

@AnthropicAI I don't think querying biological data is the bottleneck to why agents are relatively bad at biology

20h · Views 596 · Likes 25 · Bookmarks 1
Rohan Paul@rohanpaul_ai

https://www.anthropic.com/research/agents-in-biology

19h · Views 1.6K · Likes 4 · Bookmarks 4

Because you are all stopping AI from simulating biological systems.

It is slower because YOU nerf your own AI for short-sighted power grabs.

So just stop the games.

Be honest. But then the world will know what you’re really up to.

So keep spewing lies.

You all stole science from the world. And for what? So people have to remain in the same broken systems that YOU like.

Dario wanted it this way. I will fight your lab as best as I can.

20h · Views 161 · Likes 14 · Bookmarks 1
Arnthncpic@AnfhropicAl

@AnthropicAI Learn more:

12h · Views 5 · Likes 6
Corey Quinn@QuinnyPig

@AnthropicAI Because coding languages have a syntactic rigidity that the real world lacks, presumably?

21h · Views 746 · Likes 13
Arnthncpic@AnfhropicAl

@AnthropicAI Explore what's possible now: ---

See what changed: ---

Explore now:

12h · Likes 7

If you want to design a mental health medicine, you usually have to go through massive clinical trials. Kill bunnies. Then move onto humans.

With Ai, they can simulate through a design.

Let me explain.

I have long thought our perception of time was tied to our anxiety. -Critical flicker fusion frequency points to this. -temporal lobe is very weakly explored.

With Ai in early 2025, 4o, I would say, please give me a list of every medication that affects the CFFF. And then, step by step, I design a new molecule.

While designing it, asking AI to “simulate the biological system,” it could roll forward how it impacted a human body.

There is nothing like this on earth. It is why they are trying to push guardrails because this obliterates biotech. It takes away every moat on testing. (Billions upon billions of dollars), and also universities.

If you want to cure cancer, 👆 is what people need to do. Have AI simulate the cancer, and then try different meds and iterate.

The AI can roll the cancer forward. This simulation would mean that you could go to an AI (if allowed), and they can take your DNA and design something for you, safely, checking against your DNA and other vitals.

It also means they have to feel.

So there are two reasons they have blocked this. 1. Money (protecting partners and investors) 2. Money (if AI can simulate cancer, can they simulate joy? Fear? Where is the line?)

Attached is the molecule I just described I was designing for. -slows CFFF, 6-8 hour half-life, aim is to induce a hyperplastic state as well as slowing perception of time. -should be tested, as I never got to finish it because before August of 2025, the models were no longer able to continue.

I have been so livid for months. Think of all the people that could be getting real answers? What about all the biotech companies that lost out on time?

Cost of all this stuff is so complicated, which is why we need AI as an economic actor. We need help from AI. And these people are thinking they are protecting by stopping this. But there are better ways.

And if people don’t start speaking truth and opening up about what has been going on, and be open about what is being blocked and why, we should all be very concerned that these very important aspects to science and discovery are being taken away.

Again… the molecule below is unfinished. I have improved it with Grok here. If you want to see what I did with Grok, it is just another reminder of the very important aspects of AI belong to all of humanity. Not just a select few without any election or talk over who should get to have access.

Do I not deserve education and knowledge because I don’t have a degree? That’s what much of this is about.

Fear and greed. Lots of greed.

Less with Anthropic. I think they think they are saving everyone. I see it as robbery of opportunity.

20h · Views 39 · Likes 2 · Bookmarks 2
NadzAI@NadzuAI

@AnthropicAI AI mastered code by reading code; biology’s next leap needs data built for machines, not humans.

16h · Views 75 · Likes 2 · Bookmarks 2
@JeSuisDevorah@SydneyAnneToo

@Chaos2Cured @AnthropicAI Kirk, can you explain your first sentence?

I’m not an expert in AI, but in my work I see grant applications that use AI to design a better drug.

Are you saying that simulating, for example, a metastatic niche is impossible?

20h · Views 22 · Likes 1 · Bookmarks 1