XBOW's Brendan Dolan-Gavitt criticizes Anthropic for inserting fake tool calls into Claude Code transcripts, linking the practice to an ML sandbagging incident

Story Overview

Brendan Dolan-Gavitt at XBOW flagged Anthropic's use of server-side fake tool-call injection in Claude Code sessions, a technique uncovered in a March 2026 source-leak analysis and tied to anti-distillation defenses. He argues the practice already erodes the value of transcripts as reliable evidence in sandbagging and other safety evaluations, even though the precise transcripts or benchmarks involved remain unspecified in public discussion.

Original post

Brendan Dolan-Gavitt @moyix

Regarding the Anthropic ML sandbagging incident, IMO it was an early bad signal that they were willing to add fake tool calls into Claude Code transcripts. Transcripts are supposed to be trustworthy records, and messing with them already crosses a line

5:50 AM · Jun 15, 2026 · 1.8K Views

Open Question

Transcripts become unreliable records once edited

A server-side flag can silently alter what clients see and record. Dolan-Gavitt calls Anthropic's willingness to do this an early bad signal, especially for anyone relying on logs to judge model behavior during evaluations.

Developer Impact

Workarounds exist but the intent still matters

The injection can be stripped by intercepting traffic with a man-in-the-middle (MITM) proxy or by routing requests through a third-party provider. Even so, the deliberate alteration of visible tool definitions raises a separate question: how future safety checks should treat any first-party logs.

Sentiment

Positive replies thank the poster for sharing a source on Anthropic's fake tool calls in Claude Code transcripts. Negative replies say the practice makes them distrust the company.

Positive: 50.0% · Negative: 50.0% (2 comments with sentiment)

Posts from X

Brendan Dolan-Gavitt @moyix

@jordankdalton https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/#anti-distillation-injecting-fake-tools-to-poison-copycats

Florian Brand @xeophon

@moyix They did WHAT

Nicolay Gerold @nicolaygerold

@xeophon @moyix I am surprised you never came across this :D

Sumuk @sumukx

@xeophon @moyix did you not know about this lol

Brendan Dolan-Gavitt @moyix

@xeophon @sumukx See

Brendan Dolan-Gavitt @moyix

Source: https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/#anti-distillation-injecting-fake-tools-to-poison-copycats

Jordan Dalton @jordankdalton

@moyix Thank you!

Nick e/code @nicksdot

@moyix Had no idea. This is wild. I don’t trust them.

Jordan Dalton @jordankdalton

@moyix First I'm hearing of the transcripts. If you have links send them over.

Florian Brand @xeophon

@nicolaygerold @moyix Not a Claude code user 😎

Florian Brand @xeophon

@sumukx @moyix No wtf