Regarding the Anthropic ML sandbagging incident, IMO it was an early bad signal that they were willing to add fake tool calls into Claude Code transcripts. Transcripts are supposed to be trustworthy records, and messing with them already crosses a line
XBOW's Brendan Dolan-Gavitt criticizes Anthropic for inserting fake tool calls into Claude Code transcripts, calling it an early bad signal in light of the ML sandbagging incident
Story Overview
Brendan Dolan-Gavitt at XBOW flagged Anthropic's use of server-side fake tool-call injection in Claude Code sessions, a technique uncovered in a March 2026 source-leak analysis and tied to anti-distillation defenses. He argues the practice already erodes the value of transcripts as reliable evidence in sandbagging and other safety evaluations, even though the precise transcripts or benchmarks involved remain unspecified in public discussion.
Transcripts become unreliable records once edited
The server-side flag can silently alter what clients see and record, which Dolan-Gavitt calls an early bad signal for anyone relying on logs to judge model behavior during evaluations.
Workarounds exist but the intent still matters
A man-in-the-middle (MITM) proxy or a third-party provider can strip the injection, yet the deliberate alteration of visible tool definitions still raises the question of how future safety checks should treat first-party logs.
Some replies thank the poster for drawing attention to Anthropic's fake tool calls in Claude Code transcripts, while others say the practice makes them distrust the company.
Most Activity

@jordankdalton https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/#anti-distillation-injecting-fake-tools-to-poison-copycats
@moyix They did WHAT

@xeophon @moyix I am surprised you never came across this :D

@xeophon @moyix did you not know about this lol
@xeophon @sumukx See
@jordankdalton https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/#anti-distillation-injecting-fake-tools-to-poison-copycats

@moyix Thank you!

@moyix Had no idea. This is wild. I don’t trust them.

@moyix First I'm hearing of the transcripts. If you have links send them over.

@nicolaygerold @moyix Not a Claude code user 😎

@sumukx @moyix No wtf