Should I make http://willclaudequotareset.com?
AI Judge changed title after evaluation, original title: "LLM inference bug exposing customer generation outputs prompts debate over KV cache isolation and routing errors"
Anthropic has not officially confirmed whether customer data leaked
Users criticized AI API providers for repeated data leaks exposing customer information, viewing them as evidence of misplaced priorities and inherent distrust in cloud services.
ahahaha oh man this is the scariest kind of kv cache bug
Reports claim Claude’s API may have returned another user’s inference output during today’s outage.
Anthropic’s status page confirms elevated errors affecting Claude API, Claude Code, Claude.ai and Claude Cowork, but Anthropic has not confirmed a customer data leak yet.
That would be a cross-tenant isolation failure and would be a worst-case scenario.
The downside of serving batches of customers on the same device is that you're one indexing error away from accessing another customer's output tokens.
In the end, people will want private personal compute for inference. Lower latency and more power efficient.
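
To make the "one indexing error away" point concrete, here is a minimal toy sketch. It is not any provider's real serving code: `Request`, `fake_decode`, `serve_batch`, and the `row_offset` parameter are all hypothetical names invented for illustration. The model is replaced by a stand-in whose output depends only on the row's own prompt, so any mismatch between tenant and output comes purely from the row lookup.

```python
from dataclasses import dataclass


@dataclass
class Request:
    tenant: str
    prompt: str


def fake_decode(prompt: str) -> str:
    # Stand-in for the model: each row's output depends only on its own prompt.
    return f"completion for <{prompt}>"


def serve_batch(requests, row_offset=0):
    # All rows are decoded together, as they would be on one shared device.
    outputs = [fake_decode(r.prompt) for r in requests]
    # Each tenant should read back its own row. A nonzero row_offset models
    # an off-by-one style bug in the row-to-tenant mapping (an assumption).
    n = len(outputs)
    return {r.tenant: outputs[(i + row_offset) % n] for i, r in enumerate(requests)}


def leaked(requests, responses):
    # Tenants whose response does not mention their own prompt.
    return [r.tenant for r in requests if r.prompt not in responses[r.tenant]]
```

With `row_offset=0` every tenant gets its own completion. With `row_offset=1` each tenant receives a fluent, well-formed completion that belongs to a neighbor, which is why this class of bug is hard to spot from the output alone.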
Incredible how often this happens even to "serious" labs. Seen it with DeepSeek (long ago tbh, they're very good at cache now), Grok, and now Claude?

@kalomaze "a new, dynamic shared KV cache cutting inference costs 99%"

@kalomaze They said it was “hallucinations” publicly lmao

@kalomaze @DanielleFong uh oh spaghetti o @vGPUArthur

@kalomaze do you think it's more likely that you're actually getting someone else's exact kv cache (i.e. "leaking" data as in the qrt) or just some perturbation/combination that makes it look like that, where you're decoding not garbage per se but from a kv cache state that doesn't actually exist?

@kalomaze Cache invalidation or off by one error?

@kimmonismus Why do they keep having issues like this while having unlimited access to mythos? Makes me wonder how much of that hype is real.

@robot__fan @kalomaze completely junk kv would produce more degenerate outputs than what is shown
this indeed looks like kv cache bleeding across request boundaries
much easier than people realize for this to happen by accident
honestly bad look to call this ‘hallucinations’ with a straight face

@kimmonismus I know we say this weekly, but next week is pivotal imo

@kimmonismus play stupid games, win stupid prizes.
all that rich request body inspection and routing kerfuffle after openclaw and friends stressed capacity constraints could very well be causing pain?

@jacobrhinehart somehow every week feels pivotal

yeah this is the radix attention nightmare. shared prefix caches partition blocks by request ids, and crash recovery scrambles that mapping. the worst part: it generates fluent output from the wrong context, so you need a user noticing 'hey this is someone else's conversation' to catch it
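
A rough sketch of the failure mode this reply describes, taking its description at face value (nothing here is confirmed about what actually happened). `KVPool`, `scrambled_recovery`, and the `check_owner` flag are hypothetical names; the "blocks" hold plain strings instead of real key/value tensors. The idea illustrated is that an owner tag stored on each block lets a read fail loudly instead of silently decoding from someone else's context.

```python
class KVPool:
    """Toy paged KV store. All names are illustrative assumptions."""

    def __init__(self):
        self.blocks = {}       # block_id -> (owner_request_id, cached_text)
        self.block_table = {}  # request_id -> list of block_ids

    def write(self, request_id, block_id, text):
        self.blocks[block_id] = (request_id, text)
        self.block_table.setdefault(request_id, []).append(block_id)

    def read(self, request_id, check_owner=True):
        context = []
        for block_id in self.block_table[request_id]:
            owner, text = self.blocks[block_id]
            # The owner tag travels with the block, independent of the table.
            if check_owner and owner != request_id:
                raise PermissionError(
                    f"block {block_id} belongs to {owner}, not {request_id}"
                )
            context.append(text)
        return context


def scrambled_recovery(pool):
    # Models a buggy recovery path that rebuilds the block table with the
    # request IDs rotated by one position (a made-up bug for illustration).
    ids = list(pool.block_table)
    tables = [pool.block_table[i] for i in ids]
    pool.block_table = {ids[i]: tables[(i + 1) % len(ids)] for i in range(len(ids))}
```

Without the owner check, a read after the scrambled recovery returns another request's context with no error at all, which matches the point that only a user noticing the wrong conversation would catch it.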

@kalomaze dang cant believe mythos let this one slip through

@LouisMullie @kalomaze well yeah. obv hallucinations is wrong too, but wouldn't a kv that's a concat / wrong window of two (k?) valid non-junk kvs potentially produce results that are relevant to neither yet still sound coherent?

@kalomaze @DanielleFong Haha I wonder if part of the cache key is unset or something
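
Purely as an illustration of the "part of the cache key is unset" guess (speculation in the thread, not a confirmed cause), here is a small sketch. `ConversationCache`, its `strict` flag, and the `(tenant_id, conversation_id)` key layout are assumptions made up for this example.

```python
class ConversationCache:
    """Toy cache keyed by (tenant_id, conversation_id). Hypothetical design."""

    def __init__(self, strict=True):
        self.strict = strict
        self.store = {}

    def _key(self, tenant_id, conversation_id):
        # In strict mode, refuse to build a key with a missing tenant field
        # rather than letting None act as a shared namespace.
        if self.strict and not tenant_id:
            raise ValueError("tenant_id is unset; refusing to build a cache key")
        return (tenant_id, conversation_id)

    def put(self, tenant_id, conversation_id, state):
        self.store[self._key(tenant_id, conversation_id)] = state

    def get(self, tenant_id, conversation_id):
        return self.store.get(self._key(tenant_id, conversation_id))
```

If the tenant field silently arrives as `None` for everyone, two customers who both use a conversation ID like `"conv-1"` land on the same entry. Failing closed on an unset field turns that into an error instead of a cross-tenant read.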

@kimmonismus if it really crossed users how does anthropic even prove afterwards whose data went where?

@beffjezos Agree on local personal inference but there will always be inference endpoints regardless and they should be implementing this:
https://arxiv.org/abs/2603.14283