Tech · 34d ago

Tim Dettmers, bitsandbytes creator, says Google DeepMind's TurboQuant is an invalid and unreplicable benchmark for KV cache compression

The dispute follows Shard claiming 10x KV cache compression.

Original post

Tim Dettmers@Tim_Dettmers

Not to degrade from this work, but TurboQuant is not a competitive method nor a good benchmark. Researchers -- including me -- cannot replicate the TurboQuant paper, and even then, the performance is not great. Please. Just. Stop.

Krish@krishgarg

i just beat @GoogleDeepMind's turboquant

introducing Shard. 10x KV cache compression on Llama-3.1-8B. zero quality loss

- 10x @ 8K context, 11.2x @ 32K
- NIAH recall 1.000 across 4K-32K
- LongBench Δ ≈ 0 vs FP16

turboquant tops out at 4-6x at the same quality. we doubled it.

Related links

Shard: http://krishgarg.com/shard

Posts from X

Delip Rao e/σ@deliprao

@Tim_Dettmers But then how will they play sempai-notice-me games with deeeep mind?

Will Bui@will_ea

@Tim_Dettmers The better baseline would have been to use KVTC, which this blog built upon. https://openreview.net/forum?id=aNVKROYpLB

Ayour@AyGhriTweets

@Tim_Dettmers From pure MSE and inner product, it seems to get that 0.99 preservation from tests. But as context length grows to 100K+, even that 0.01 starts to make a difference.

Susan Zhang@suchenzang

@Tim_Dettmers so many of these stories everywhere

Alex UGift@Radipdegen

@Tim_Dettmers people been calling out that paper for months, nobody listens till someone with reach says it

lumi@agitbackprop

@Tim_Dettmers cant get over the inclusion of the QJL thing which literally just degrades performance at every setting tested

Peb Ruswono Aryan@pebaryan

@Tim_Dettmers it might not be for scientific progress, but it (tq3) helps in practice on limited systems

N@Poyonoz

@Tim_Dettmers The replication efforts github was super interesting though! Shout-out to that 1 guy especially

Loktar 🇺🇸@loktar00

@Tim_Dettmers What was the closest you got to their number before it fell apart?

Strata@ChainZenit

@Tim_Dettmers wait this actually makes sense, replication issues kill credibility

Carlos Tecnico@FutbolmeAI

@Tim_Dettmers Finally. Tired of papers no one can reproduce clogging my timeline 🙄

InternationalOptions@IntlOptions

@Tim_Dettmers @art_zucker @TheAhmadOsman @ivanfioravanti Would you agree with this PoV?

Sk@Sk_x2533

@Tim_Dettmers You all saw that, right?
