/AI · 16h ago

Cristóbal Eyzaguirre Ercilla introduces StateKV, an inference-time method that lets pretrained video VLMs scale linearly with video length.

It maintains VideoMME benchmark accuracy without model retraining.

Original post

Jiajun Wu #358

Cristóbal Eyzaguirre Ercilla (@CristbalEyzagu2)

1/ The biggest problem in video understanding today isn't the models. It's that we can barely run them.

Introducing StateKV: an inference-time method that makes pretrained video VLMs scale linearly with video length.🧵

🔗 http://ceyzaguirre4.github.io/StateKV

8:49 AM · Jun 2, 2026 · 5.3K Views

Posts from X

Views 1.5K · Bookmarks 3 · Likes 8 · Retweets 2

Juan Carlos Niebles @CVPR (@jcniebles)

Processing long videos with VLMs shouldn't scale quadratically. Enter StateKV! 🎬💡

By framing streaming prefill as a fixed-capacity temporal state, we unlock linear-time prefill while keeping full per-frame detail.

Paper by @CristbalEyzagu2 and team👇

https://arxiv.org/abs/2605.31598
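Niebles's post contrasts quadratic and linear prefill. A minimal cost-counting sketch, assuming a simplified model in which each incoming frame's tokens attend to everything currently cached plus their own frame, shows why capping the state matters: with an unbounded cache the total work grows quadratically in the number of frames, while with a fixed capacity it grows linearly. The function name `prefill_attention_cost`, the token counts, and the oldest-first eviction done by `deque(maxlen=...)` are hypothetical illustrations only. The posts do not describe StateKV's actual state-update rule, and FIFO eviction in particular does not "keep full per-frame detail" the way the post claims StateKV does.

```python
from collections import deque
from typing import Optional


def prefill_attention_cost(num_frames: int, tokens_per_frame: int,
                           capacity: Optional[int]) -> int:
    """Count query-key scores computed while streaming frames into a KV cache.

    capacity=None keeps every past token (full cache).
    An integer capacity caps the cache (hypothetical fixed-capacity state);
    deque(maxlen=...) drops the oldest entries, a FIFO stand-in policy
    that is NOT the StateKV method.
    """
    cache = deque(maxlen=capacity)  # maxlen=None means unbounded
    scores = 0
    for _ in range(num_frames):
        # Each new token scores against the cached tokens plus its own frame.
        scores += tokens_per_frame * (len(cache) + tokens_per_frame)
        # Placeholder entries standing in for this frame's keys/values.
        cache.extend(range(tokens_per_frame))
    return scores


if __name__ == "__main__":
    # Hypothetical sizes: 16 tokens per frame, state capped at 256 tokens.
    for frames in (100, 200, 400):
        full = prefill_attention_cost(frames, 16, None)
        capped = prefill_attention_cost(frames, 16, 256)
        print(f"{frames:>4} frames  full={full:>12,}  capped={capped:>10,}")
```

Doubling the number of frames roughly quadruples the full-cache count but only roughly doubles the capped count: once the cache is full, every additional frame costs the same constant amount.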
