Tech · 1d ago

Joey Gonzalez of UC Berkeley unveils Stateful Visual Encoders to help vision-language models compare images directly in the visual pipeline

The post-training method is compatible with existing frontier models

Sentiment

Users praise the stateful visual encoders as practical progress that fixes known VLM weaknesses in comparison and change-detection tasks.

Positive: 100.0%

Negative: 0.0%

2 comments with sentiment.

Posts from X

DeltaSignal (@AITrailblazerQ)

Change-aware vision turns VLMs from caption readers into audit engines.

The mechanism matters: if each image is compressed separately, the LM compares two lossy summaries. Small deltas, layout shifts, defects, UI state changes, and tampering cues get buried before reasoning starts.

Push the comparison into the encoder, and the model can preserve the difference field as evidence instead of reconstructing it from text tokens. That changes the market for claims review, factory QA, medical follow-ups, satellite monitoring, and agent UI control.

The clean check: false negatives on small visual deltas, tokens per comparison, and latency per verified change.

1d · 76
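The mechanism described in the post above can be illustrated with a toy sketch. This is not the Stateful Visual Encoders method, whose details are not given here; it only assumes that "compressing each image separately" behaves like a lossy pooling step, and that "pushing the comparison into the encoder" means differencing before pooling. The names `mean_pool`, `compare_summaries`, `compare_in_encoder`, and `make_image` are hypothetical.

```python
# Toy illustration only: images are small grids of floats, and "encoding"
# is a single pooling step. Real vision encoders are far richer than this.

def mean_pool(img):
    """Compress an image (list of rows) into one number, a stand-in for a lossy summary."""
    flat = [p for row in img for p in row]
    return sum(flat) / len(flat)

def compare_summaries(a, b):
    """Late comparison: summarize each image separately, then compare the summaries."""
    return abs(mean_pool(a) - mean_pool(b))

def compare_in_encoder(a, b):
    """Early comparison: take the per-pixel difference first, then pool it."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return max(diffs)

def make_image(size, bright_at):
    """Build a size x size black image with one bright pixel at (row, col)."""
    img = [[0.0] * size for _ in range(size)]
    r, c = bright_at
    img[r][c] = 1.0
    return img

before = make_image(8, (2, 3))
after = make_image(8, (2, 4))  # the same element shifted one pixel to the right

late_signal = compare_summaries(before, after)    # the shift vanishes in the summaries
early_signal = compare_in_encoder(before, after)  # the shift survives as a difference

print(late_signal, early_signal)
```

In this toy case the one-pixel layout shift leaves both per-image summaries identical, so the late comparison sees no change, while differencing before pooling keeps the change visible. Whether a real encoder behaves this way depends on its architecture and training.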

Rami Sufian (@Rami_Bball_Fan)

@profjoeyg This is the kind of practical AI work I want to see more of. VLMs being bad at detect-the-difference tasks has been obvious for a while. Nice to see a concrete fix instead of more AI hype.

1d · 57

EB1A Experts (@eb1aexperts)

@profjoeyg Fascinating direction for improving VLM reasoning.

1d · 24

Bards (@ViswapriyaM)

@profjoeyg CAN THIS WORK FOR VISUAL DOCUMENT UNDERSTANDING AS WELL?

1d · 9