Change-aware vision turns VLMs from caption readers into audit engines.
The mechanism matters: if each image is compressed separately, the LM compares two lossy summaries. Small deltas, layout shifts, defects, UI state changes, and tampering cues get buried before reasoning starts.
Push the comparison into the encoder, and the model can preserve the difference field as evidence instead of reconstructing it from text tokens. That matters wherever the delta is the whole job: claims review, factory QA, medical follow-ups, satellite monitoring, and agent UI control.
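A toy sketch of that mechanism, in NumPy rather than a real VLM. Everything here is an assumption for illustration: the `summarize` function stands in for a per-image encoder (patch means quantized to 16 levels), and `early_comparison` stands in for an encoder that sees both images at once. No actual model works exactly like this; the point is only that a lossy per-image step can erase a small delta that a difference-first step keeps.

```python
import numpy as np

def summarize(img, patch=8, levels=16):
    """Hypothetical lossy per-image summary: patch means, coarsely quantized."""
    h, w = img.shape
    patches = img.reshape(h // patch, patch, w // patch, patch)
    means = patches.mean(axis=(1, 3))
    return np.floor(means * levels).astype(int)

def late_comparison(a, b):
    """Compress each image separately, then compare the two summaries."""
    return summarize(a) != summarize(b)

def early_comparison(a, b, patch=8, threshold=0.1):
    """Take the pixel difference first, then pool the difference field per patch."""
    h, w = a.shape
    diff = np.abs(a - b).reshape(h // patch, patch, w // patch, patch)
    return diff.max(axis=(1, 3)) > threshold

# Synthetic pair: a flat gray image and a copy with one pixel changed.
before = np.full((32, 32), 0.53)
after = before.copy()
after[5, 5] = 0.93  # the small visual delta

late_flags = late_comparison(before, after)
early_flags = early_comparison(before, after)

# Number of patches flagged as changed by each approach.
print(int(late_flags.sum()), int(early_flags.sum()))  # prints: 0 1
```

The late path flags nothing because one pixel barely moves the patch mean, so both summaries quantize to the same values. The early path flags exactly the patch that contains the change.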
The clean check is three numbers: false-negative rate on small visual deltas, tokens per comparison, and latency per verified change.
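A minimal sketch of how those three numbers could be computed from an evaluation log. The `Comparison` record, the `audit_metrics` function, and the sample values are all hypothetical. "Latency per verified change" is read here as total latency divided by the number of real changes the model correctly flagged; other definitions are possible.

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    has_small_delta: bool  # ground truth: the image pair really differs
    flagged: bool          # model output: the model reported a change
    tokens: int            # tokens spent on this comparison
    latency_s: float       # wall-clock seconds for this comparison

def audit_metrics(results):
    """Compute the three check metrics over a list of Comparison records."""
    if not results:
        raise ValueError("need at least one comparison")
    positives = [r for r in results if r.has_small_delta]
    missed = sum(1 for r in positives if not r.flagged)
    verified = sum(1 for r in positives if r.flagged)
    total_latency = sum(r.latency_s for r in results)
    return {
        "false_negative_rate": missed / len(positives) if positives else None,
        "tokens_per_comparison": sum(r.tokens for r in results) / len(results),
        "latency_per_verified_change_s": total_latency / verified if verified else None,
    }

# Made-up sample log, for illustration only.
sample = [
    Comparison(True, True, 1000, 0.5),
    Comparison(True, False, 1200, 0.5),   # a missed small delta
    Comparison(True, True, 800, 0.25),
    Comparison(False, False, 1000, 0.25),
]
metrics = audit_metrics(sample)
print(metrics)
```

Run it on the same pairs for a separate-encoding baseline and a change-aware model, and the three numbers can be compared side by side.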