Mirage Probes Paper Examines Why VLMs Answer Image Questions Without Images

Original post

Ravid Shwartz Ziv@ziv_ravid · #741 in Tech

Random thoughts about hallucination and world models. VLMs often answer image questions correctly with no image attached. This is the "mirage" effect (Asadi et al.), which inflates multimodal benchmark scores. In our recent paper, “Mirage Probes”, we asked why it happens.

1:19 PM · Jul 3, 2026 · 767 Views

Sentiment

Positive users highlight the Mirage Probes paper's mildly optimistic results showing that VLMs' reliance on imagined content is linearly decodable and thus potentially flaggable.

Pos

100.0%

Neg

0.0%

1 comment with sentiment.

Posts from X

Ravid Shwartz Ziv@ziv_ravid

We found two reasons. Reason 1: textual bias. The question text alone points to one confident answer, so the model never touches its visual representations at all. Why look at the image?

Ravid Shwartz Ziv@ziv_ravid

Reason 2: spurious images. The text isn't enough to answer directly, but it evokes visual priors. So the model builds a fake image in latent space and answers from that, as if it were grounded.

Ravid Shwartz Ziv@ziv_ravid

Both are visible inside the model. Mirage behavior is linearly decodable from internal activations even when the image IS present, and a text-only baseline can't recover the signal. We also found that you can separate mirage and mirage using the activations.
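
To make "linearly decodable" concrete, here is a minimal sketch of a linear probe. It does not use the paper's data or method: the activations are synthetic vectors, and the generator, the dimension, the learning rate, and the names `make_synthetic_activations`, `train_probe`, and `probe_accuracy` are all assumptions made for illustration. A logistic-regression probe is fit on labeled vectors and then scored on held-out ones; high held-out accuracy is what it would mean for a label to be readable from activations by a linear map.

```python
import math
import random

def make_synthetic_activations(n, dim, rng):
    """Hypothetical stand-in for hidden states. Label 1 ("mirage") shifts the
    vector one way along a fixed direction, label 0 ("grounded") the other."""
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.0 if label == 1 else -1.0
        x = [rng.gauss(0.0, 1.0) + 0.5 * shift * d for d in direction]
        data.append((x, label))
    return data

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_probe(data, dim, epochs=50, lr=0.1):
    """Fit a logistic-regression probe (weights w, bias b) with plain SGD."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def probe_accuracy(w, b, data):
    """Fraction of examples whose label the linear probe predicts correctly."""
    correct = 0
    for x, y in data:
        logit = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((1 if logit > 0 else 0) == y)
    return correct / len(data)

rng = random.Random(0)
DIM = 16  # assumed toy dimension; real hidden states are far larger
train = make_synthetic_activations(400, DIM, rng)
test = make_synthetic_activations(200, DIM, rng)
w, b = train_probe(train, DIM)
acc = probe_accuracy(w, b, test)
print(f"held-out probe accuracy: {acc:.2f}")
```

In a real experiment the synthetic generator would be replaced by activations extracted from a VLM, and the same probe trained on text-only features would serve as the baseline the post mentions.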

Ravid Shwartz Ziv@ziv_ravid

Cleaning benchmark text can fix reason 1 but not 2. Spurious images live in the model's visual representations, so faithful grounding needs interventions at that level. One question is whether reason 2 is even a bug.

Ravid Shwartz Ziv@ziv_ravid

So the line isn't "models shouldn't imagine." We want models that complete scenes, predict, simulate. The line is that the model needs to know, and tell us, which parts came from input and which parts it filled in.

Ravid Shwartz Ziv@ziv_ravid

Ask a model what's next to the oven in a kitchen it's never seen. The right answer requires hallucinating: probably a fridge, a counter, and some cabinets. That's not a failure. That's what a world model is for.

Ravid Shwartz Ziv@ziv_ravid

None of this is specific to vision. A fabricated citation is the same move: a world model of the literature completing a gap with plausible authors and a plausible year. Ignoring retrieved context when priors are strong is another example.

Ravid Shwartz Ziv@ziv_ravid

So a spurious image is a world model doing its job at the wrong moment: Gap-filling is imagination when you're planning, and a mirage when you're supposed to report what you see.

Ravid Shwartz Ziv@ziv_ravid

Humans run on this machinery too. Perception is "controlled hallucination": the brain predicts the scene and uses input to correct errors. Your blind spot gets filled in from priors every waking second. What you have that models don't is source monitoring.

Ravid Shwartz Ziv@ziv_ravid

Why do models gap-fill instead of saying "I can't tell"? Because we pay them to. Benchmarks grade correctness, not groundedness, so guessing from priors has positive expected value and abstaining has zero. Mirage is the optimal policy under our evals.
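
The expected-value argument can be written out as a small calculation. This is a sketch under stated assumptions: scoring is 1 point for a correct answer, abstaining scores 0, and `wrong_penalty` is a hypothetical knob (not something the post or paper proposes) that subtracts points for a wrong answer. With `wrong_penalty=0` it models correctness-only grading, where any nonzero chance of guessing right beats abstaining.

```python
def expected_score(p_correct, wrong_penalty=0.0, abstain_score=0.0):
    """Return (guess_ev, abstain_ev) for one question.

    p_correct: probability that a guess from priors is right.
    wrong_penalty: points subtracted for a wrong answer (hypothetical;
        0.0 corresponds to grading correctness only).
    """
    guess_ev = p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty
    return guess_ev, abstain_score

def break_even_p(wrong_penalty):
    """Probability of being right at which guessing and abstaining tie."""
    return wrong_penalty / (1.0 + wrong_penalty)

# Correctness-only grading: guessing always has the higher expected score.
for p in (0.25, 0.5, 0.75):
    guess, abstain = expected_score(p)
    print(f"p={p:.2f} no penalty: guess={guess:+.2f} abstain={abstain:+.2f}")

# Hypothetical grading that penalizes wrong answers by one point.
for p in (0.25, 0.5, 0.75):
    guess, abstain = expected_score(p, wrong_penalty=1.0)
    print(f"p={p:.2f} penalty=1: guess={guess:+.2f} abstain={abstain:+.2f}")
```

Under the assumed one-point penalty, guessing only pays off when the prior-based guess is right more than half the time, which is the break-even value `break_even_p(1.0)` returns.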

Ravid Shwartz Ziv@ziv_ravid

Everyone wants models with "good world models," but it's unclear what that means beyond describing the world well. A good world model isn't just perception. It predicts what's probably there when you don't have the input.

Ravid Shwartz Ziv@ziv_ravid

Our results are mildly optimistic here: if "I'm running on imagined content" is linearly decodable, it's flaggable in principle. Grounding without imagination is a camera. Imagination without grounding is a mirage. The interesting problem is the seam.
