Imagine your home robot is pouring almonds while a horror movie is playing on your TV.
The vampire jumps out.
A good robot should not flinch, panic, or throw almonds everywhere 😂
It should know that the TV is visually salient but control-irrelevant.
This distinction matters.
A vision model may care about the cup logo, table texture, shadows, background objects, lighting, or a TV in the corner.
A robot should care about something much narrower:
the hand, the object, the contact region, and the motion that follows.