18h ago

NVIDIA releases LocateAnything, a 3B local vision-language model for UI and object grounding

A researcher critiqued the model's discrete coordinate tokens.

Sentiment

Positive: 0%

Negative: 100%

Some users dismissed NVIDIA's LocateAnything 3B model as obviously worse than moondream.

1 comment with sentiment.
