Gavin Leech attributes vision model size gap to data compression
Gavin Leech noted that vision models are roughly 1,000 times smaller than text models, attributing the disparity to language's data compression: text encodes compositional semantics and abstractions at far higher density than pixels. Research engineer 1a3orn called the gap a relevant consideration for projecting how Lindy (that is, how long-lived) intelligible chain-of-thought (CoT) reasoning is likely to be. Creator rohit suggested exploring CoT performed in images instead of words to preserve both efficiency and clarity.
@1a3orn CoT in pictures but not words would be quite neat
this is a relevant consideration for projecting how Lindy intelligible CoT is likely to be
one of the major failures of my life was being so surprised to find out that vision models were ~1000x smaller than text models. Just total failure to understand language's god-tier data compression
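As a rough sanity check on the ~1000x figure, the sketch below compares publicly reported parameter counts of a few representative vision and text models. The specific models and numbers are illustrative choices on my part, not ones named in the thread.

```python
# Back-of-the-envelope comparison of parameter counts.
# Figures are approximate, publicly reported sizes; models chosen
# for illustration only, not cited in the original discussion.
PARAMS = {
    "ViT-B/16 (vision)": 86e6,    # Dosovitskiy et al., 2020
    "ViT-L/16 (vision)": 307e6,   # Dosovitskiy et al., 2020
    "GPT-3 (text)": 175e9,        # Brown et al., 2020
    "PaLM (text)": 540e9,         # Chowdhery et al., 2022
}

for text_model in ("GPT-3 (text)", "PaLM (text)"):
    for vision_model in ("ViT-B/16 (vision)", "ViT-L/16 (vision)"):
        ratio = PARAMS[text_model] / PARAMS[vision_model]
        print(f"{text_model} vs {vision_model}: ~{ratio:,.0f}x")

# GPT-3 comes out ~2,035x larger than ViT-B/16 and ~570x larger than
# ViT-L/16: the same order of magnitude as the ~1000x gap Leech describes.
```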