
Keshigeyan Chandrasegaran and Kyle Sargent launch GPIC, a permissive image-text dataset and benchmark for training visual models

The roughly 28-trillion-pixel corpus is fully permissive for both research and commercial use.

Original post

1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation!

🚀 100M VLM-captioned image-text pairs for training
📊 1M image-text pairs for benchmarking
🖼️ ~28 trillion pixels
🤗 Centrally hosted
✅ Fully permissive for research + commercial use

Dataset, benchmark and models 🧵👇

Co-led with @KyleSargentAI

9:30 AM · May 29, 2026

I’m very excited by this new benchmark dataset for visual generation that is suitable for the modern era of large scale generative models!🤩

Keshigeyan Chandrasegaran (@keshigeyan)

4:30 PM · May 29, 2026 · 9.2K Views
4:56 PM · May 29, 2026 · 6.6K Views