Researchers Release KletterMix, 725B-Token German Pretraining Corpus

Excited to share KletterMix 🇩🇪🚀

A ~725B-token German pretraining + annealing corpus.

Proud to have co-led this with @HarleRuben, Sebastian Sztwiertnia, Abbas Goher Khan, Mehdi Ali, @effi288, and @kerstingAIML.

Paper: https://huggingface.co/papers/2606.03773

8:56 AM · Jun 4, 2026 · 98 Views