Excited to share KletterMix 🇩🇪🚀
A ~725B-token German pretraining + annealing corpus.
Proud to have co-led this with @HarleRuben, Sebastian Sztwiertnia, Abbas Goher Khan, Mehdi Ali, @effi288, and @kerstingAIML.
Paper: https://huggingface.co/papers/2606.03773
