Cohere Labs Research Shows Multilingual Tokenizers Boost Language Plasticity


Huge congrats to @dianaabagyan who will be presenting this work next week at ACL.

We asked which relatively cheap interventions, such as tokenizer design early in training, improve a model's "language plasticity", i.e. its ability to adapt to new languages post-training. 🎉🔥

Cohere Labs @Cohere_Labs

We’re thrilled to share that research from Cohere Labs and @Cohere will be presented at ACL this week! 🥳

🌐One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers. Congrats to authors @dianaabagyan, @alexrs95, Andres Felipe Cruz-Salinas, @kroscoo, Hangyu Lin, @acyr_l, @mziizm, @ahmetustun89, @sarahookr (https://arxiv.org/abs/2506.10766)

📚Is a Document Educational or Just Wikipedia-Style? Congrats to authors @m_klimasz & Piotr Andruszkiewicz (https://arxiv.org/abs/2605.23721)

✅Check Your Work: Structured Checklist Feedback for Improving Large Language Models. Congrats to authors Jonathan Cook, @_rockt, Jakob Nicolaus Foerster, @d_aumiller, Alex Wang (https://arxiv.org/html/2410.03608v1)

🛡️Robustness of Cultural Norm Reasoning Under Language and Context Perturbations. Congrats to authors Ankita Maity, Sajag Swami, Van Ngo, Akhil Arora, @nikita_moghe (https://openreview.net/forum?id=7mZvGJHeMN)

🎮Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations. Congrats to authors @Preethi__S_, @SCahyawijaya, Ayomide Odumakinde, Sameer Singh, @seraphinagt (https://arxiv.org/abs/2601.17087)

4:24 AM · Jul 3, 2026