New RL Framework LRPO Lets LLMs Learn from Cross-Lingual Generations

Geyang Guo | @CHERYLOLGUO

Knowledge is not evenly distributed across languages. For a given question, what if better information comes from another language? 🔥 We propose Language-Routed Policy Optimization (LRPO), an online reinforcement learning framework that lets LLMs explore and learn from cross-lingual generations during training. #ICML2026

10:56 AM · May 27, 2026