New Introspective Training Method Boosts LLM Efficiency With Reward Feedback

David Acuna | @DAVIDJESUSACU · 6:45 PM · May 24, 2026

Data quality isn’t just a filter! It can be made explicit as feedback and conditioned on during training 💡 In our new preprint, we show we can use a thinking reward model to add quality-aware feedback, making the same underlying training data scale better! 🚀 Check it out 👇