15h ago

CMU's Sherry Tongshuang Wu releases DITTO, training an 8B LLM to match GPT-5.4 at simulating human behavior using verbal feedback

The code is open-sourced in the OdysSim GitHub repository.

Original post

Excited to share our new work on Reinforcing Human Behavior Simulation via Verbal Feedback.

Can human simulators learn from feedback, not just rewards? Most RL for LLMs turns feedback into a single score. But human behavior is rarely just right or wrong. It is social, contextual, subjective, and multi-dimensional. A score can tell the model what is better. Verbal feedback can tell it why.

Meet DITTO + SOUL.

Paper: https://arxiv.org/abs/2605.20506
Code: https://github.com/sunnweiwei/OdysSim
Model: https://huggingface.co/sunweiwei/Ditto-8B

2:12 PM · May 26, 2026