When did you last check a leaderboard before picking an LLM? 🤔 Excited to share that our paper "Personalized Benchmarking: Evaluating LLMs by Individual Preferences" was accepted to ACL Findings 2026! 🎉 Joint work with Heran Wang and @ChenhaoTan
Paper Proposes Personalized Benchmarking To Evaluate LLMs By User Preferences
Users responded positively to the UChicago researchers' proposal for personalized LLM benchmarking, calling it the kind of post they like to see.

We found that for 57% of active Chatbot Arena users, individual rankings are statistically indistinguishable from a random ordering of models under a Bradley-Terry model. Users also show substantial heterogeneity in topical interests and communication styles.
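For readers new to Bradley-Terry, here is a minimal sketch of how a single user's model ranking can be fit from their pairwise votes. Under Bradley-Terry, P(i beats j) = p_i / (p_i + p_j), and the strengths p can be estimated with the standard minorization-maximization (MM) update. This is not the paper's code: the function name `fit_bradley_terry`, the (winner, loser) vote format, and dropping ties beforehand are all illustrative assumptions.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via MM updates.

    Hypothetical helper for illustration. Assumes ties were already removed
    and every model in the data appears in at least one comparison.
    """
    wins = defaultdict(int)         # W_i: total wins of model i
    pair_counts = defaultdict(int)  # n_ij: number of comparisons between i and j
    models = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    # Start from uniform strengths.
    strength = {m: 1.0 / len(models) for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom
        # Strengths are only identified up to scale, so normalize to sum to 1.
        total = sum(new.values())
        new = {m: s / total for m, s in new.items()}
        delta = max(abs(new[m] - strength[m]) for m in models)
        strength = new
        if delta < tol:
            break
    return strength


# Toy votes from one hypothetical user: (winner, loser).
votes = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("C", "B"), ("B", "C")]
scores = fit_bradley_terry(votes)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With only a handful of votes per user, the fitted strengths are noisy, which is why a per-user ranking can fail to separate from a random ordering; testing that would need an additional significance procedure not shown here.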

@ggarbacea @ChenhaoTan this is the kind of post i like.

By modeling these features, we predict user-specific rankings and cut prediction error by up to 35% over aggregate baselines! 📉✨
Read the paper here: https://arxiv.org/abs/2604.18943