I’m giving a talk on LLM judges at the Toronto Machine Learning Summit next week. The talk will cover practical techniques like:
- Collecting high-quality expert feedback on subjective tasks.
- Calibrating LLM judges with expert opinions.
- Properly eliciting reasoning within an LLM judge.
- Using multiple agents to decompose complex evaluation tasks.
- Continually improving LLM judges with production monitoring / metrics.
This talk will be full of practical details for building useful evaluation systems. Hope to see you there!



