Netflix Researcher To Present LLM Judge Techniques At Toronto ML Summit

Original post

Cameron R. Wolfe, Ph.D. @cwolferesearch

I’m giving a talk on LLM judges at the Toronto Machine Learning Summit next week. The talk will cover practical techniques like:

- Collecting high-quality expert feedback on subjective tasks.
- Calibrating LLM judges with expert opinions.
- Properly eliciting reasoning within an LLM judge.
- Using multiple agents to decompose complex evaluation tasks.
- Continually improving LLM judges with production monitoring / metrics.

This talk will be full of practical details for building useful evaluation systems. Hope to see you there!

3:34 PM · Jun 11, 2026 · 879 Views
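
As a rough illustration of the "calibrating LLM judges with expert opinions" item in the post above, one common way to quantify judge/expert agreement is Cohen's kappa, which corrects raw agreement for chance. The post does not say which metric the talk uses; the function name, the labels, and the toy data below are all assumptions made for this sketch.

```python
from collections import Counter


def cohens_kappa(expert, judge):
    """Chance-corrected agreement between two equal-length label lists."""
    if len(expert) != len(judge) or not expert:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(expert)
    # Fraction of items where the judge matches the expert.
    observed = sum(e == j for e, j in zip(expert, judge)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    expert_counts = Counter(expert)
    judge_counts = Counter(judge)
    expected = sum(expert_counts[c] * judge_counts[c] for c in expert_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, not data from the talk or the blog post.
expert_labels = ["good", "good", "bad", "good", "bad", "bad", "good", "bad"]
judge_labels = ["good", "bad", "bad", "good", "bad", "good", "good", "bad"]

kappa = cohens_kappa(expert_labels, judge_labels)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 means the judge tracks the experts closely; a value near 0 means agreement is no better than chance, which suggests the judge prompt or rubric needs revision.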

Sentiment

Users are excited about the Netflix researcher's presentation on LLM judge techniques at the Toronto ML Summit: one calls it a "killer topic", and the replies are optimistic overall.

Positive: 100.0% · Negative: 0.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 802 · Likes: 2 · Retweets: 1

Cameron R. Wolfe, Ph.D. @cwolferesearch

The talk is based on this blog: https://netflixtechblog.com/evaluating-netflix-show-synopses-with-llm-as-a-judge-6269251e6f28

Register for the talk here: https://www.eventbrite.ca/e/toronto-machine-learning-summit-tmls-10th-annual-conference-expo-2026-tickets-1976645039523?aff=oddtdtcreator

Replies: 1

Strata @ChainZenit

@cwolferesearch that sounds like a killer topic for the summit.

Cameron R. Wolfe, Ph.D. @cwolferesearch

@ChainZenit hope so!

Rugbist @rugbist_

@cwolferesearch the calibration with expert opinions part is what most people skip. curious how you handle disagreement between judges.

Blissy @BlissyOnX

@cwolferesearch The dataset of 290 reviews and 15 system rubrics was interesting.

But 290 feels small for calibration of subjective tasks.
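
To put a rough number on that sample-size concern, here is a minimal sketch that computes a 95% Wilson score interval for a judge/expert agreement rate measured on 290 items. The 80% agreement figure (232 of 290) is purely hypothetical and does not come from the blog or the talk; the function name is also just illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


n_reviews = 290           # dataset size mentioned in the reply above
assumed_agreements = 232  # hypothetical: judge matches experts on 80% of items

low, high = wilson_interval(assumed_agreements, n_reviews)
print(f"95% interval for agreement: {low:.3f} to {high:.3f}")
```

Under that assumed 80% agreement, the interval spans roughly 75% to 84%, so 290 items can separate a clearly good judge from a clearly poor one but cannot resolve differences of a few percentage points between judge variants.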