
Inference Backend Choice Shifts LLM Benchmark Scores by Up to 16.6 Points

> the choice of [inference] backend alone can shift benchmark scores by up to 16.6 percentage points and induce high rates of output disagreement

curious how many LLM RL methods would replicate across inference backends

5:11 AM · May 25, 2026
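The quote mentions two different measurements: a score shift in percentage points and an output disagreement rate. A minimal sketch of how one might compute both is below. It assumes exact-match scoring and treats "disagreement" as the fraction of prompts where two backends return different strings; the quoted work may define either metric differently, and the function names and data are hypothetical.

```python
from typing import Sequence


def accuracy(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Exact-match accuracy (assumed scoring rule) as a fraction in [0, 1]."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def score_shift_pp(preds_a: Sequence[str], preds_b: Sequence[str],
                   golds: Sequence[str]) -> float:
    """Benchmark score of backend A minus backend B, in percentage points."""
    return 100.0 * (accuracy(preds_a, golds) - accuracy(preds_b, golds))


def disagreement_rate(preds_a: Sequence[str], preds_b: Sequence[str]) -> float:
    """Fraction of prompts where the two backends produce different outputs
    (hypothetical definition: any string mismatch counts)."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("both backends must answer the same non-empty prompt set")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


if __name__ == "__main__":
    # Toy data: same model and prompts, outputs collected from two backends.
    golds = ["4", "9", "16", "25"]
    backend_a = ["4", "9", "16", "24"]
    backend_b = ["4", "8", "15", "25"]
    print(score_shift_pp(backend_a, backend_b, golds))  # 25.0
    print(disagreement_rate(backend_a, backend_b))      # 0.75
```

The two numbers can diverge: if backend A answers only items 1 and 2 correctly and backend B only items 3 and 4, the score shift is 0 pp while disagreement is 100%, so a matching headline score does not by itself show that two backends behave the same.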