For a long time we didn't know if test-time randomization was needed for sample-optimal multicalibration and omniprediction. It's not. https://arxiv.org/abs/2606.20557
New Deterministic Algorithm Achieves Optimal Multicalibration and Omniprediction
No Digg Deeper questions have been answered for this story yet.
Most Activity
The best algorithms for multicalibration/omniprediction were for the online adversarial setting, where randomness really is needed. They could be applied to the standard distributional setting using an "online to batch reduction" but that introduced even more randomness.
For a long time we didn't know if test-time randomization was needed for sample-optimal multicalibration and omniprediction. It's not. https://arxiv.org/abs/2606.20557
You could try to derandomize these algorithms by fixing their randomness, but here's an obstruction. The distribution is uniform over (red, 1) and (blue, 0). The predictor predicts f(red) = 2/3) and f(blue) = 1/3 w.p. 2/3 and f(red) = 1/3 and f(blue) = 2/3 w.p. 1/3
The best algorithms for multicalibration/omniprediction were for the online adversarial setting, where randomness really is needed. They could be applied to the standard distributional setting using an "online to batch reduction" but that introduced even more randomness.

But what if your online learner had a hint: Every day t, it received an interval [a_t,b_t] and the promise that the true mean was in the interval. Now it can learn optimally and randomize only within the interval. But where can you get a hint from? Online it would be impossible.

But if you are running an online to batch reduction, you can get valid hints by forming confidence intervals around conditional means for points you see frequently in your training sample. The width of the interval hint now scales naturally with the frequency of the point.
This predictor is perfectly calibrated, but no deterministic predictor with range {1/3, 2/3} can be calibrated here, so fixing randomness fails. If you think about it for a bit, the problem is the combination of high variance of prediction and high weight in the distribution.
You could try to derandomize these algorithms by fixing their randomness, but here's an obstruction. The distribution is uniform over (red, 1) and (blue, 0). The predictor predicts f(red) = 2/3) and f(blue) = 1/3 w.p. 2/3 and f(red) = 1/3 and f(blue) = 2/3 w.p. 1/3

So do this. You get a randomized predictor whose feature conditional variance scales smoothly with the frequency of x. And once you have this you have avoided the obstruction in the red/blue example above, and fixing the randomness works!