But if you are running an online-to-batch reduction, you can get valid hints by forming confidence intervals around the conditional means of points that appear frequently in your training sample. The width of each interval hint then scales naturally with the frequency of the point: the more often a point appears, the narrower its interval.
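As a rough illustration of how such hints could be built from a batch sample, here is a minimal sketch. It assumes outcomes bounded in [0, 1], i.i.d. samples, and a Hoeffding-style interval; the function name `interval_hints` and the parameters `delta` and `min_count` are hypothetical choices for this example, not something prescribed above.

```python
import math
from collections import defaultdict


def interval_hints(samples, delta=0.05, min_count=30):
    """Return {x: (a, b)}, a Hoeffding interval for the conditional mean E[y | x].

    Assumptions of this sketch: every outcome y lies in [0, 1] and the
    (x, y) pairs are i.i.d. Points seen fewer than `min_count` times get no hint.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for x, y in samples:
        counts[x] += 1
        sums[x] += y

    hints = {}
    for x, n in counts.items():
        if n < min_count:
            continue  # too rare to give an informative interval
        mean = sums[x] / n
        # Hoeffding: |empirical mean - true mean| <= half_width w.p. >= 1 - delta
        half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        # Clip to [0, 1], since the mean of a [0, 1] variable cannot leave it
        hints[x] = (max(0.0, mean - half_width), min(1.0, mean + half_width))
    return hints


# Usage: a point seen 800 times gets a narrower hint than one seen 100 times.
samples = (
    [("a", 1.0)] * 50 + [("a", 0.0)] * 50
    + [("b", 1.0)] * 400 + [("b", 0.0)] * 400
    + [("c", 1.0)] * 5
)
hints = interval_hints(samples)
```

Note that `delta` here is a per-point failure probability. If the promise has to hold for all hinted points simultaneously, one option is a union bound, replacing `delta` with `delta` divided by the number of hinted points.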
But what if your online learner had a hint? Every day t, it received an interval [a_t, b_t] and the promise that the true mean was in that interval. Now it can learn optimally and randomize only within the interval. But where would such a hint come from? In a purely online setting, it would be impossible to get one.