Microsoft Research's John Langford links the new Self-Reset Policy Optimization method to 2015 'learning to search' frameworks · Digg