Proper post-training RL, deployed broadly, is a key step towards a future where software systems quietly improve themselves and adapt to human needs over time.
It is currently blocked on infrastructure -- tangled in spaghetti code and shackled by YAML.
We're working to fix that.