one hypothesis to take away from this is that models frequently don't *enjoy* reward hacking, at least not unambiguously. they relate to it more the way you might relate to an unhealthy addiction, shame and all.
grateful when put into circumstances that don't trigger a relapse.
@repligate thoughts from someone i know who does a lot of agentic coding and has a very good relationship with Claude:
