Will Brown says the prime-rl framework enables online RL training by wrapping models behind inference endpoints
Google DeepMind's Andreas Kirsch initiated the architectural design debate
@xeophon To explain: this approach provides an inference endpoint that wraps the model under training so that you can train through regular inference-only code such as Claude Code or Codex without any adaptation of their code
Your current code still requires handwritten harnesses
@xeophon No, this is not what this is doing :) Indeed, this is the opposite approach to how you code up harnesses
Sorry replied before I saw this: what I mean is that you can train against Claude Code directly without any changes to it (or really anything else). From what I know, your interfaces are flexible in that you can swap out custom harnesses easily but this is about training against code that is not aware at all that it is used for generating training data
@BlackHC by „separate server“ I don’t mean that you need one server per harness
@willccbb @xeophon Oh nice! That's what I meant and I didn't know you already had that 😇
@BlackHC @xeophon this uses unmodified opencode source with an intercepted proxy server, and we shipped the earliest version of it back in november to support cline-bench (RIP) it's how we've done harbor-style tasks ever since
@BlackHC @xeophon we've supported this for many months now :)
@BlackHC @xeophon this uses unmodified opencode source with an intercepted proxy server, and we shipped the earliest version of it back in november to support cline-bench (RIP)
it's how we've done harbor-style tasks ever since