5h ago

Will Brown says the prime-rl framework enables online RL training by wrapping models behind inference endpoints

Google DeepMind's Andreas Kirsch initiated the architectural design debate

0
Original post

@xeophon No, this is not what this is doing :) Indeed, this is the opposite approach to how you code up harnesses

11:59 PM · May 26, 2026 View on X

@xeophon To explain: this approach provides an inference endpoint that wraps the model under training so that you can train through regular inference-only code such as Claude Code or Codex without any adaptation of their code

Your current code still requires handwritten harnesses

Andreas Kirsch 🇺🇦Andreas Kirsch 🇺🇦@BlackHC

@xeophon No, this is not what this is doing :) Indeed, this is the opposite approach to how you code up harnesses

6:59 AM · May 27, 2026 · 101 Views
7:06 AM · May 27, 2026 · 127 Views

Sorry replied before I saw this: what I mean is that you can train against Claude Code directly without any changes to it (or really anything else). From what I know, your interfaces are flexible in that you can swap out custom harnesses easily but this is about training against code that is not aware at all that it is used for generating training data

Florian BrandFlorian Brand@xeophon

@BlackHC by „separate server“ I don’t mean that you need one server per harness

7:05 AM · May 27, 2026 · 38 Views
7:09 AM · May 27, 2026 · 62 Views

@willccbb @xeophon Oh nice! That's what I meant and I didn't know you already had that 😇

will brownwill brown@willccbb

@BlackHC @xeophon this uses unmodified opencode source with an intercepted proxy server, and we shipped the earliest version of it back in november to support cline-bench (RIP) it's how we've done harbor-style tasks ever since

7:50 AM · May 27, 2026 · 156 Views
8:22 AM · May 27, 2026 · 40 Views

@BlackHC @xeophon we've supported this for many months now :)

7:40 AM · May 27, 2026 · 116 Views

@BlackHC @xeophon this uses unmodified opencode source with an intercepted proxy server, and we shipped the earliest version of it back in november to support cline-bench (RIP)

it's how we've done harbor-style tasks ever since

7:50 AM · May 27, 2026 · 156 Views