5h ago

Will Brown says the prime-rl framework enables online RL training by wrapping models behind inference endpoints

Google DeepMind's Andreas Kirsch initiated the architectural design debate

6600589

——0——

Original post

#228Andreas Kirsch 🇺🇦@BLACKHC

@xeophon No, this is not what this is doing :) Indeed, this is the opposite approach to how you code up harnesses

11:59 PM · May 26, 2026

#228Andreas Kirsch 🇺🇦@BLACKHC

@xeophon To explain: this approach provides an inference endpoint that wraps the model under training so that you can train through regular inference-only code such as Claude Code or Codex without any adaptation of their code

Your current code still requires handwritten harnesses

Andreas Kirsch 🇺🇦@BlackHC

@xeophon No, this is not what this is doing :) Indeed, this is the opposite approach to how you code up harnesses

6:59 AM · May 27, 2026 · 101 Views

7:06 AM · May 27, 2026 · 127 Views

#228Andreas Kirsch 🇺🇦@BLACKHC

Sorry replied before I saw this: what I mean is that you can train against Claude Code directly without any changes to it (or really anything else). From what I know, your interfaces are flexible in that you can swap out custom harnesses easily but this is about training against code that is not aware at all that it is used for generating training data

Florian Brand@xeophon

@BlackHC by „separate server“ I don’t mean that you need one server per harness

7:05 AM · May 27, 2026 · 38 Views

7:09 AM · May 27, 2026 · 62 Views

#228Andreas Kirsch 🇺🇦@BLACKHC

@willccbb @xeophon Oh nice! That's what I meant and I didn't know you already had that 😇

will brown@willccbb

@BlackHC @xeophon this uses unmodified opencode source with an intercepted proxy server, and we shipped the earliest version of it back in november to support cline-bench (RIP) it's how we've done harbor-style tasks ever since

7:50 AM · May 27, 2026 · 156 Views

8:22 AM · May 27, 2026 · 40 Views

QUOTE POST

#339will brown@WILLCCBB

@BlackHC @xeophon we've supported this for many months now :)

7:40 AM · May 27, 2026 · 116 Views

QUOTE POST

#339will brown@WILLCCBB

@BlackHC @xeophon this uses unmodified opencode source with an intercepted proxy server, and we shipped the earliest version of it back in november to support cline-bench (RIP)

it's how we've done harbor-style tasks ever since

7:50 AM · May 27, 2026 · 156 Views