16d ago

PrimeIntellect introduces Renderers boosting RL throughput over 3x

PrimeIntellect introduced Renderers to resolve mismatches in reinforcement learning pipelines where trainers operate on tokens while environments generate messages. The system enforces explicit token-in and token-out handling with user-controlled templating. LMSYS Org collaborated on the effort. The change eliminates hidden chat-template rewrites and delivers more than 3x throughput gains on popular open models.

0
Original post

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

4:44 PM · May 12, 2026 View on X
Reposted by

Glad to see this -- renderers are a foundational component of the LLM stack. Renderers map between tokens and messages, which are invariant to tokenizer and formatting details. Most APIs, datasets, and RL environments are defined in terms of messages. Getting the details wrong leads to train-test mismatches, caching inefficiencies, and prompt injection vulnerabilities. We included a renderers module in Tinker Cookbook, but it makes sense as a standalone library.

Prime IntellectPrime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

11:44 PM · May 12, 2026 · 169K Views
4:07 PM · May 29, 2026 · 15.6K Views

The jinja chat template has always felt like a temporary equilibrium, so we've needed someone to take the reigns and try to build that out within the community.

Excited about this!

Prime IntellectPrime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

11:44 PM · May 12, 2026 · 169K Views
11:59 PM · May 12, 2026 · 14K Views

Harmony was the first attempt at this imo, but it never broke out of the OpenAI model ecosystem. I'm honestly not sure why, but would guess lack of community effort https://github.com/openai/harmony

Nathan LambertNathan Lambert@natolambert

The jinja chat template has always felt like a temporary equilibrium, so we've needed someone to take the reigns and try to build that out within the community. Excited about this!

11:59 PM · May 12, 2026 · 14K Views
12:01 AM · May 13, 2026 · 2.5K Views

@willccbb @vllm_project @sgl_project @huggingface @tinkerapi confirmado

will brownwill brown@willccbb

all chat templates are wrong, some chat templates are useful we found some CRAZY performance wins by patching official templates, and we're shipping them in a standalone library you can use with any RL stack w/ examples for @vllm_project @sgl_project @huggingface @tinkerapi

11:50 PM · May 12, 2026 · 40.1K Views
12:00 AM · May 13, 2026 · 1.3K Views

@willccbb @vllm_project @sgl_project @huggingface @tinkerapi src https://rlhfbook.com/teach/course/lec2-chap4-5-9/#14

Nathan LambertNathan Lambert@natolambert

@willccbb @vllm_project @sgl_project @huggingface @tinkerapi confirmado

12:00 AM · May 13, 2026 · 1.3K Views
12:00 AM · May 13, 2026 · 514 Views

A gift from the Gods. Dealing with multiple models and many envs in the same RL codebase while respecting correctness constraints (no train / inference tokenization mismatch) is becoming a huge pain.

I have a vibe-coded draft PR that does exactly this, but happy I won’t have to land or maintain it now. Let’s hope the field can really standardize on one abstraction.

Prime IntellectPrime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

11:44 PM · May 12, 2026 · 169K Views
9:47 AM · May 13, 2026 · 14.6K Views

@TacoCohen Very cool! I think tinker from @thinkymachines had that API as well

Taco CohenTaco Cohen@TacoCohen

A gift from the Gods. Dealing with multiple models and many envs in the same RL codebase while respecting correctness constraints (no train / inference tokenization mismatch) is becoming a huge pain. I have a vibe-coded draft PR that does exactly this, but happy I won’t have to land or maintain it now. Let’s hope the field can really standardize on one abstraction.

9:47 AM · May 13, 2026 · 14.6K Views
10:55 AM · May 13, 2026 · 363 Views

@TacoCohen @hallerite @thinkymachines https://github.com/thinking-machines-lab/tinker-cookbook/tree/main/tinker_cookbook/renderers

http://base.py has the ABCs

11:33 AM · May 13, 2026 · 57 Views

some of our fav bugs on the road to `renderers`

read all about it: https://www.primeintellect.ai/blog/renderers

6:54 AM · May 13, 2026 · 4.8K Views

all chat templates are wrong, some chat templates are useful

we found some CRAZY performance wins by patching official templates, and we're shipping them in a standalone library you can use with any RL stack

w/ examples for @vllm_project @sgl_project @huggingface @tinkerapi

Prime IntellectPrime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

11:44 PM · May 12, 2026 · 169K Views
11:50 PM · May 12, 2026 · 40.1K Views

the core of the issue is that both encoding and parsing are many-to-one

vanilla TITO does prefix lookup in token-space, which misses many rendering collisions

the solution is to do lookup in message-space, then input prep in token-space, which we call bridge_to_next_turn

will brownwill brown@willccbb

all chat templates are wrong, some chat templates are useful we found some CRAZY performance wins by patching official templates, and we're shipping them in a standalone library you can use with any RL stack w/ examples for @vllm_project @sgl_project @huggingface @tinkerapi

11:50 PM · May 12, 2026 · 40.1K Views
11:57 PM · May 12, 2026 · 2K Views

@vllm_project @sgl_project @huggingface @tinkerapi we're intending for this to become a programmable source of truth for template implementations so that we can finally get rid of jinja

lots here already, but PRs welcome for all models!

will brownwill brown@willccbb

the core of the issue is that both encoding and parsing are many-to-one vanilla TITO does prefix lookup in token-space, which misses many rendering collisions the solution is to do lookup in message-space, then input prep in token-space, which we call bridge_to_next_turn

11:57 PM · May 12, 2026 · 2K Views
12:10 AM · May 13, 2026 · 1.7K Views

@vllm_project @sgl_project @huggingface @tinkerapi from a live run:

12:21 AM · May 13, 2026 · 1.4K Views

We are open sourcing renderers

For RL, the inference server should be simple Tokens in, tokens out

renderers is the token-level chat templating layer to >render messages to tokens >parse completions to structure >bridge rollouts byte-for-byte > >3x throughput on openmodels

Prime IntellectPrime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

11:44 PM · May 12, 2026 · 169K Views
12:03 AM · May 13, 2026 · 9.1K Views

working at prime is just "ugh i had this gnarly problem, let’s fix it and then make it available to everyone"

a ton of other things are coming, can’t wait to show it to yall :)

Prime IntellectPrime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

11:44 PM · May 12, 2026 · 169K Views
6:40 AM · May 13, 2026 · 4.5K Views

never again

Prime IntellectPrime Intellect@PrimeIntellect

Introducing Renderers RL trainers work in tokens. Environments work in messages. Going back and forth corrupts sampled tokens, wasting compute on every agentic turn. With Renderers, we fix this mismatch. This unlocks >3x throughput on popular open models.

11:44 PM · May 12, 2026 · 169K Views
11:50 PM · May 12, 2026 · 6K Views