What could that mean? Let's say 10 million trajectories, each 100K tokens long (1T tokens total), or 625K GRPO groups of 16. They used batch = 512 for R1, so that's enough for ~1,220 steps a day, or 10K steps in about 8 days. This is all very conservative.
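Quick sanity check on that arithmetic, as a rough sketch. Two assumptions that aren't spelled out in the post: "batch = 512" is read as 512 GRPO groups per step, and the 10M-trajectory budget is read as one day of rollouts. The function name is just for illustration.

```python
def rl_step_budget(trajectories, tokens_per_traj, group_size, groups_per_step):
    """Back-of-envelope RL budget from a pile of rollouts.

    Assumption: groups_per_step counts GRPO groups (not single
    trajectories) consumed by one optimizer step.
    """
    total_tokens = trajectories * tokens_per_traj
    groups = trajectories // group_size
    steps = groups / groups_per_step
    return total_tokens, groups, steps


total_tokens, groups, steps_per_day = rl_step_budget(
    trajectories=10_000_000,   # figure from the post
    tokens_per_traj=100_000,   # figure from the post
    group_size=16,             # GRPO group size from the post
    groups_per_step=512,       # "batch = 512", read as groups per step
)

# Assumption: the trajectory budget above is one day's worth of rollouts.
days_for_10k_steps = 10_000 / steps_per_day

print(f"tokens: {total_tokens:.2e}")
print(f"groups: {groups:,}")
print(f"steps per day: {steps_per_day:.0f}")
print(f"days for 10K steps: {days_for_10k_steps:.1f}")
```

If batch = 512 instead means 512 individual trajectories per step, the step count goes up 16x, so the estimate only gets more generous.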
@stochasticchasm @willccbb what are the actual RL economics now?
This also gives me plenty of hope for V4.1 and beyond. Consider: they don't need max-speed inference for async RL rollouts. V4-Flash does… like 14K tokens/s per GPU at 100 tps. If that's on 950DT, then one SuperPOD = 5T tokens/day. Or at least 1T at 20% utilization. data machine go brrr
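Rough sketch of what those throughput figures imply. Assumptions, not from the post: "14K tokens/GPU" is read as tokens per second per chip, and the chip count is backed out from the 5T/day figure rather than taken from any spec sheet.

```python
SECONDS_PER_DAY = 86_400

tokens_per_chip_per_s = 14_000           # post's figure, read as tokens/s per chip
pod_tokens_per_day = 5_000_000_000_000   # post's figure: 5T tokens/day per SuperPOD
utilization_pct = 20                     # post's pessimistic utilization

# Tokens one chip produces per day at full tilt.
tokens_per_chip_per_day = tokens_per_chip_per_s * SECONDS_PER_DAY

# Chip count that the 5T/day claim implies (derived, not a hardware spec).
implied_chips = pod_tokens_per_day / tokens_per_chip_per_day

# Daily output at the pessimistic utilization, in integer arithmetic.
tokens_at_utilization = pod_tokens_per_day * utilization_pct // 100

print(f"per chip per day: {tokens_per_chip_per_day:.2e} tokens")
print(f"implied chips per pod: {implied_chips:.0f}")
print(f"tokens/day at {utilization_pct}%: {tokens_at_utilization:.2e}")
```

So the 5T/day number corresponds to roughly four thousand chips at that per-chip rate; a bigger pod or higher throughput would push it higher.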