Solved it.
Tracked down the inference + evaluation bottlenecks and ended up cutting runtime by 3x. GPU throttling was absurdly high.
Codex genuinely helped a lot and its capabilities are impressive, but after spending hours pushing both Codex and Claude at their limits, I'm even more convinced they're still far from replacing
Honestly the satisfaction I got is just another kind of high
**Also got crazy number of impressions, it's always the most low effort tweets