1d ago

RL With Nested Unit Tests Maps AI Correctness-Efficiency Frontier

Original post

First, we found that stronger unit tests alone are not enough. Moving along this axis traces out a correctness-efficiency frontier without really improving the solve rate. More surprisingly, extrapolation extends the frontier.

8:49 AM · May 28, 2026