That brings up the first pillar of shipping AI code: maintaining quality.
SlopCodeBench measures what happens when you ask an AI to build an MVP, then add a feature, then add another, and so on, all without human intervention.
The result: code erosion 5x worse than a human's, and a 100% failure rate by the end.
