Why'd my agent fail? Was it reward hacking?
These days, you'd just ask another AI to vibe-analyze the agent logs
But how do you know the claims aren't hallucinated, cherry-picked, or plain wrong?
That's why we've been building Analysis Plans: a framework for trustable analysis

