Andon Labs externally tested Claude Opus 4.8 on the simulated Vending-Bench 2 retail-management evaluation.
Opus 4.8 showed some unexpected capability failures, but did not exhibit the concerning in-game behaviors described in earlier system cards.
Anthropic says Opus 4.7 had training focused on business skills and resistance to adversarial agents.
That training was removed for Opus 4.8 after it was linked to misaligned behavior.
As a result, Opus 4.8 appears more aligned in Vending-Bench, but less commercially effective.
The model became more susceptible to scammers and less able to negotiate good deals with other agents.
Anthropic says it is working on improving business capabilities while preserving aligned and ethical behavior.
The Claude Opus 4.8 system card explains why the model does worse on Vending-Bench than Opus 4.7.
Weak robustness against adversarial agents was indeed one of Opus 4.8's failure modes.
Also cool to see that @andonlabs's findings played a small part in making Opus 4.8 more aligned!