Opus is a smaller, less capable model. The score you are seeing is effectively a weighted average in which Opus performance carries ~5-10% of the weight.
But pretty much across the board, Opus performs within 10% of Fable's scores in all but a few areas (e.g. cybersecurity), so the score difference we're talking about here is ~0.6 points, which is within any reasonable margin of error.
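To make the arithmetic behind the ~0.6 point figure concrete, here's a rough sketch. The baseline score of 80 is a made-up number purely for illustration (it isn't from any published benchmark), and `blended_score` is a hypothetical helper; the only inputs taken from the comment above are the ~5-10% weight and the "within 10%" gap.

```python
def blended_score(fable_score: float, opus_score: float, opus_weight: float) -> float:
    """Weighted average of two model scores; opus_weight is the share given to Opus."""
    return (1 - opus_weight) * fable_score + opus_weight * opus_score


# ASSUMPTION: hypothetical baseline score, chosen only to illustrate the math.
fable = 80.0
# Worst case of the "within 10%" claim: Opus scores 10% lower than Fable.
opus = fable * 0.90

# Score drop for Opus weights across the ~5-10% range.
drops = {w: fable - blended_score(fable, opus, w) for w in (0.05, 0.075, 0.10)}

for weight, drop in drops.items():
    print(f"Opus weight {weight:.1%}: blended score is {drop:.2f} points lower")
```

Under those assumptions the drop is a few tenths of a point across the whole 5-10% range, so the exact weight doesn't change the conclusion much.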
If the premise is that they're cherry-picking and Opus is better in those areas, no, there's absolutely no evidence to support that.
Anthropic still has access to those topics/areas where Fable outperforms Opus; they just don't share it externally, outside of select partners. There are already jailbreak examples of Mythos operating in its 'prohibited' areas, where it's highly performant (relative to Opus), but of course those don't belong in a benchmark since they're not the product that's actually served.