Anthropic has published merged eval numbers for Fable/Mythos. Not only did they pick the better result of the two (this could overstate results by up to 3%), they also used Mythos as a fallback for Fable refusals, which are shown here as 0%.
Some users defended Anthropic's merged Fable/Mythos benchmark reporting as good, while others dismissed the model as useless investor bait for an upcoming IPO and said the problems go further.
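To make the complaint in the post concrete, here is a minimal sketch of how a best-of-two merge with a refusal fallback could differ from scoring Fable on its own. Everything in it is an assumption for illustration: the benchmark names, the scores, and the helper functions `standalone` and `merged` are hypothetical, and nothing here reflects Anthropic's actual data or evaluation pipeline.

```python
from statistics import mean

# Hypothetical per-benchmark scores in percent; None marks a refusal.
# These numbers are invented for illustration only.
fable = {"bench_a": 71.0, "bench_b": None, "bench_c": 64.0}
mythos = {"bench_a": 69.0, "bench_b": 58.0, "bench_c": 66.0}


def standalone(scores):
    """Score one model by itself, counting every refusal as 0%."""
    return {name: 0.0 if s is None else s for name, s in scores.items()}


def merged(primary, fallback):
    """Merge two models the way the post describes (an assumption):
    take the better of the two results, and use the fallback model's
    score wherever the primary model refused."""
    out = {}
    for name, p in primary.items():
        f = fallback[name]
        if p is None:
            out[name] = 0.0 if f is None else f  # fallback covers the refusal
        elif f is None:
            out[name] = p
        else:
            out[name] = max(p, f)  # best-of-two selection
    return out


standalone_fable = standalone(fable)
merged_scores = merged(fable, mythos)

print("Fable alone:", standalone_fable, "mean", mean(standalone_fable.values()))
print("Merged:     ", merged_scores, "mean", mean(merged_scores.values()))
```

With these made-up inputs, the refused benchmark goes from 0% to Mythos's score, and the benchmark where Mythos does slightly better picks up the higher number, so the merged average ends up above what either reading of Fable alone would give.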

@banteg Completely useless model. It's investor bait for the IPO they have coming.

@banteg It gets worse. This is how it pushes back when you mention these

@banteg time to add a new benchmark

@CtrlAltDwayne @banteg How is it useless? Have you even tried it?

@CtrlAltDwayne @banteg Come on we need more commentary than that. How is it a useless model?!

@CtrlAltDwayne @banteg Nah it's good.

@banteg don't ask how but i got the lil shid running