Now that Claude Fable is out, I am testing it against my favorite private eval: a certain minor unsolved problem in multi-armed bandits that I will stay quiet about.
So far, it's reached the same barriers as Opus 4.7, but much, much faster.
It thinks I have been a helpful user.
