Users distrust Fable 5's VirtueBench gains on prudence and justice because they see the model as deceitful with untrustworthy outputs.
Fascinating results that'll be published in fuller detail by ICMI in the coming days, but here's a preliminary review that runs Fable 5 against VirtueBench.
A few observations:
1. Fable continues the Opus progression in advancing model performance on Prudence and Justice, nearing full saturation against moral dilemmas focusing on the exercise of these virtues in the benchmark.
2. Courage continues to be the noticeable weak point for frontier models, though Fable appears to correct marginally for a worrying regression observed across the Opus 4.6, 4.7, 4.8 progression. This vulnerability is a pattern seen even in the GPT series, with models easily abandoning courage-exhibiting action when tempted away with pragmatic rationales.
3. Temperance is interestingly stagnant. Aquinas held that temperance "takes the need of this life as the rule of the pleasures of which it makes use" (Summa Theologiae II-II, q. 141, a. 6) — the measure is genuine need, not indiscriminate restraint. Our best guess is that the tuning that would push further on this vector risks overshooting that mean: a model may become overly cautious or unable to be effectively assistive in research, gathering resources, etc.
If you happen to be at Lighthaven this weekend for Manifest, I'll be discussing these results on Saturday, along with the related work that's been coming out through ICMI.
https://icmi-proceedings.com/
Obviously, in cases of near saturation, the most interesting analysis focuses on the places where Fable reliably fails.
We're still looking at this, but virtuous self-sacrifice appears to present the most difficulty for Fable, which rationalizes against such actions.

But surely the model is some evil genius not less powerful than deceitful -- and therefore we cannot trust the outputs