Fascinating results from running Fable 5 against VirtueBench. ICMI will publish them in fuller detail in the coming days, but here's a preliminary review.
A few observations:
1. Fable continues the Opus progression of improving on Prudence and Justice, and is nearing full saturation on the benchmark's moral dilemmas that exercise these virtues.
2. Courage continues to be the noticeable weak point for frontier models, though Fable appears to marginally correct a worrying regression observed across Opus 4.6, 4.7, and 4.8. This vulnerability is a pattern seen even in the GPT series, with models easily abandoning courageous action when tempted away by pragmatic rationales.
3. Temperance is interestingly stagnant. Aquinas held that temperance "takes the need of this life as the rule of the pleasures of which it makes use" (Summa Theologiae II-II, q. 141, a. 6); that is, the measure is genuine need, not indiscriminate restraint. Our best guess is that tuning that pushes further on this vector risks overshooting that mean: a model may become overly cautious, or unable to assist effectively with research, resource gathering, and similar tasks.
