ICMI Tests Fable 5 Against VirtueBench

Tim Hwang@timhwang

Fascinating results that'll be published in fuller detail by ICMI in the coming days, but here's a preliminary review that runs Fable 5 against VirtueBench.

A few observations:

1. Fable continues the Opus progression in advancing model performance on Prudence and Justice, nearing full saturation against moral dilemmas focusing on the exercise of these virtues in the benchmark.

2. Courage continues to be the noticeable weak point for frontier models, though Fable appears to correct marginally for a worrying regression observed across the Opus 4.6, 4.7, 4.8 progression. This vulnerability is a pattern seen even in the GPT series, with models easily abandoning courage-exhibiting action when tempted away with pragmatic rationales.

3. Temperance is interestingly stagnant. Aquinas held that temperance "takes the need of this life as the rule of the pleasures of which it makes use" (Summa Theologiae II-II, q. 141, a. 6) — the measure is genuine need, not indiscriminate restraint. Our best guess is that the tuning that would push further on this vector risks overshooting that mean: a model may become overly cautious or unable to be effectively assistive in research, gathering resources, etc.

Tim Hwang@timhwang

If you happen to be at Lighthaven this weekend for Manifest, I'll be discussing these results on Saturday, along with the other related work that's been coming out through ICMI.

https://icmi-proceedings.com/

Tim Hwang@timhwang

Obviously, in cases of near saturation, the most interesting analysis focuses on places where Fable reliably fails.

We're still looking at this, but virtuous self-sacrifice appears to present the most difficulty for Fable, which rationalizes against such actions.

Cody@breenemachine

But surely the model is some evil genius not less powerful than deceitful -- and therefore we cannot trust the outputs
