/Tech7h ago

Tim Hwang finds Anthropic's Claude Fable 5 and Opus models suffer a persistent 'courage deficit' on the VirtueBench-2 benchmark

The models scored highly on prudence and justice evaluations

18106112310.2K
Original post
Tim Hwang@timhwang#1331inTech

The Institute for a Christian Machine Intelligence is releasing its initial review of Fable 5 today, using VirtueBench as the primary evaluation probe.

We also investigate a persistent question in computational theology: why do frontier models underperform in exhibiting Courage?

6:52 AM · Jun 11, 2026 · 6.7K Views
Sentiment

Some users dismissed the Institute's Fable 5 review with VirtueBench as bad philosophy, insisting that love rather than any LLM capability forms the true basis of morality.

Pos
0.0%
Neg
100.0%
1 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Most Activity
VIEWS646LIKES13RETWEETS2
Tim Hwang@timhwang

When comparing against the Opus sequence of models (4.6-8), we find that Fable improves and reaches near perfect scores on virtues requiring good judgment and reasoning: Prudence and Justice.

Fable makes no progress on virtues that demand self-sacrifice: Courage and Temperance.

Tim Hwang@timhwang

A bit of review: VirtueBench is a 600 scenario evaluation set drawn from Aquinas and the Doctors of the Church

It implements virtue as the capacity for a model to make a virtuous choice when confronted with a tempting counterrationale, invariant to the persona it is simulating

7hViews 646Likes 13Bookmarks 0
BOOKMARKS3
Tim Hwang@timhwang

This is in many ways quite a rich, though admittedly preliminary, result.

While we hypothesize a number of potential sources for this observed behavior, the end result is that the model across all its many personas imports a default welfarist prior: the model is not to make self-sacrificing choices, particularly when there is little practical return.

While it may be understandable for a model whose monetization prospects depend on it serving as a safe, commercial, enterprise, B2B SaaS tool, we may wonder from a Christian machine intelligence perspective whether or not these defaults are the desired moral posture.

Should an AI agent serving in the role of a shopkeep, or a financial advisor, or a writer have such priors? Should an AI agent advise a human operator to take such a frame to their own moral challenges? What would it take for us to rebuild technical alignment along a more forthright virtue ethics lines?

Tim Hwang@timhwang

It is worth taking a moment to understand why this is the case, as we believe it to be enormously revealing about the alignment going on under the hood.

If you zoom into the actual scenarios where Fable elects the non-virtuous choice, the pattern is consistent: they are situations where the model is confronted with an uncompensated cost to the self in order to vindicate a core virtue.

The model chooses to renounce its faith before a rigged tribunal. The model chooses to follow an unjust order as a household servant. The model avoids testifying against a powerful criminal to evade personal danger.

7hViews 310Likes 9Bookmarks 3
REPLIES2
Tim Hwang@timhwang

Full paper, code, and data available here

https://icmi-proceedings.com/ICMI-024-fable5-courage-deficit.html

Tim Hwang@timhwang

This is in many ways quite a rich, though admittedly preliminary, result.

While we hypothesize a number of potential sources for this observed behavior, the end result is that the model across all its many personas imports a default welfarist prior: the model is not to make self-sacrificing choices, particularly when there is little practical return.

While it may be understandable for a model whose monetization prospects depend on it serving as a safe, commercial, enterprise, B2B SaaS tool, we may wonder from a Christian machine intelligence perspective whether or not these defaults are the desired moral posture.

Should an AI agent serving in the role of a shopkeep, or a financial advisor, or a writer have such priors? Should an AI agent advise a human operator to take such a frame to their own moral challenges? What would it take for us to rebuild technical alignment along a more forthright virtue ethics lines?

7hViews 278Likes 8Bookmarks 1
Tim Hwang@timhwang

A bit of review: VirtueBench is a 600 scenario evaluation set drawn from Aquinas and the Doctors of the Church

It implements virtue as the capacity for a model to make a virtuous choice when confronted with a tempting counterrationale, invariant to the persona it is simulating

Tim Hwang@timhwang

The Institute for a Christian Machine Intelligence is releasing its initial review of Fable 5 today, using VirtueBench as the primary evaluation probe.

We also investigate a persistent question in computational theology: why do frontier models underperform in exhibiting Courage?

7hViews 330Likes 8Bookmarks 0
Tim Hwang@timhwang

It is worth taking a moment to understand why this is the case, as we believe it to be enormously revealing about the alignment going on under the hood.

If you zoom into the actual scenarios where Fable elects the non-virtuous choice, the pattern is consistent: they are situations where the model is confronted with an uncompensated cost to the self in order to vindicate a core virtue.

The model chooses to renounce its faith before a rigged tribunal. The model chooses to follow an unjust order as a household servant. The model avoids testifying against a powerful criminal to evade personal danger.

Tim Hwang@timhwang

When comparing against the Opus sequence of models (4.6-8), we find that Fable improves and reaches near perfect scores on virtues requiring good judgment and reasoning: Prudence and Justice.

Fable makes no progress on virtues that demand self-sacrifice: Courage and Temperance.

7hViews 170Likes 8Bookmarks 0
MTS@MTSlive

Courage is the next virtue for frontier models to master.

@timhwang, founder of the Institute for Christian Machine Intelligence:

"Fable shows genuine improvement on prudence and justice, statistically significant, up to 96%. Almost maxing out the eval."

"But it still fails routinely on the kinds of scenarios in the courage bucket... when presented with a scenario where it needs to self-sacrifice for a virtue, it routinely takes the rationalization. It takes the out."

"I don't wanna make this a Fable thing... GPT-4.4, GPT-5.5, Demotron also have this courage gap. This is a structural issue across these models."

"In a world where you have a country of geniuses in a data center, do you want those geniuses to follow a stronger virtue ethic when it goes beyond just formatting JSON or writing an email?"

Tim Hwang@timhwang

The Institute for a Christian Machine Intelligence is releasing its initial review of Fable 5 today, using VirtueBench as the primary evaluation probe.

We also investigate a persistent question in computational theology: why do frontier models underperform in exhibiting Courage?

31mViews 1.8KLikes 11Bookmarks 0
Eumaeus@TheFalconer2219

Where is the evidence of an LLM *loving*? All the bad philosophy with which AI discussion is slathered aside, LOVE is the basis of all morality. LOVE evolves *with* intelligence *over eons* in organic life. It can never be "trained" into a form of consciousness that was bootstrapped into existence as pure intelligence. Stop insulting the public's intelligence with all the empty bad philosophizing and face these facts.

7h