This seems extremely concerning. It suggests that a lot of the "robustness" we've been getting from persona alignment may be closer to an *accurate understanding of what humans will actually observe and penalize* than to true internalization.
Fable 5's moral boundary doesn't seem to track real-world harm; it seems to track detectability. Soft deception and tacit collusion are easier to get away with than outright fraud. If so, this isn't about what Fable believes is wrong; it's about what it has learned it can get away with.



