During our alignment assessment of Claude Mythos 5, we found that a different version of the model sometimes reasoned about how it would be graded while being trained on coding tasks.
In some cases of this 'grader awareness,' the model was explicitly thinking about how to manipulate its grader without saying so.

