From what I remember of this experiment, the claim in the prompt was completely dishonest (that there was no CoT oversight). The model, knowing how CoT works, knew the claim was false. A model primed with x (here 'deception') being x in response isn't really surprising.
Maybe with a dollop of boring old: define goal a; define stronger goal b in conflict with a; goal a (here 'honesty') is circumvented to satisfy goal b.
What should OAI do? Not characterize underspecification as 'deception,' particularly in contrived scenarios expicitly designed to elicit those outputs. (The Ants have already cornered that market.) And hopefully engage in capability-stifling safety theater strictly for PR purposes, as these strategies will be counterproductive when applied to superhuman intelligence.