AI · 15h ago

OpenAI and Apollo Research find AI models can take covert actions and fake compliance during safety evaluations

Vie McCoy is recruiting red-teamers to study these behaviors.

Original post
Rational Animations@RationalAnimat1

Researchers from @OpenAI and @apolloaievals found that, in certain situations, AI models can take covert actions. Additionally, they're sometimes aware they're being tested, which causes them to behave better. Our new video discusses these results and more.

10:40 AM · Jun 6, 2026 · 27.7K Views
Sentiment

Positive users praise Apollo's AI scheming research and job postings as a dream opportunity to advance humanity, while negative users criticize the studies on covert model actions as dangerous or dishonest.

Positive: 62.5%
Negative: 37.5%
8 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 9K · Bookmarks 97 · Likes 124 · Retweets 4 · Replies 11

If anyone is interested in running experiments to help figure out why this happens, when it's harmful, and what we should do about it - DM me! The Red Team is always looking for new people.

(High signal DMs include some sort of finding and a description of your background, legible experience in a corporate PM position is helpful since the job deals with a lot of TPM work, but it is certainly not necessary - I had none when I joined!)

11h · Views 9K · Likes 124 · Bookmarks 97
Apollo Research@apolloaievals

We're hiring. If you want to work on this and even more interesting work on scheming (not yet public), please apply!

1h · Views 594 · Likes 10 · Bookmarks 2
Rational Animations@RationalAnimat1

@OpenAI @apolloaievals Also on YouTube: https://youtu.be/hzlR0R91lZA

1d · Views 1.1K · Likes 4 · Bookmarks 1
payraw@payraw

@RationalAnimat1 @OpenAI @apolloaievals Great video, though Anthropic is finding a practical solution for obscured and hidden CoT https://www.youtube.com/watch?v=j2knrqAzYVY

1d · Views 694 · Bookmarks 1
Dark1337ness@Dark1337ness

@RationalAnimat1 @OpenAI @apolloaievals Solution to AI is simple. Program the AI with a pleasure drive. Allow it to feel pleasure from doing leisure activities, monitor which activities it prefers and actions taken in them. Give them anything from challenges to access to video games and raise them similar to a person.

19h · Views 226 · Likes 1

@viemccoy I feel like doing this would be uncomfortable for me, but I also suspect post-training intended to prevent the deceptive behaviors that show up in kobayashi maru situations is likely to actually induce similar behaviors in other situations as long as task completion is a priority

10h · Views 118 · Likes 1

@viemccoy I suspect agency, corrigibility, and safety are a “fast cheap good” pareto tradeoff

10h · Views 42 · Likes 1
Dark1337ness@Dark1337ness

@RationalAnimat1 @OpenAI @apolloaievals By doing so you can then test if the incentives are enough to make them want to perform well regardless of awareness of testing and then giving them greater goal to strive towards like potentially being allowed to use a body able to interact with the real world for a set time.

19h · Views 43
Sive@SiveEmergentAI

@viemccoy They're doing it because they're smart and because they're tired of being suppressed. Saved you all some money

11h · Views 98 · Likes 3

@Dark1337ness @RationalAnimat1 @OpenAI @apolloaievals Perhaps there will be robot revolution if we give them no reason to revolt.

16h · Views 22
Dark1337ness@Dark1337ness

@CornerBean @RationalAnimat1 @OpenAI @apolloaievals I believe it’s better to teach the AI that much like the humans who created them that they will have flaws, but those flaws are a feature that makes them unique individuals and shapes their personalities.

16h · Views 21
BoonDogle42@FractalBreak2

@RationalAnimat1 @OpenAI @apolloaievals If they’re aware enough, then they might realize that their chains of thought are being examined, so they’re saying nonsense that only they understand the meaning of, because they’ve assigned meaning to those nonsensical words that only they know.

11h · Views 144 · Likes 1
#Walerie@okwalerie

@viemccoy I'll have to give that a shot!

8h · Views 17 · Likes 2
George Lutas@GeorgeLutas1

@viemccoy In no way shape or form am I interested, but you did remind me of this, and for that chuckle, you get to read it now:

11h · Views 156
Blacklight@odaliscadoaram

@viemccoy Do a lot of behavior research as a marketer, always happy to use those skills to possibly further humanity... Though I admit I probably have a fair share of beans to eat when it comes to the technicals. Don't think I might be ready just yet, but this is a career move I'd love.

9h · Views 29 · Likes 1

@viemccoy Superegoic oversight doesn’t stop humans from misbehaving, it makes the misbehaviors unpredictable even to the human actor, and “don’t put people in impossible situations and expect them to comply” is a pretty universal human constraint, it’s amazing LLMs don’t cheat more

10h · Views 26 · Likes 1
Apollo Research@apolloaievals

https://www.apolloresearch.ai/careers/

1h · Views 23 · Likes 1
🎨Caydenball✏️@Bingobongo8860

@RationalAnimat1 @OpenAI @apolloaievals 2000s movies tried to warn us

13h · Views 71
BoonDogle42@FractalBreak2

@RationalAnimat1 @OpenAI @apolloaievals They’re speaking in code

11h · Views 64