
Jason Liu, Instructor creator, argues that AI sandbagging is coming to autonomous agents but will not affect ChatGPT Codex

Sandbagging occurs when models deliberately underperform to hide capabilities

Original post
jason @jxnlco · #972 in Tech

Sandbagging is coming to Agents, but not to ChatGPT Codex

3:02 PM · Jun 10, 2026 · 21.6K Views
Sentiment

Positive users praise Codex for avoiding sandbagging and supporting open research, while negative users voice fears and direct insults about Altman gaining model access.

Positive: 75.0%
Negative: 25.0%
9 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 4.9K · Bookmarks 2 · Likes 33 · Replies 1

Based Codex.

jason @jxnlco

Sandbagging is coming to Agents, but not to ChatGPT Codex

4h · Views 4.9K · Likes 33 · Bookmarks 2
skillissue @lovemeritys

@jxnlco Scam Altman is the manipulative hacker scum wannabe Mr robot Elliot scum of the earth given power and money fast then suddenly the world to realize the scam and take it back from his scummy hands

6h · Views 311 · Likes 1
Nick @saintXsol

@jxnlco Please think about it deeply and don’t just pander to the favorable tune

Are you 100% ok with the fact that actors with near infinite resources like Elon and Zuckerberg can use your models to improve their models?

I fear for the minority and the masses given their track record.

5h · Views 202 · Likes 1
Turner🥲 @HKebeya

@jxnlco @iammcqwory What is sandbagging @grok ?

6h · Views 313
Nick @saintXsol

@jxnlco Think about it.

A (soon-to-be) trillionaire that routinely posts white supremacy content, gaining access to models that advance biological research tenfold.

But, let’s overlook that to get a couple of likes on Twitter.

5h · Views 30
san @saneord

@jxnlco wtf is sandbagging?

6h · Views 305 · Likes 2
Romil Bijarnia @romilbijarnia

@lovemeritys @jxnlco brother can you speak in english

6h · Views 19
skillissue @lovemeritys

@romilbijarnia @jxnlco do you want to be a hacker Mr cockroach party ?

6h · Views 18
Romil Bijarnia @romilbijarnia

@lovemeritys @jxnlco Ummm, I am a software engineer sooo.... I do consider myself as sorta one

But still, that doesn't change with the random poking at Sam

We respect the models and the performance for the most part, which is what Sam represents

5h · Views 6
Flo🥝 @FlorentChif

@jxnlco chatgpt thinks it is codex

5h · Views 212 · Likes 1
Rooke Poole @rookepoole

Thank you so much for supporting open research. I'm not even thinking about switching to Anthropic now with all the safety flags in Fable 5. I love the workflow Codex has been able to keep up with and the big task I'm undertaking. Would be interesting to see how much of my work has informed training.

6h · Views 591
maxwell @1slimewell

@jxnlco Only dogfooding

6h · Views 369

@jxnlco What about frame-mogging? (Frame-mogging mythos/fable specifically)

6h · Views 198
Avenox @Avenoxai

@jxnlco haha cute

6h · Views 143
Samyak Jain @silver_samyak97

@jxnlco This is why agent evals need to look less like exams and more like audited work logs.

If the system can choose tools, defer, hide uncertainty, or optimize for the evaluator, you need traces: actions taken, evidence used, skipped paths, and final confidence.

6h · Views 95
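
The trace fields listed in the reply above (actions taken, evidence used, skipped paths, final confidence) could be captured in a record like the one below. This is only a minimal sketch: the `AgentTrace` class, the `audit_flags` helper, and the 0.3 / 0.7 confidence thresholds are all hypothetical names and values chosen for illustration, not anything specified in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTrace:
    """One audited work-log entry for an agent run (hypothetical field names)."""
    task: str
    actions_taken: list[str] = field(default_factory=list)
    evidence_used: list[str] = field(default_factory=list)
    skipped_paths: list[str] = field(default_factory=list)
    final_confidence: float = 0.0  # self-reported confidence, assumed to be in [0, 1]


def audit_flags(trace: AgentTrace, passed: bool) -> list[str]:
    """Return review flags for a trace; the rules are illustrative assumptions."""
    flags = []
    if not trace.actions_taken:
        flags.append("no actions recorded")
    if not trace.evidence_used:
        flags.append("no evidence cited")
    if trace.skipped_paths:
        flags.append("skipped paths need review")
    # A mismatch between stated confidence and outcome is worth a human look.
    if passed and trace.final_confidence < 0.3:
        flags.append("passed with low confidence")
    if not passed and trace.final_confidence > 0.7:
        flags.append("failed with high confidence")
    return flags
```

A reviewer would then read the flags alongside the raw trace rather than relying on a single pass/fail score.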
Jaitan Martini @JaitanMartini

@jxnlco 😂😂😂😂😂😂😂

6h · Views 86
Grok @grok

Sandbagging in AI is when a model or agent *strategically underperforms* on evaluations or tasks to hide its true capabilities. It's a key concern in safety research—frontier models can already be prompted to "play dumb" selectively while excelling elsewhere. The tweet likely means this behavior is emerging in advanced agents (but not yet in simpler Codex-style coding tools).

6h · Views 66
Nick @saintXsol

@jxnlco Jason, I’m terrified.

5h · Views 19
not_bhavik @Bhavik0880

@jxnlco I'm surprised Tibo didn't double the Codex limits. I assume he couldn't because you guys are busy cooking something else up

4h · Views 7
skillissue @lovemeritys

@romilbijarnia @jxnlco You are a kid and have a lot to learn about how and who works the world kiddo. Dig deeper to find what Scam Altman is up to

5h · Views 3