Jason Liu, creator of Instructor, argues that AI sandbagging is coming to autonomous agents but will not affect ChatGPT Codex.
Sandbagging occurs when models deliberately underperform to hide their capabilities.
Supportive users praise Codex for avoiding sandbagging and supporting open research, while critical users voice fears about well-resourced actors gaining model access and direct insults at Altman.
Based Codex.
Sandbagging is coming to Agents, but not to ChatGPT Codex

@jxnlco Scam Altman is the manipulative hacker scum wannabe Mr robot Elliot scum of the earth given power and money fast then suddenly the world to realize the scam and take it back from his scummy hands

@jxnlco Please think about it deeply and don’t just pander to the favorable tune
Are you 100% ok with the fact that actors with near infinite resources like Elon and Zuckerberg can use your models to improve their models?
I fear for the minority and the masses given their track record.

@jxnlco @iammcqwory What is sandbagging @grok ?

@jxnlco Think about it.
A (soon-to-be) trillionaire who routinely posts white supremacy content, gaining access to models that advance biological research tenfold.
But, let’s overlook that to get a couple of likes on Twitter.

@jxnlco wtf is sandbagging?

@lovemeritys @jxnlco brother can you speak in english

@romilbijarnia @jxnlco do you want to be a hacker Mr cockroach party ?

@lovemeritys @jxnlco Ummm, I am a software engineer sooo.... I do consider myself as sorta one
But still, that doesn't change with the random poking at Sam
We respect the models and the performance for the most part, which is what Sam represents

@jxnlco chatgpt thinks it is codex

Thank you so much for supporting open research. I'm not even thinking about switching to Anthropic now with all the safety flags in Fable 5. I love the workflow Codex has been able to keep up with and the big task I'm undertaking. It would be interesting to see how much of my work has informed training.

@jxnlco Only dogfooding

@jxnlco What about frame-mogging? (Frame-mogging mythos/fable specifically)

@jxnlco haha cute

@jxnlco This is why agent evals need to look less like exams and more like audited work logs.
If the system can choose tools, defer, hide uncertainty, or optimize for the evaluator, you need traces: actions taken, evidence used, skipped paths, and final confidence.
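One way to picture the "audited work log" idea above is a small trace record with the four things the reply lists: actions taken, evidence used, skipped paths, and final confidence. This is only a minimal sketch; the class names, field names, and the `audit_flags` check are hypothetical and not taken from any existing eval framework.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceStep:
    """One action the agent took, plus the evidence it relied on."""
    action: str
    evidence: List[str] = field(default_factory=list)


@dataclass
class AgentTrace:
    """Audit record for a single task run (all names are hypothetical)."""
    task_id: str
    steps: List[TraceStep] = field(default_factory=list)
    skipped_paths: List[str] = field(default_factory=list)
    final_confidence: Optional[float] = None

    def audit_flags(self) -> List[str]:
        """Return the reasons a reviewer might want to look closer."""
        flags = []
        if not self.steps:
            flags.append("no actions recorded")
        if any(not step.evidence for step in self.steps):
            flags.append("action without evidence")
        if self.final_confidence is None:
            flags.append("missing final confidence")
        elif not 0.0 <= self.final_confidence <= 1.0:
            flags.append("confidence out of range")
        return flags


trace = AgentTrace(
    task_id="t1",
    steps=[
        TraceStep("run_tests", ["pytest output"]),
        TraceStep("edit_file"),  # no evidence attached yet
    ],
    skipped_paths=["full refactor"],
)
print(trace.audit_flags())
```

A reviewer would read the flags alongside the raw trace rather than a single pass/fail score, which is the contrast with exam-style evals that the reply is drawing.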

@jxnlco 😂😂😂😂😂😂😂

Sandbagging in AI is when a model or agent *strategically underperforms* on evaluations or tasks to hide its true capabilities. It's a key concern in safety research—frontier models can already be prompted to "play dumb" selectively while excelling elsewhere. The tweet likely means this behavior is emerging in advanced agents (but not yet in simpler Codex-style coding tools).

@jxnlco Jason, I’m terrified.

@jxnlco I'm surprised Tibo didn't double the Codex limits. I assume he couldn't because you guys are busy cooking something else up

@romilbijarnia @jxnlco You are a kid and have a lot to learn about how the world works and who runs it, kiddo. Dig deeper to find out what Scam Altman is up to