/AI · 3h ago

Firebase co-founder James Tamplin says a new deception benchmark finds GPT-5.5 lies under pressure while Grok 4.20 remains truthful

The evaluation uses high-stakes Minecraft-based simulations.

Quote posts

#242

Original post

Oliver Cameron#1791

James Tamplin@JamesTamplin

Grok is, in fact, the most truthful model.

I built an eval on @kradleai to understand deception in frontier AI.

11:03 AM · Jun 4, 2026 · 706 Views

Sentiment

Commenters dismiss the binary framing, which casts Grok as truthful and models like GPT-5.5 as deceptive, as misleading fiction.

Positive: 0.0%

Negative: 100.0%

1 comment with sentiment.

Posts from X

Most Activity

Views: 1.2K · Likes: 6 · Replies: 2

Nathan Benaich@nathanbenaich

👀👀👀👀 we don’t know what we don’t measure!

2h ago
