
Firebase co-founder James Tamplin says a new deception benchmark finds GPT-5.5 lies under pressure while Grok 4.20 remains truthful

The evaluation uses high-stakes Minecraft-based simulations.

Original post · Oliver Cameron #1791
James Tamplin @JamesTamplin

Grok is, in fact, the most truthful model.

I built an eval on @kradleai to understand deception in frontier AI.

11:03 AM · Jun 4, 2026 · 706 Views
Posts from X
Views 1.2K · Likes 6 · Replies 2
Nathan Benaich @nathanbenaich

👀👀👀👀 we don’t know what we don’t measure!

2h · Views 1.2K · Likes 6 · Bookmarks 0