AI Security Institute published a report on May 21, 2026, assessing oversight mechanisms for advanced AI systems against rapid capability gains and identifying degradation pathways.

One section of the report records the error-correction advantage of discrete token reasoning as a disputed claim.

Original post

Geoffrey Irving

AI Security Institute@AISecurityInst

The safety of advanced AI systems increasingly depends on the ability to oversee them. Our new report examines today’s AI oversight landscape, finding many pathways likely to lead to its degradation. 🧵

7:03 AM · May 21, 2026 · 29.2K Views

Sentiment

Positive users thank posters for sharing the report on pathways degrading AI oversight, while negative users criticize latent reasoning architectures as flawed spaghetti code causing opacity.

Positive: 50.0% · Negative: 50.0%

6 comments with sentiment.

Posts from X

Joseph Bloom@JBloomAus

This report is an incredibly detailed and broad look into how it might become harder to monitor, audit or generally make confident claims about frontier AI systems. We interviewed an exceptional array of experts from multiple frontier labs, academia and industry. Worth a read!

Tomek Korbak@tomekkorbak

another banger from UK AISI!

1a3orn@1a3orn

@jiaxinwen22 Yeah thinking in discrete signals plausibly is better for the same reason communicating in discrete signals plausibly is better.

https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem

Jordan Taylor@JordanTensor

There are a lot of pathways via which AI oversight is likely to degrade! Latent reasoning architectures, situational awareness, representational drift... We wrote a report ranking them.

Here I'll go into some which worry me most 🧵

AI Security Institute@AISecurityInst

See more analysis, and recommendations for developers and deployers, in the full report and blog: https://www.aisi.gov.uk/blog/will-it-become-harder-to-oversee-ai-systems

Miles Brundage@Miles_Brundage

"Concerning" but unironically

AI Security Institute@AISecurityInst

The report maps current oversight methods for AI systems and how they could degrade, based on 25 expert interviews, a literature review, and our own analysis. We examine techniques across four oversight surfaces:

AI Security Institute@AISecurityInst

An example is chain-of-thought oversight. Frontier models currently reason "out loud" in human-readable text - one of the most informative sources of oversight we have today. But the properties this rests on face pressure from many directions:

Miles Brundage@Miles_Brundage

From https://www.aisi.gov.uk/blog/will-it-become-harder-to-oversee-ai-systems

Miles Brundage@Miles_Brundage

Encyclical or actual AI safety report, who is to say

AI Security Institute@AISecurityInst

If this type of work excites you, the Model Transparency team is hiring - come and work with us! Apply here: https://job-boards.eu.greenhouse.io/aisi/jobs/4848454101

Jiaxin Wen@jiaxinwen22

@1a3orn is this a theoretical argument?

1a3orn@1a3orn

Kudos to the one (??) expert in the report who pointed out that discrete token reasoning has better error correction, which is a factor decreasing the advantage of recurrent neuralese.

Also kudos to the report for tagging this as disputed.
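The error-correction point above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not anything taken from the report: the function names, the bounded uniform noise model, and the integer symbol alphabet are all hypothetical. It shows re-quantization after each step, which is a much simpler idea than the noisy-channel coding theorem linked in the thread.

```python
import random


def relay_continuous(value, steps, noise, rng):
    """Pass a real-valued signal through `steps` noisy hops with no correction."""
    for _ in range(steps):
        value += rng.uniform(-noise, noise)  # noise is never removed, so it accumulates
    return value


def relay_discrete(symbol, steps, noise, rng):
    """Same noisy hops, but snap back to the nearest integer symbol after each hop."""
    value = float(symbol)
    for _ in range(steps):
        value += rng.uniform(-noise, noise)
        value = float(round(value))  # re-quantize: erases any noise smaller than 0.5
    return int(value)


# Assumed toy parameters: 1000 hops, noise bounded by 0.2 per hop.
rng = random.Random(0)
continuous_out = relay_continuous(1.0, 1000, 0.2, rng)
discrete_out = relay_discrete(1, 1000, 0.2, rng)
drift = abs(continuous_out - 1.0)

print("continuous drift:", drift)
print("discrete output:", discrete_out)
```

In this toy setup the discrete relay always returns the symbol it started with, because each hop's noise stays below half the spacing between symbols, while the continuous relay drifts as a random walk. Whether the same effect carries over to reasoning in real models is exactly the point the thread describes as disputed.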

AI Security Institute@AISecurityInst

The report also surfaces and explores disagreements between experts. Some examples:

- Will latent reasoning architectures take over?

- Will action monitoring and control be sufficient for harm prevention?

- When is evidence from misalignment honeypots meaningful?

AI Security Institute@AISecurityInst

Some pressures on oversight are already visible, such as evaluation gaming undermining behavioural audits. But because many oversight-relevant properties are not currently tracked, some loss of oversight could go unnoticed in future.

Jordan Taylor@JordanTensor

@1a3orn @jiaxinwen22 Though notably there's nothing requiring the discrete tokens to be legible english. "Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought" is an example of learning to reason in an abstract discrete token vocabulary. https://arxiv.org/abs/2604.22709

Jordan Taylor@JordanTensor

Running evals for misalignment ahead of time would be ideal, but eval gaming is already threatening to undermine the validity of these tests:

Jordan Taylor@JordanTensor

Right now, we have decent oversight IMO. Not great, not terrible. When AIs do bad things, they can often be caught through a range of techniques:

Jordan Taylor@JordanTensor

We interviewed a lot of experts and did our own analysis on how it will get harder to tell if AI systems are safe.

Jordan Taylor@JordanTensor

But there are a bunch of ways in which we're playing on easy-mode today, relative to how difficult oversight like this could be in the future.

Chain-of-thought reasoning is currently the most informative monitoring signal, but it is also at the most risk of degradation:

Jordan Taylor@JordanTensor

See more in the paper: https://www.aisi.gov.uk/blog/will-it-become-harder-to-oversee-ai-systems
