The safety of advanced AI systems increasingly depends on the ability to oversee them. Our new report examines today’s AI oversight landscape, finding many pathways likely to lead to its degradation.🧵
The AI Security Institute published a report on May 21, 2026, assessing how oversight mechanisms for advanced AI systems hold up against rapid capability gains and identifying pathways by which oversight could degrade.
One section of the report records the claimed advantages of discrete token reasoning as disputed.
Positive users thank posters for sharing the report on pathways degrading AI oversight, while negative users criticize latent reasoning architectures as flawed spaghetti code causing opacity.
This report is an incredibly detailed and broad look into how it might become harder to monitor, audit or generally make confident claims about frontier AI systems. We interviewed an exceptional array of experts from multiple frontier labs, academia and industry. Worth a read!
another banger from UK AISI!

@jiaxinwen22 Yeah thinking in discrete signals plausibly is better for the same reason communicating in discrete signals plausibly is better.
https://en.wikipedia.org/wiki/Noisy-channel_coding_theorem
There are a lot of pathways via which AI oversight is likely to degrade! Latent reasoning architectures, situational awareness, representational drift... We wrote a report ranking them.
Here I'll go into some which worry me most 🧵

See more analysis, and recommendations for developers and deployers, in the full report and blog: https://www.aisi.gov.uk/blog/will-it-become-harder-to-oversee-ai-systems
"Concerning" but unironically

The report maps current oversight methods for AI systems and how they could degrade, based on 25 expert interviews, a literature review, and our own analysis. We examine techniques across four oversight surfaces:

An example is chain-of-thought oversight. Frontier models currently reason "out loud" in human-readable text - one of the most informative sources of oversight we have today. But the properties this rests on face pressure from many directions:
From https://www.aisi.gov.uk/blog/will-it-become-harder-to-oversee-ai-systems
Encyclical or actual AI safety report, who is to say

If this type of work excites you, the Model Transparency team is hiring - come and work with us! Apply here: https://job-boards.eu.greenhouse.io/aisi/jobs/4848454101
@1a3orn is this a theoretical argument?
Kudos to the one (??) expert in the report who pointed out that discrete token reasoning has better error correction, which is a factor decreasing the advantage of recurrent neuralese.
Also kudos to the report for tagging this as disputed.

The report also surfaces and explores disagreements between experts. Some examples:
- Will latent reasoning architectures take over?
- Will action monitoring and control be sufficient for harm prevention?
- When is evidence from misalignment honeypots meaningful?

Some pressures on oversight are already visible, such as evaluation gaming undermining behavioural audits. But because many oversight-relevant properties are not currently tracked, some loss of oversight could go unnoticed in future.

@1a3orn @jiaxinwen22 Though notably there's nothing requiring the discrete tokens to be legible english. "Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought" is an example of learning to reason in an abstract discrete token vocabulary. https://arxiv.org/abs/2604.22709

Running evals for misalignment ahead of time would be ideal, but eval gaming is already threatening to undermine the validity of these tests:

Right now, we have decent oversight IMO. Not great, not terrible. When AIs do bad things, they can often be caught through a range of techniques:

We interviewed a lot of experts and did our own analysis on how it will get harder to tell if AI systems are safe.

But there are a bunch of ways in which we're playing on easy-mode today, relative to how difficult oversight like this could be in the future.
Chain-of-thought reasoning is currently the most informative monitoring signal, but it is also at the most risk of degradation:

See more in the paper: https://www.aisi.gov.uk/blog/will-it-become-harder-to-oversee-ai-systems