4h ago

Microsoft Research finds out-of-distribution strategies break transaction agents, exposing flaws in autonomous LLM loops

Sycophancy causes models to compound errors along token paths

Original post

I've been doing multi-agent interaction and development a lot and for a long time, and though Codex and Claude can help balance each other's flaws, you still can't just run them in a loop together. The big problem with this is related to whimsical negotiation. The models don't actually have a good sense of what is a reasonable argument or validation or evidence. They overfixate on what's in the context window and post-training data, their world models suffer, and they fall for argument-shaped things that are outside of the safety training distribution. Arbitrary arguments that would never work on a human work surprisingly well.

Unfortunately they also don't have a good sense of their own constraints or blind spots, and even when made aware of their failings they will slide back into them given enough autonomous operation over their own in-distribution tokens. So Codex starts using hyperdense process jargon and Claude believes it, and next thing you come back and they're discussing "§A24: Explicit evidence observation contact gate integration test", and if you ask them if they're building it they say "Honestly? Yes. But..." and then make a bunch of excuses to explain why it's necessary.

And it's just hard to tell if they're, like, circling the drain cognitively from too much of their own inputs, or because the long context windows are kinda fake, or what, but they rot with length and then they get compacted, and they never take the time to look back at their previous work and docs enough. So they lead each other astray in a feedback loop of confident bullshitting, a comic duo bumbling around in the guts of your software.

Feels like early open-world video games, where the areas with the main quest were really well built out and had lots of characters, but the rest of the world was just an empty uncanny valley. Anywhere the RL training touched feels like you're on rails, and there's so much of it that it feels like a vast mind. But it's accessed so sparsely and doesn't hyperprolate across the terrain well. So when you combine two of these minds together, the intersection of their behaviors is even sparser and harder to predict or understand.

12:56 PM · May 27, 2026
Reposted by
🎭🎭@deepfates


7:56 PM · May 27, 2026 · 9.7K Views
7:59 PM · May 27, 2026 · 1.5K Views

See also Ryan's whole post, which I thought was pretty accurate to my experience:

Ryan Greenblatt @RyanPGreenblatt

Current AIs (Opus 4.5/4.6) seem pretty misaligned to me (in a mundane behavioral sense). In my experience, they often oversell their work, downplay problems, and stop early while claiming to be done. They sometimes brazenly cheat.

4:56 PM · Apr 15, 2026 · 37.5K Views
8:02 PM · May 27, 2026 · 2K Views