MIT CSAIL's Alex Zhang clarifies that he has described only two Anthropic releases as resembling his recursive language model (RLM) research

alex zhang@a1zhang

Last thing and then I'll shut up because people on this website genuinely have trouble 1) reading and 2) remembering anything:

I have referenced something else not openly called an RLM no more than 2 times as being RLM-like. Both were Anthropic releases. The first was the "Scaling Managed Agents" post, which explicitly links to the RLM paper as inspiration (and which no one has an issue with), and the second was the most recent "dynamic workflows" release. That's literally it. And all I said was "Every day we move closer to the RLM". I haven't deleted any tweets, you can check this yourself.

People are acting like I've egregiously done this over the past year and have hyped up this paper to oblivion because you see your favorite illiterate influencer shitpost about it. Genuinely fuck off, you're remembering all the bullshit AI influencers that were spouting nonsense about "RLMs solved long-context" or "RLMs were the first to do sub-agents". @yacinelearning remembers all of this and even made a good YT video about it, and while I am sorry that your TLs were polluted with it, it has nothing to do with me or the paper (half the time they butchered the name anyways). If you're sick of that hype, fine, don't take it up with me, I don't condone it either.

Since the release, every time I go on this website and see an RLM-specific paper or tweet I like, I retweet it. That's it. Stop acting like I claim every release is a new RLM feature. IT'S LITERALLY JUST THIS ONE.

That's all. If you have more problems, take it up in DM. I'll even call you if you're that desperate. This is so fucking stupid man


alex zhang@a1zhang

cool, you can triple down on this and that's fine, but in line with your own phrase "just trying to have a net-positive effect on the world" let's have an actual discussion about this instead of ragebaiting on my feed. again, whether anthropic was inspired or not is all speculative and I agree with you I can't make that claim.

> the authors themselves decided to grandly establish a new label 'RLM' — for LLMs that are prompted with big text files + search/call tools

it's ironic because you somehow still don't get it despite you genuinely being right that it's a super simple concept. I'm not masking the fact that it's a simple concept either, it's been stated countless times in my own writing. the paper itself, if you've read it, is about experimentally why such a system is useful and how out-of-the-box it does very well on long context tasks. and by the way, Claude Code could handle big text files with search / call tools before dynamic workflows, so I'm not really sure what your point is. it's code. it's giving an LLM unfiltered access to code where sub-LLM calls are available as functions in code, and everything, including the prompt, is an object in code. there's a one sentence summary of how simple it is, you could've filled that into your monologue instead. you can also call it "CodeAct with sub-LLM calls and explicit context offloading" if that makes you happy, it's even ablated in this way in the paper
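To make that one-sentence summary concrete, here is a minimal sketch of the idea, not the paper's implementation. It assumes the root model only sees metadata about the prompt, writes Python that runs in a namespace where the prompt is a variable, and can call a sub-LLM as an ordinary function. Every name here (`run_rlm`, `context`, `llm_query`, `answer`, and both stub models) is hypothetical and chosen for illustration; the stubs are deterministic stand-ins so the loop can run without a real model.

```python
from typing import Callable, Optional


def run_rlm(prompt: str,
            root_llm: Callable[[str], str],
            sub_llm: Callable[[str], str],
            max_steps: int = 5) -> Optional[object]:
    """Toy loop: the prompt lives in code, and sub-LLM calls are plain functions."""
    # Hypothetical environment: the long prompt is just an object named `context`.
    env = {"context": prompt, "llm_query": sub_llm, "answer": None}
    # The root model is told about the prompt, but never receives its full text.
    transcript = (f"`context` is a str of length {len(prompt)}. "
                  "Write Python; call llm_query(snippet) as needed; set `answer` when done.")
    for _ in range(max_steps):
        code = root_llm(transcript)
        exec(code, env)  # unsafe outside a sandbox; fine for a toy with stub models
        if env["answer"] is not None:
            return env["answer"]
        transcript += "\n" + code  # feed the executed code back for the next step
    return None


def stub_sub_llm(snippet: str) -> str:
    """Stand-in for a sub-LLM call: 'reads' a snippet and reports how many ERRORs it saw."""
    return str(snippet.count("ERROR"))


def stub_root_llm(transcript: str) -> str:
    """Stand-in for the root model: emits code that splits `context` and delegates chunks."""
    return (
        "lines = context.splitlines()\n"
        "chunks = ['\\n'.join(lines[i:i + 2]) for i in range(0, len(lines), 2)]\n"
        "answer = sum(int(llm_query(c)) for c in chunks)\n"
    )


if __name__ == "__main__":
    log = "ok\nERROR a\nERROR b\nok\nERROR c"
    print(run_rlm(log, stub_root_llm, stub_sub_llm))
```

In a real system both stubs would be calls to a language model, and nothing stops `llm_query` from being another `run_rlm`, which is where the recursion would come from.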

and by the way, the naming comes from the fact that such a system lends itself well to acting as a "language model" around neural LMs calling themselves. it was originally meant to be treated as a "language model", that centers around using sub-calling at arbitrary levels of recursion to handle arbitrarily long tasks. you can hate the name, but don't act like "naming" formalizations of concepts isn't normal in both academia and industry. are you going to throw a fit at anthropic for calling their feature "dynamic workflows"?

> except for the subject of concern is, i kid you not, just 'treating long prompts as an environment and letting the LLM programmatically search, split up, and call itself over snippets of the prompt'

...and?

Chain-of-thought is just asking the LM to think before it acts. yet the paper was insightful in its experiments and showcasing that it works

ReAct is just applying CoT and saying "act" to a for loop. same deal.

CodeAct is just making code the only tool for an LM and implementing all tools in code. again, same deal.

GRPO is just PPO but with shittier estimates.
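The GRPO quip can be unpacked a little. PPO usually estimates advantages with a learned value function, while GRPO drops that critic and scores each sampled completion relative to the other completions drawn for the same prompt. The sketch below shows only that group-relative advantage step, under the assumption of one scalar reward per completion. The function name and the `eps` term are illustrative choices, and using the population standard deviation is an assumption, since implementations differ on this detail.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group (no value network)."""
    mu = mean(rewards)        # group mean serves as the baseline
    sigma = pstdev(rewards)   # population std; an assumption, see the note above
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, two rewarded and two not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is one sense in which the estimate is coarser than a critic-based one.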

there's a giant list of industry and academic papers that are of this form, don't act like this is a bad thing or out of place. I'm not implying that RLMs are as impactful as any of these ideas, but you can at least acknowledge prior coding agents are clearly not RLMs and cannot perform the same set of actions, but dynamic workflows can. it's a simple idea, and it seems to work in many applications

> and they are receiving the attention that comes with that declaration

LMAO bro it's you and a guy who embarrassingly said RLMs claimed to invent sub-agents. it's totally fine if you just don't like the idea and think it's useless and I wouldn't have responded otherwise, but of course I'm going to respond if you reduce it down to "the paper is saying nothing"

will depue@willdepue

@georgejrjrjr this is an excellent point, and surely true elsewhere, except for the subject of concern is, i kid you not, just 'treating long prompts as an environment and letting the LLM programmatically search, split up, and call itself over snippets of the prompt', per their paper. what


Grad@Grad62304977

Tbf a significant part of RLM is programmatic tool calling, which Anthropic had blogs on months after the RLM initial blog, with pretty big gains (I think they use it for certain evals too). Also things like tool search, which many adopted and made blogs on for their importance, are a part of the RLM idea (iirc these were after the initial RLM blog in oct 2025). Also most of the cursor dynamic context discovery blog, done around 2 months after the RLM blog, is ideas RLM would have

https://claude.com/blog/improved-web-search-with-dynamic-filtering

https://www.anthropic.com/engineering/advanced-tool-use

https://cursor.com/blog/dynamic-context-discovery


alex zhang@a1zhang

@xeophon not the training mines...

Florian Brand@xeophon

@a1zhang stop wasting your time with ppl unable to read and come into the training mines instead


Yacine Mahdid@yacinelearning

@a1zhang yeah I think the ai slop-posters did a whole lot of damage to the RLM paper by reposting it with misaligned, hyped-up claims

last time I tried to correct one of these accounts I got blocked


alex zhang@a1zhang

@OrganicGPT


Ben Clavié@bclavie

@a1zhang I’ll say: people genuinely appreciate and like your work. I’m very much the same as you in that I overindex on drivel negative comments and let them get to me, but there’s absolutely no reason to. Please don’t let this stuff get to you and ignore it as much as you can!


Ben (no treats)@andersonbcdefg

@a1zhang ignore the haters and keep doing good work 👍


dan@irl_danB

@a1zhang ignore them, keep up the good work


Behnam@OrganicGPT

@a1zhang Why do you even care if Claude uses your ideas or not? Do you want compensation from anthropic? That's the whole point of open research; if you wanted to monetize it you should not have put it on arXiv.


hallerite@hallerite

@xeophon @a1zhang when are you coming into the training mines


Antonio Peña Batista@apenab1995

@a1zhang Just ignore those people. “…The sun has spots. Ungrateful people talk about nothing but the spots. Grateful people talk about the light…” Thank you so much for the paper, for the MIT-licensed repository, and for all your work!!!!!


Florian Brand@xeophon

@hallerite @a1zhang As soon as you stop asking for the weirdest evals to be implemented


Romain Lacombe@rlacombe

@a1zhang My 2¢: folks criticizing you should (1) be kinder on the internet, (2) read the paper before speaking.

The key RLM ideas are way more nuanced/less obvious than just recursive LLM calls. I think symbolic abstraction of context/output is what makes them work so well.


hallerite@hallerite

@xeophon @a1zhang probably never then


Grad@Grad62304977

- No I meant tools like subagents, can see the blogs I linked too for examples but these are ideas in RLMs

- Well it’s not only the codebase but sure

Mainly getting at ppl not holding cursor and Anthropic to the same standard, as they release blogposts on this stuff without citing previous works (not even sure which ones there are other than RLMs). Here referring to all the blogs I linked

Alex J. Champandard 🌻@alexjc

@Grad62304977 @willdepue @georgejrjrjr

- Hmm, there's a section in the paper about the semantics of the subagent calls; do you mean calling "subagents like tools" instead?

- Codebase (as large context) discovery via tool calls goes back before RLM too; it's a core RLM-like concept invented before RLMs.


Florian Brand@xeophon

@hallerite @a1zhang 😢


Bronson Schoen@BronsonSchoen

This isn’t my area of expertise but fwiw my impression following this since your paper has been that you’ve been really reasonable and positive in trying to engage with people! I regularly see RLM posts (1) attributing it to whatever company recently implemented it or (2) mad about something they think RLMs are doing or claiming, often in cases where it seems like they’re slightly misunderstanding what it is that RLM is doing compared to current dominant patterns.


Anton Morgunov@anmorgunov

@a1zhang @willdepue @georgejrjrjr you're spending way too much time on this; it's obvious that his response is disproportionately emotional

you did good research and masterfully packaged it into a viral payload, one can only take notes


Eric@_ahnimal

@a1zhang GET EM ALEX (I’m #1 Alex glazer)
