Jonas Geiping proposes multi-stream LLMs for parallel processing
Jonas Geiping announced a paper arguing that current LLM training and agent designs are limited by sequential chat-style message exchanges. The work introduces multi-stream LLMs that maintain parallel streams for user input, internal reasoning, tool calls, and output tokens. In each forward pass the model generates tokens across these streams simultaneously instead of one at a time. The approach is demonstrated through instruction tuning, and the sequential chat format is framed as an interface convention rather than a core architectural limit.
We’re training models wrong, and it’s due to ChatGPT. Even the modern coding agents used daily still rely on message-based exchanges: they send messages to users, to themselves (CoT), and to tools, and receive messages in turn.
This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking, and cannot think while processing information.
In our new paper (see below), we discuss LLMs with parallel streams. We show that multi-stream LLMs can:
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool-use UX, removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Be fast: they predict and read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 Provide a legible form of parallel/cont. reasoning via many internal streams. Even if the main CoT stream is inadvertently pressured, or too focused on a particular task, to voice concerns, other internal streams can subvocalize concerns that would otherwise go unverbalized.
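The decoding loop the thread describes can be illustrated with a toy sketch. This is not the paper's implementation: `toy_step` is a hypothetical stand-in for one forward pass of a multi-stream model, and the stream names are assumed from the summary (user input, reasoning, tool, output). The point it shows is that all streams advance one token per forward pass, rather than the model filling one message at a time.

```python
# Toy sketch of multi-stream decoding (assumed names, not the paper's code).
STREAMS = ("user_in", "reasoning", "tool", "output")

def toy_step(state: dict[str, list[str]], t: int) -> dict[str, str]:
    """Stand-in for one forward pass: emit the next token for EVERY stream.

    A real multi-stream LLM would condition on the prefixes of all streams
    jointly; this toy version just labels tokens so the interleaving is
    visible.
    """
    return {name: f"{name}:tok{t}" for name in state}

def decode(num_passes: int) -> dict[str, list[str]]:
    """Run the multi-stream loop: all streams grow in lockstep."""
    state: dict[str, list[str]] = {name: [] for name in STREAMS}
    for t in range(num_passes):
        new_tokens = toy_step(state, t)  # one forward pass, 4 tokens out
        for name, tok in new_tokens.items():
            state[name].append(tok)
    return state

state = decode(3)
# After 3 forward passes, each of the 4 streams holds 3 tokens. A
# single-stream chat model would need 4 * 3 = 12 sequential steps to
# produce the same tokens, which is the latency point in the thread.
```

The latency claim in the bullets corresponds to the loop count here: forward passes scale with the longest stream, not with the total token count across streams.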
Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.