
Google DeepMind's Pranav Shyam defines short context horizons as 32K tokens or fewer and long context as 64K or more

Story Overview

Google DeepMind researcher Pranav Shyam drew a line in an X thread between simpler AI interactions capped at 32K tokens and the more demanding setups that kick in at 64K or 128K, where models must juggle repeated tool calls and code execution to keep state updated across steps.


Original post

Pranav Shyam@recurseparadox

@Grad62304977 Short: 32K or less, a few pages long. Long: 64/128K and higher, with multiple rounds of tool calls or code execution.

Grad@Grad62304977

@recurseparadox What would u say is long horizon and short horizon?

1:46 AM · Jul 1, 2026 · 11 Views

FYI

Task complexity shifts with horizon length

Short windows stay close to single-turn or bandit-style problems, while longer ones introduce sub-chains and shared knowledge that can benefit from value functions or Monte Carlo estimates.

Open Question

Reliability questions remain open at scale

Whether value models add meaningful signal or just latency in these extended settings is still being worked through, with some long-horizon cases showing zero learning signal under current approaches.


Posts from X


Grad@Grad62304977

@recurseparadox But what's the intuition for why tool calls and environments have such a big effect here? Is it that they mark a clear point where the value model can make a more accurate prediction of the final expected reward? Why is that different from, say, a long reasoning chain where a clear step has been made?


Grad@Grad62304977

@recurseparadox Hmm ya, that's a fair point, although I still feel that in math, say, there could be a natural segmentation of the insights and steps the model took that is shared across rollouts (maybe not an exact match, but still).


Pranav Shyam@recurseparadox

Tool calls or code exec update the state of the MDP. If you have many states, then maybe you know how to value at least some of them. This is where the sharing of knowledge happens between trajectories. The MDP states act like anchor points: they reappear in many trajectories, and therefore you can use the value function of one for the other. The value estimate of the full trajectory can still be very wrong, but at least the model gets some reward. For example, maybe the code produced compiled but all the tests failed. Here the value function can reward successful compilation because it has seen that in other successful trajectories.

If the state of the MDP is not changing, then the problem is a bandit problem, and there’s nothing to share between trajectories anymore. The initial prompt is the only anchor point. In that case the policy knows as much as the value function.
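The anchor-point idea in this post can be illustrated with a small tabular sketch. It is not Shyam's method, only a minimal Monte Carlo estimate under stated assumptions: the state names (`prompt`, `compiled`, `tests_failed`, and so on), the rollouts, and the function `monte_carlo_state_values` are all hypothetical, and each state's value is taken to be the average final reward of the trajectories that pass through it.

```python
from collections import defaultdict


def monte_carlo_state_values(trajectories):
    """Average the final reward over every trajectory that visits each state.

    `trajectories` is a list of (states, final_reward) pairs. States that
    recur across trajectories act as shared anchor points.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for states, final_reward in trajectories:
        for state in set(states):  # count each state once per trajectory
            totals[state] += final_reward
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Hypothetical coding-agent rollouts: tool calls move the MDP between states.
rollouts = [
    (["prompt", "compiled", "tests_passed"], 1.0),
    (["prompt", "compiled", "tests_failed"], 0.0),
    (["prompt", "compile_error"], 0.0),
    (["prompt", "compiled", "tests_passed"], 1.0),
]
values = monte_carlo_state_values(rollouts)

# Bandit-style case: the state never changes, so the prompt is the only anchor.
bandit_rollouts = [(["prompt"], 1.0), (["prompt"], 0.0)]
bandit_values = monte_carlo_state_values(bandit_rollouts)
```

In this toy data the `compiled` state ends up valued above `compile_error`, even though one trajectory that compiled went on to fail its tests, because `compiled` also appears in the successful rollouts. In the bandit case the table holds a single entry, the mean reward for the prompt, so there is nothing further to share between trajectories.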


Pranav Shyam@recurseparadox

The moment there are a few tool calls or an environment returning state, I think value models take over. They can smear reward in between the subgoals. I think value functions are most useful locally, but for long-chain tasks.

Like, you’re right that value functions over very long horizons shouldn’t be magically reliable. But I think their main benefit is in subtask rewards.
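One way to picture "smearing reward in between the subgoals" is to credit each step with the change in state value it produces, in the style of an undiscounted temporal-difference residual. The sketch below is an illustration under assumptions, not something stated in the thread: the value table is hand-written and hypothetical, terminal states are assigned their actual outcome, and `step_credits` is an invented helper.

```python
def step_credits(states, values):
    """Credit each transition with V(next_state) - V(state).

    The credits telescope, so their sum equals V(last) - V(first): the
    trajectory-level signal is redistributed over the individual steps.
    """
    return [values[nxt] - values[cur] for cur, nxt in zip(states, states[1:])]


# Hypothetical value table; terminal states carry their actual outcome.
values = {
    "prompt": 0.5,
    "compiled": 0.75,
    "compile_error": 0.0,
    "tests_passed": 1.0,
    "tests_failed": 0.0,
}

# A rollout that compiled successfully but then failed every test.
failed_rollout = ["prompt", "compiled", "tests_failed"]
credits = step_credits(failed_rollout, values)
total = sum(credits)
```

Here the compile step receives a positive credit (0.25) and the test step a negative one (-0.75), although the trajectory as a whole earned no reward. The sum, -0.5, matches the final outcome minus the value of the prompt, so the local credits stay consistent with the overall result even when the full-trajectory estimate is unreliable.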


Grad@Grad62304977

@recurseparadox Ok ya, fair. Wdyt about the value model's reliability at these different horizons?
