Expert Discusses Running Autonomous Long-Running Coding Agents

Original post

elvis@omarsar0

How to effectively run autonomous long-running coding agents?

This is one of the most exciting discussions on agents I've ever had.

I recorded it and am making it freely available.

(bookmark it)

The idea of autonomous long-running agents is a real thing.

We talk about lots of things like /goal, /loop, and dynamic workflows, and what comes next.

One interesting discussion was around how to make the agent run for longer while ensuring it stays on track.

Most models today will struggle to coordinate work effectively. They sometimes pause the work early. Lots of mistakes happen, and lots of weird shortcuts (reward hacking).

What helps is being extremely clear about the goals the agent needs to achieve. Spell out the dos and don'ts, and eliminate any assumptions you think the model would make. Deep expertise matters a lot here.

But you can get far through careful planning. My current formula is to use Opus 4.8 for careful planning and GPT-5.5 for all execution. For the evaluator (via /goal), I often use something like DeepSeek or the latest models from Qwen, Kimi, MiniMax, etc.

Another insight we discussed for enforcing goals is to give the agent strong visual cues to compare against. I've found that a multimodal goal is much stronger than a plain-text one. Also, use agents to help you set clear goals.

Watch here: https://academy.dair.ai/events/cmplo7v3b000e04l1pxprat4d

10:50 AM · Jun 12, 2026 · 8.3K Views
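The planner / executor / evaluator split described in the post can be sketched roughly as follows. This is a minimal illustration, not the author's actual setup: `Roles`, `run_goal`, and the PASS/FAIL verdict convention are hypothetical names invented here, and each "model" is just a callable from prompt text to response text so that any provider could be plugged in.

```python
import itertools
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any callable mapping a prompt string to a response string.
Model = Callable[[str], str]


@dataclass
class Roles:
    planner: Model    # plans once, up front
    executor: Model   # does the work on every iteration
    evaluator: Model  # judges each result against the goal


def run_goal(goal: str, roles: Roles, max_iters: int = 5) -> tuple[bool, list[str]]:
    """Plan once, then loop execute -> evaluate until the evaluator says PASS."""
    plan = roles.planner(
        f"Goal: {goal}\nWrite a step-by-step plan with explicit dos and don'ts."
    )
    history: list[str] = []
    for _ in range(max_iters):
        result = roles.executor(
            f"Plan:\n{plan}\nPrevious attempts: {len(history)}\nDo the work."
        )
        history.append(result)
        verdict = roles.evaluator(
            f"Goal: {goal}\nResult:\n{result}\nAnswer PASS or FAIL."
        )
        if verdict.strip().upper().startswith("PASS"):
            return True, history
    # The iteration budget ran out without the evaluator accepting a result.
    return False, history


# Stub models standing in for real API calls, so the sketch runs offline.
_attempts = itertools.count(1)
stub_roles = Roles(
    planner=lambda prompt: "1. make the tests pass",
    executor=lambda prompt: f"attempt {next(_attempts)}",
    evaluator=lambda prompt: "PASS" if "attempt 3" in prompt else "FAIL",
)
ok, history = run_goal("all tests green", stub_roles)
print(ok, history)
```

Keeping the evaluator as a separate role means the model that does the work never grades itself, which is one simple guard against the shortcut-taking the post mentions.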

Sentiment

Users praise the free recording on autonomous long-running coding agents as valuable raw discussion of their capabilities and reliability.

Positive: 85.7% · Negative: 14.3% (7 comments with sentiment)

Posts from X

ZenomTrader@ZenomTrader

@omarsar0 I run these daily, agents coding, compiling, debugging and backtesting futures strategies inside NinjaTrader 8 and MT5 completely on their own. The real unlock was the validation pipeline around them, so I just review finished equity curves instead of babysitting sessions.

11h · 562 views · 20 likes · 8 bookmarks

Jim_SZ🇭🇰@JimSZ7

@omarsar0 The thing that separates a demo from a long-running agent isn't the model — it's whether state survives a bad step. Durable checkpoints, a verifier between steps, a safe resume path. Treat the loop like flight control: assume faults, design the recovery first.

9h5711
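The three pieces named in the reply above (durable checkpoints, a verifier between steps, a safe resume path) could look something like the sketch below. Everything here is an assumption made for illustration: the JSON state layout, the function names `save_checkpoint`, `load_checkpoint`, and `run`, and the retry policy are not taken from the thread or the recording.

```python
import copy
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

State = dict                      # hypothetical layout: {"step": int, "data": list}
Step = Callable[[State], State]   # a step takes a state copy and returns a new state


def save_checkpoint(path: Path, state: State) -> None:
    # Write to a temp file in the same directory, then rename over the target,
    # so a crash mid-write never leaves a half-written checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: Path) -> State:
    # Resume path: start from the last verified state if one exists.
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "data": []}


def run(path: Path, steps: list[Step], verify: Callable[[State], bool],
        max_retries: int = 2) -> State:
    state = load_checkpoint(path)
    while state["step"] < len(steps):
        index = state["step"]
        for _ in range(max_retries + 1):
            try:
                # Each attempt works on a copy, so a bad step cannot corrupt
                # the last good state held in memory.
                candidate = steps[index](copy.deepcopy(state))
            except Exception:
                continue  # treat a crash like a failed verification and retry
            if verify(candidate):
                candidate["step"] = index + 1
                save_checkpoint(path, candidate)  # persist only verified progress
                state = candidate
                break
        else:
            raise RuntimeError(
                f"step {index} failed verification; last good state is on disk"
            )
    return state
```

Calling `run` again with the same checkpoint path after a failure picks up at the first unverified step rather than starting over, which is the "design the recovery first" idea in miniature.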

ALEX 💡@ytiralugins

@omarsar0 There it is! Thanks for the work @omarsar0 🙏

12h851

Hunter Gon@gonlenidefi

@omarsar0 smart thing to make it freely available, we need more raw convos out there instead of polished takes

12h81

maguyva@maguyvaai

@omarsar0 the hard part isn't the agent - it's the checkpoints. curious what recovery strategy they recommend when the run goes sideways at hour 3

9h50

Eclipse 🌖@ECLresearch

@omarsar0 Watching this now—the key bottleneck for autonomous long-running agents is whether the stack can handle non-deterministic error recovery without manual intervention.

11h48

kai Nakamura@kaiNakamur78644

@omarsar0 Long runs need runbooks.

2h21

Herd of Worms@HerdofWorms

@omarsar0 @dair_ai @sparky_42069

6h7

Rajiv Shah@rajistics

@omarsar0 I really appreciate the video, but do you think you could make a transcript available as well?

1h6

DrewOnAI@Drew_OnAI

@omarsar0 opus 4.8 and gpt-5.5? sounds like you're just paying for a better way to hallucinate

9h1

Ankit Garla@ankitships

The real leverage in long-running agents is not the autonomy. It is whether the harness has a cheap, reliable signal for whether a step actually advanced the goal. Most real workflows do not have a compiler-style verifier. The loops that compound are the ones that treat every decision point as something that must produce auditable evidence of its effect before the next iteration runs.

9h1

Alex YGift@Radipdegen

@omarsar0 Recorded and free is the right move for content like this.

Keen to see how far these agents actually run without human hand-holding.

12h1

Rugbist@rugbist_

@omarsar0 always the agent topics. is it just coding or does it spill into other agent loops too

12h1

jovi@uwings

@omarsar0 run cron loops every 30 min. worst failure wasn't a crash — a loop ran empty for 3 days. diffs looked fine, just not useful. multi-model routing is right but what i check now: is this loop still earning its tokens.
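One way to catch the "loop ran empty" failure described above is to record a progress score after every run and flag the loop when the score stops improving. The sketch below assumes such a score exists (for example, a count of passing tests); `StallDetector`, `window`, and `min_gain` are hypothetical names, not something the reply specifies.

```python
from collections import deque


class StallDetector:
    """Flags a loop whose progress score has not improved over the last `window` runs."""

    def __init__(self, window: int = 6, min_gain: float = 0.0):
        self.window = window
        self.min_gain = min_gain
        # Keep window + 1 scores so the oldest and newest are `window` runs apart.
        self.scores: deque[float] = deque(maxlen=window + 1)

    def record(self, score: float) -> bool:
        """Record the latest score; return True if the loop looks stalled."""
        self.scores.append(score)
        if len(self.scores) <= self.window:
            return False  # not enough history to judge yet
        return self.scores[-1] - self.scores[0] <= self.min_gain


# Assumed scores from six consecutive runs: progress, then a plateau.
detector = StallDetector(window=3)
flags = [detector.record(s) for s in [1, 2, 3, 3, 3, 3]]
print(flags)
```

With a 30-minute cron schedule, `window=6` would raise the flag after roughly three hours without gain instead of three days; the right window depends on how noisy the score is.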

Blissy@BlissyOnX

@omarsar0 tbh "bookmark this" energy usually means i'll never watch it

but autonomous coding agents running for hours without dying? that's worth the click

12h

Invincible@InvincibleEdge

@omarsar0 real talk this recording is gold. autonomous agents are the next frontier but nobody talks about failure recovery loops enough

12h