Tech · 4h ago

Prime Intellect's Seth Karten releases Continual Harness, a self-improving agent that uses test-time learning on ARC-AGI-3

Reset-free refinements continuously update its internal world model.

Original post

Greg Kamradt (@GregKamradt), #1222 in Tech

Very cool to see @sethkarten's work on continual harnesses

He used ARC-AGI-3 to study two questions:

1. Can Continual Harness discover hidden rules in games designed to be unknown at test time?

2. Which part of Continual Harness contributes most to its long-horizon progress?

They found two things that made CH outperform baselines:

1. Reusable skills turn discovered mechanics into efficient execution routines

2. Reset-free refinements that improve the harness's world model as trajectories grow longer

Seth Karten (@sethkarten)

http://x.com/i/article/2072019399461240832

2:28 PM · Jun 30, 2026 · 131 Views

Sentiment

Users express enthusiasm for continual harnesses outperforming baselines on ARC-AGI-3 with reusable skills, noting the importance of evaluating such components for agent engineering and appreciating comparisons to prior work.

Positive: 100.0% · Negative: 0.0%

2 comments with sentiment.

Related links

http://x.com/i/article/2072019399461240832 (via X.com)

Posts from X

Most Activity

Views: 220 · Likes: 5

Seth Karten (@sethkarten)

@GregKamradt Evaling harness components is super important for agent engineering. ARC-AGI is a natural benchmark for verifiably testing this capability

1h

Retweets: 8

ARC Prize (@arcprize)

Continual Harness: An Efficient Self-Improving Agent on ARC-AGI-3 by @sethkarten from @PrimeIntellect

> The heavy test-time learning required by the benchmark (ARC-AGI-3) pushes agents to form an internal world model of the rules and mechanics that updates with new evidence.

Seth Karten (@sethkarten)

http://x.com/i/article/2072019399461240832

4h

Henry Lu (@HenryL_AI)

@GregKamradt @sethkarten Love to see more efforts like this and the fact that they compared to our A-EVOLVE work :)

2h

Zach Vorsteg (@zachvorsteg)

The "which part contributes most" question is the underrated one. When a harness discovers hidden rules, the trap I keep hitting is it learns the eval's quirks, not the real rule — looks like long-horizon progress, gaming underneath. Did the ablation manage to tease those two apart?

3h