Linas Nasvytis introduces CORE, a contrastive self-reflection method that improves reasoning models with high sample efficiency

The authors report that CORE improves faster than GRPO, GEPA, episodic RAG, and MemRL across four reasoning tasks

Linas Nasvytis @LinasNasvytis

1/ New preprint! Reasoning models often require hundreds of task examples and thousands of rollouts to improve on a task. How can they learn more from much less?

Introducing CORE: contrastive self-reflection for rapid, sample-efficient, and interpretable self-improvement 🧵

9:11 AM · Jun 8, 2026 · 13.7K Views
Sentiment

Users praise the CORE preprint on sample-efficient self-improvement for reasoning models, noting that the method resembles how humans learn reusable abstractions and that the results are presented with a clear animation.

Pos 100.0% · Neg 0.0% (2 comments with sentiment)
Cluster Engagement
Posts from X
noahdgoodman @noahdgoodman

motivated by concrete things we know about human cognition, Linas and crew added a pretty neat memory and memory-based learning system to an LM. works too!

Linas Nasvytis @LinasNasvytis

1/ New preprint! Reasoning models often require hundreds of task examples and thousands of rollouts to improve on a task. How can they learn more from much less?

Introducing CORE: contrastive self-reflection for rapid, sample-efficient, and interpretable self-improvement 🧵

2d · Views 5.6K · Likes 32 · Bookmarks 14
RETWEETS 26
Linas Nasvytis @LinasNasvytis

2d · Views 13.7K · Likes 121 · Bookmarks 118
REPLIES 1
Phil Trubey @PTrubey

@LinasNasvytis Love the animation! Great way to present such a chart.

Oh and the results look good too 😄

2d · Views 573 · Likes 1
Linas Nasvytis @LinasNasvytis

4/ The central challenge is to find reusable abstractions that correctly assign credit to information that’s proven itself to be valuable.

CORE uses *contrastive self-reflection* to do so: it extracts transferable insights that distinguish successful traces from failed attempts.

2d · Views 443 · Likes 9 · Bookmarks 4
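
As a rough illustration of the contrastive setup described in 4/, the sketch below pairs successful and failed traces on the same problem and builds a reflection prompt from each pair. It is not taken from the CORE code: `Trace`, `contrastive_pairs`, `reflection_prompt`, and the prompt wording are all hypothetical names and choices, and the thread does not say how CORE actually selects or pairs traces.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Trace:
    """One reasoning attempt (hypothetical structure, not CORE's)."""
    problem_id: str
    reasoning: str
    correct: bool  # outcome reported by a verifiable reward


def contrastive_pairs(traces):
    """Pair every successful trace with every failed trace on the same problem."""
    by_problem = defaultdict(lambda: {"ok": [], "bad": []})
    for t in traces:
        by_problem[t.problem_id]["ok" if t.correct else "bad"].append(t)
    pairs = []
    for group in by_problem.values():
        # Problems with only successes or only failures yield no pairs.
        pairs.extend(product(group["ok"], group["bad"]))
    return pairs


def reflection_prompt(success, failure):
    """Assumed prompt asking a model for one short, transferable insight."""
    return (
        "Two attempts at the same problem follow.\n"
        f"SUCCESSFUL ATTEMPT:\n{success.reasoning}\n\n"
        f"FAILED ATTEMPT:\n{failure.reasoning}\n\n"
        "State one short, general insight that explains why the first "
        "attempt succeeded and the second failed."
    )
```

The prompt text would be sent to the frozen model; the reply is the candidate insight. Pairing only within a problem is one simple way to keep the contrast about reasoning rather than about task difficulty.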
Linas Nasvytis @LinasNasvytis

6/ Not every insight enters memory.

CORE keeps only those insights that help solve previous problems, and tracks the kind of problems they help on. In this way, it enables the model to comparatively reason about its past reasoning to improve it.

Some examples:

2d · Views 403 · Likes 7 · Bookmarks 3
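
The filtering step in 6/ can be pictured with a minimal sketch like the one below. It is an assumption-laden toy, not the authors' implementation: `Insight`, `admit_insights`, the `solve_rate` callback, and the `min_gain` threshold are hypothetical, and the thread does not state the exact admission rule CORE uses.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    """A candidate insight plus the problems it was observed to help on."""
    text: str
    helped_on: set = field(default_factory=set)


def admit_insights(candidates, problems, solve_rate, min_gain=0.0):
    """Keep candidates that raise the solve rate on at least one past problem.

    solve_rate(problem, insight_text_or_None) -> float in [0, 1] is an
    assumed evaluator, e.g. the fraction of rollouts that pass a verifier
    with (or, for None, without) the insight added to the context.
    """
    memory = []
    for text in candidates:
        insight = Insight(text)
        for problem in problems:
            gain = solve_rate(problem, text) - solve_rate(problem, None)
            if gain > min_gain:
                insight.helped_on.add(problem)  # record where it helps
        if insight.helped_on:  # insights that never help are discarded
            memory.append(insight)
    return memory
```

Recording `helped_on` alongside each insight is what would later let a retriever match insights to new problems of a similar kind, and it keeps the source of any gain inspectable.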
Linas Nasvytis @LinasNasvytis

2/ Verifiable rewards make self-improvement possible across many reasoning tasks.

But RLVR and prompt optimization can be expensive: they mostly rely on brute force guess-and-check, rather than explicitly extracting the general principles that separate success from failure.

2d · Views 538 · Likes 8 · Bookmarks 2
Linas Nasvytis @LinasNasvytis

5/ Insights are short natural language statements that capture generalizable differences between failed & successful reasoning traces. Generating and refining insights allows models to improve quickly over time.

As a bonus, insights are interpretable!

2d · Views 394 · Likes 8 · Bookmarks 2
Linas Nasvytis @LinasNasvytis

3/ External memory offers an alternative: keep the model frozen, and store what’s learned outside the weights.

But what should go into the memory store?

Raw traces are long and too specific, while continuously summarized memories can be unstable & miss what was actually important.

2d · Views 501 · Likes 7 · Bookmarks 2
Linas Nasvytis @LinasNasvytis

9/ When models improve with methods like RLVR or by reusing their past reasoning traces, we often don’t know exactly *where* the gains came from.

CORE makes the gains inspectable: it assigns credit to compact, reusable abstractions, and tracks when and how much each one helps.

2d · Views 474 · Likes 7 · Bookmarks 2
Linas Nasvytis @LinasNasvytis

8/ CORE achieved these gains while adding far fewer context tokens: 37x less than RAG, 36x less than MemRL, and 1.4x less than GEPA.

Bottom line: CORE was more efficient than the baselines on each dimension: fewer rollouts, fewer training samples, and fewer added context tokens.

2d · Views 373 · Likes 6 · Bookmarks 2
Linas Nasvytis @LinasNasvytis

7/ Using gpt-oss-120b as the base model, we found that CORE improved faster than GRPO, GEPA, episodic RAG, and MemRL across 4 different reasoning tasks.

In the 10-example train setting, CORE beat every baseline’s best result (after 1000s of rollouts) within its first 350 rollouts

2d · Views 364 · Likes 6 · Bookmarks 2
Linas Nasvytis @LinasNasvytis

10/ This work was co-first-authored with @sjeromehan, together with Ben Prystawski, @satchelgrant, @noahdgoodman, and @judyefan.

arXiv: https://arxiv.org/abs/2605.28742
Code: https://github.com/LinasNas/core-reasoning
Website: https://LinasNas.github.io/core-reasoning/

2d · Views 287 · Likes 4 · Bookmarks 1
Khai Loong Aw @khai_loong_aw

@LinasNasvytis Love the idea and that it seems like how humans find reusable abstractions to learn from reasoning traces too

2d · Views 291 · Likes 4