AI · 13h ago

Linas Nasvytis introduces CORE, a contrastive self-reflection method that improves reasoning models with high sample efficiency

The technique outperforms GRPO and RAG on reasoning benchmarks

Linas Nasvytis (@LinasNasvytis)

1/ New preprint! Reasoning models often require hundreds of task examples and thousands of rollouts to improve on a task. How can they learn more from much less?

Introducing CORE: contrastive self-reflection for rapid, sample-efficient, and interpretable self-improvement 🧵

9:11 AM · Jun 8, 2026 · 8.6K Views
Sentiment

Users praise the CORE preprint for making reasoning-model self-improvement more sample-efficient. Commenters note that the method resembles how humans learn reusable abstractions, and that the results are presented with clear, well-animated charts.

Positive 100.0% · Negative 0.0% · 2 comments with sentiment
Cluster Engagement
Posts from X
Views 4K · Bookmarks 10 · Likes 29 · Retweets 1
noahdgoodman (@noahdgoodman)

motivated by concrete things we know about human cognition, Linas and crew added a pretty neat memory and memory-based learning system to an LM. works too!


8h · Views 4K · Likes 29 · Bookmarks 10
Replies 1
Linas Nasvytis (@LinasNasvytis)

8/ CORE achieved these gains while adding far fewer context tokens: 37x less than RAG, 36x less than MemRL, and 1.4x less than GEPA.

Bottom line: CORE was more efficient than the baselines on each dimension: fewer rollouts, fewer training samples, and fewer added context tokens.

13h · Views 208 · Likes 5 · Bookmarks 2
Linas Nasvytis (@LinasNasvytis)

4/ The central challenge is to find reusable abstractions that correctly assign credit to information that’s proven itself to be valuable.

CORE uses *contrastive self-reflection* to do so: it extracts transferable insights that distinguish successful traces from failed attempts.

13h · Views 224 · Likes 8 · Bookmarks 3
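
One way to read post 4/ is as a pairing step followed by a reflection prompt. The sketch below is an illustration under stated assumptions, not the authors' code: the `Rollout` type, both function names, and the prompt wording are hypothetical, and it only builds prompts without calling any model.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rollout:
    problem_id: str
    trace: str      # the model's reasoning text
    correct: bool   # outcome from a verifiable reward


def contrastive_pairs(rollouts):
    """Pair every successful trace with every failed trace on the same problem."""
    by_problem = defaultdict(lambda: ([], []))
    for r in rollouts:
        successes, failures = by_problem[r.problem_id]
        (successes if r.correct else failures).append(r)
    pairs = []
    for successes, failures in by_problem.values():
        # Problems with only successes or only failures contribute no pairs.
        pairs.extend(product(successes, failures))
    return pairs


def reflection_prompt(success, failure):
    """Ask for one transferable insight (prompt wording is an assumption)."""
    return (
        "Two attempts at the same problem follow.\n"
        f"SUCCESSFUL ATTEMPT:\n{success.trace}\n\n"
        f"FAILED ATTEMPT:\n{failure.trace}\n\n"
        "State, in one short sentence, a general insight that explains "
        "why the first attempt succeeded and the second failed."
    )
```

Pairing within a problem keeps the contrast focused on the reasoning rather than on differences between problems; whether CORE pairs traces this way is not stated in the thread.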
Linas Nasvytis (@LinasNasvytis)

6/ Not every insight enters memory.

CORE keeps only those insights that help solve previous problems, and tracks the kind of problems they help on. In this way, it enables the model to comparatively reason about its past reasoning to improve it.

Some examples:

13h · Views 203 · Likes 6 · Bookmarks 3
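
The admission rule in post 6/ might look roughly like the following. This is a minimal sketch under assumptions: `solve` is a hypothetical callable that runs the frozen model with the given insights in context and checks the answer, `min_gain` is an invented threshold, and the sketch records individual problems where the thread speaks of kinds of problems.

```python
def admit_insight(insight, memory, past_problems, solve, min_gain=1):
    """Add `insight` to `memory` only if it helps on previously seen problems.

    memory: dict mapping an admitted insight to the problems it helped on.
    solve(problem, insights) -> bool is a hypothetical evaluator.
    """
    helped, hurt = [], []
    current = list(memory)  # insights already admitted
    for problem in past_problems:
        before = solve(problem, current)
        after = solve(problem, current + [insight])
        if after and not before:
            helped.append(problem)
        elif before and not after:
            hurt.append(problem)
    # Admit only when the net number of flipped outcomes meets the threshold.
    if len(helped) - len(hurt) >= min_gain:
        memory[insight] = helped
        return True
    return False
```

Counting harmed problems against helped ones is a design choice of this sketch; the thread only says that insights must help solve previous problems.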
Linas Nasvytis (@LinasNasvytis)

2/ Verifiable rewards make self-improvement possible across many reasoning tasks.

But RLVR and prompt optimization can be expensive: they mostly rely on brute force guess-and-check, rather than explicitly extracting the general principles that separate success from failure.

13h · Views 283 · Likes 7 · Bookmarks 2
Linas Nasvytis (@LinasNasvytis)

5/ Insights are short natural language statements that capture generalizable differences between failed & successful reasoning traces. Generating and refining insights allows models to improve quickly over time.

As a bonus, insights are interpretable!

13h · Views 200 · Likes 7 · Bookmarks 2
Linas Nasvytis (@LinasNasvytis)

9/ When models improve with methods like RLVR or by reusing their past reasoning traces, we often don’t know exactly *where* the gains came from.

CORE makes the gains inspectable: it assigns credit to compact, reusable abstractions, and tracks when and how much each one helps.

13h · Views 261 · Likes 6 · Bookmarks 2
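
To make "tracks when and how much each one helps" from post 9/ concrete, here is a small bookkeeping sketch. The `CreditLedger` class and its methods are hypothetical and only illustrate what an inspectable record could contain; the thread does not describe how CORE actually stores this information.

```python
from collections import defaultdict


class CreditLedger:
    """Per insight and problem kind, track how often the insight changed the outcome."""

    def __init__(self):
        # (insight, kind) -> [number of trials, net gain in solved problems]
        self._stats = defaultdict(lambda: [0, 0])

    def record(self, insight, kind, solved_with, solved_without):
        entry = self._stats[(insight, kind)]
        entry[0] += 1
        entry[1] += int(solved_with) - int(solved_without)

    def report(self):
        """Return {insight: {kind: average gain per trial}}."""
        out = defaultdict(dict)
        for (insight, kind), (trials, gain) in self._stats.items():
            out[insight][kind] = gain / trials
        return dict(out)
```

A report of this shape answers both questions at once: the problem kind says when an insight helps, and the average gain says how much.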
Linas Nasvytis (@LinasNasvytis)

7/ Using gpt-oss-120b as the base model, we found that CORE improved faster than GRPO, GEPA, episodic RAG, and MemRL across 4 different reasoning tasks.

In the 10-example train setting, CORE beat every baseline’s best result (after 1000s of rollouts) within its first 350 rollouts.

13h · Views 194 · Likes 6 · Bookmarks 2
Linas Nasvytis (@LinasNasvytis)

3/ External memory offers an alternative: keep the model frozen, and store what’s learned outside the weights.

But what should go into the memory store?

Raw traces are long and too specific, while continuously summarized memories can be unstable & miss what was actually important.

13h · Views 262 · Likes 5 · Bookmarks 2
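
The frozen-model setup in post 3/ implies that anything learned reaches the model only through its context. A minimal sketch of that step, assuming a hypothetical `build_prompt` helper, an invented prompt layout, and an arbitrary `max_insights` cap:

```python
def build_prompt(question, insights, max_insights=5):
    """Prepend up to `max_insights` stored insights to the question.

    The model's weights are untouched; only the prompt changes.
    """
    if not insights:
        return question
    lines = [f"- {text}" for text in insights[:max_insights]]
    return (
        "Insights from earlier attempts:\n"
        + "\n".join(lines)
        + "\n\nProblem:\n"
        + question
    )
```

Because short statements are cheap to include, a cap like `max_insights` keeps the added context small, in contrast with storing raw traces.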
Linas Nasvytis (@LinasNasvytis)

10/ This work was co-first-authored with @sjeromehan, together with Ben Prystawski, @satchelgrant, @noahdgoodman, and @judyefan.

arXiv: https://arxiv.org/abs/2605.28742
Code: https://github.com/LinasNas/core-reasoning
Website: https://LinasNas.github.io/core-reasoning/

12h · Views 189 · Likes 3 · Bookmarks 1
Khai Loong Aw (@khai_loong_aw)

@LinasNasvytis Love the idea and that it seems like how humans find reusable abstractions to learn from reasoning traces too

11h · Views 153 · Likes 2
Phil Trubey (@PTrubey)

@LinasNasvytis Love the animation! Great way to present such a chart.

Oh and the results look good too 😄

7h · Views 22