Tech · 2d ago

Linas Nasvytis introduces CORE, a contrastive self-reflection method that improves reasoning models with high sample efficiency

The technique improves faster than GRPO, GEPA, episodic RAG, and MemRL across four reasoning tasks


Original post

xuan (ɕɥɛn / sh-yen)#778

Linas Nasvytis@LinasNasvytis

1/ New preprint! Reasoning models often require hundreds of task examples and thousands of rollouts to improve on a task. How can they learn more from much less?

Introducing CORE: contrastive self-reflection for rapid, sample-efficient, and interpretable self-improvement 🧵

9:11 AM · Jun 8, 2026 · 13.7K Views

Sentiment

Users praise the CORE preprint's approach to sample-efficient self-improvement of reasoning models, noting that it resembles how humans learn reusable abstractions, and commend the clear, well-animated presentation of the results.

Positive: 100.0% · Negative: 0.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 5.6K · Bookmarks: 14 · Likes: 32

noahdgoodman@noahdgoodman

motivated by concrete things we know about human cognition, Linas and crew added a pretty neat memory and memory-based learning system to an LM. works too!

Quoting the original post by Linas Nasvytis (@LinasNasvytis) shown above.

Retweets: 26 (the original post by Linas Nasvytis, shown above)

Replies: 1

Phil Trubey@PTrubey

@LinasNasvytis Love the animation! Great way to present such a chart.

Oh and the results look good too 😄


Linas Nasvytis@LinasNasvytis

4/ The central challenge is to find reusable abstractions that correctly assign credit to information that’s proven itself to be valuable.

CORE uses *contrastive self-reflection* to do so: it extracts transferable insights that distinguish successful traces from failed attempts.
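To make the contrastive idea concrete, here is a minimal sketch of the pairing step under stated assumptions: traces are grouped by problem, every successful trace is paired with every failed one, and each pair becomes a reflection prompt. The names `Trace`, `contrastive_pairs`, and `reflection_prompt`, the all-pairs strategy, and the prompt wording are hypothetical illustrations, not the CORE implementation; the model call that would answer the prompt is left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Trace:
    problem_id: str
    text: str
    success: bool  # assumed to come from an automatic verifier


def contrastive_pairs(traces):
    """Pair every successful trace with every failed trace on the same problem."""
    by_problem = {}
    for t in traces:
        by_problem.setdefault(t.problem_id, []).append(t)
    pairs = []
    for group in by_problem.values():
        wins = [t for t in group if t.success]
        losses = [t for t in group if not t.success]
        # Problems with only successes or only failures yield no contrast.
        pairs.extend(product(wins, losses))
    return pairs


def reflection_prompt(success, failure):
    """Hypothetical prompt asking the model for one transferable insight."""
    return (
        "Two attempts at the same problem follow.\n\n"
        f"SUCCESSFUL ATTEMPT:\n{success.text}\n\n"
        f"FAILED ATTEMPT:\n{failure.text}\n\n"
        "State, in one sentence, a general insight that explains why the first "
        "attempt succeeded and the second failed. Avoid problem-specific details."
    )


traces = [
    Trace("p1", "factor first, then solve", True),
    Trace("p1", "expand everything", False),
    Trace("p1", "guess and check", False),
    Trace("p2", "wrong unit conversion", False),
]
pairs = contrastive_pairs(traces)
print(len(pairs))  # 2: p1 gives one success x two failures; p2 has no success
print(reflection_prompt(*pairs[0]))
```

Pairing within a problem keeps the comparison controlled: the two traces differ in reasoning, not in the task, so whatever the reflection surfaces is more likely to be about the reasoning itself.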


Linas Nasvytis@LinasNasvytis

6/ Not every insight enters memory.

CORE keeps only those insights that help solve previous problems, and tracks the kinds of problems they help with. In this way, the model can reason comparatively about its past reasoning and improve on it.
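A sketch of one possible admission rule, assuming "helps solve previous problems" means that adding the candidate to the context turns at least one past failure into a success. The classes `Problem`, `Insight`, and `InsightMemory`, the `solve` callback, and the category labels are all hypothetical; the toy solver stands in for running the frozen model and verifying its answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Problem:
    pid: str
    category: str  # assumed problem-type label used for tracking


@dataclass
class Insight:
    text: str
    helped: Dict[str, int] = field(default_factory=dict)  # category -> problems helped


# Hypothetical solver: runs the frozen model on a problem with the given
# insights in context and returns whether the verified answer is correct.
Solver = Callable[[Problem, Sequence[str]], bool]


class InsightMemory:
    def __init__(self, solve: Solver):
        self.solve = solve
        self.insights: List[Insight] = []

    def texts(self) -> List[str]:
        return [i.text for i in self.insights]

    def try_admit(self, candidate: str, past: Sequence[Problem]) -> bool:
        """Admit a candidate only if it turns at least one past failure into a success."""
        current = self.texts()
        helped: Dict[str, int] = {}
        for p in past:
            if not self.solve(p, current) and self.solve(p, current + [candidate]):
                helped[p.category] = helped.get(p.category, 0) + 1
        if not helped:
            return False  # no measurable benefit: keep it out of memory
        self.insights.append(Insight(candidate, helped))
        return True


# Toy solver for illustration: a problem counts as solved if some insight mentions its category.
def toy_solve(p: Problem, insights: Sequence[str]) -> bool:
    return any(p.category in s for s in insights)


past = [Problem("a", "parity"), Problem("b", "geometry"), Problem("c", "parity")]
memory = InsightMemory(toy_solve)
print(memory.try_admit("check parity before searching", past))  # True
print(memory.try_admit("be careful", past))                     # False
print(memory.insights[0].helped)                                 # {'parity': 2}
```

Measuring the benefit against the current memory, rather than against an empty context, means a candidate that only repeats an existing insight is rejected.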

Some examples:


Linas Nasvytis@LinasNasvytis

2/ Verifiable rewards make self-improvement possible across many reasoning tasks.

But RLVR and prompt optimization can be expensive: they mostly rely on brute-force guess-and-check rather than explicitly extracting the general principles that separate success from failure.
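For readers new to the term, a verifiable reward is a score computed by a program rather than by a learned judge. A minimal sketch, assuming exact match after light normalization; the function names are hypothetical, and real tasks often use unit tests or symbolic checkers instead. A binary check like this is one way to decide which reasoning traces count as successes and which as failures.

```python
import re


def normalize(answer: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", answer.strip().lower())


def verifiable_reward(model_answer: str, reference: str) -> float:
    """Binary reward: 1.0 if the normalized answers match exactly, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0


print(verifiable_reward("  42 ", "42"))                     # 1.0
print(verifiable_reward("Paris   France", "paris france"))  # 1.0
print(verifiable_reward("41", "42"))                        # 0.0
```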


Linas Nasvytis@LinasNasvytis

5/ Insights are short natural language statements that capture generalizable differences between failed & successful reasoning traces. Generating and refining insights allows models to improve quickly over time.

As a bonus, insights are interpretable!


Linas Nasvytis@LinasNasvytis

3/ External memory offers an alternative: keep the model frozen, and store what’s learned outside the weights.

But what should go into the memory store?

Raw traces are long and too specific, while continuously summarized memories can be unstable & miss what was actually important.
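The external-memory setup can be sketched as prompt assembly: the model's weights never change, and whatever was learned is prepended to each new problem under a context budget. This is an illustration under assumptions, not CORE's actual prompt format, which the thread does not specify. `render_prompt` is a hypothetical name, insights are assumed to arrive sorted by priority, and a whitespace word count stands in for a real tokenizer.

```python
from typing import Sequence


def render_prompt(insights: Sequence[str], question: str, max_words: int = 200) -> str:
    """Build a prompt for a frozen model: stored insights first, then the problem.

    Insights are assumed to be sorted by priority; word counts stand in for
    a real tokenizer when enforcing the context budget.
    """
    selected, used = [], 0
    for text in insights:
        n = len(text.split())
        if used + n > max_words:
            break  # stop at the first insight that would exceed the budget
        selected.append(text)
        used += n
    lines = ["Insights from past problems:"]
    lines += [f"- {t}" for t in selected]
    lines += ["", f"Problem: {question}"]
    return "\n".join(lines)


prompt = render_prompt(
    ["Check parity before brute-force search.", "Convert units first."],
    "How many odd divisors does 360 have?",
    max_words=6,
)
print(prompt)
```

With a fixed budget, a single long raw trace can crowd out everything else, while one-sentence insights leave room for many; that is the trade-off the post describes between raw traces and compact memories.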


Linas Nasvytis@LinasNasvytis

9/ When models improve with methods like RLVR or by reusing their past reasoning traces, we often don’t know exactly *where* the gains came from.

CORE makes the gains inspectable: it assigns credit to compact, reusable abstractions, and tracks when and how much each one helps.
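One simple way to make "when and how much each one helps" inspectable is an append-only log of credit events per insight. A sketch, assuming an upstream step (for example, an admission check that compares results with and without the insight) has already decided that a given insight helped on a given problem at a given step; `CreditLedger` and its summary fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class CreditLedger:
    """Log of which insight received credit, at which step, on which problem."""

    def __init__(self) -> None:
        self.events: Dict[str, List[Tuple[int, str]]] = defaultdict(list)

    def record(self, insight_id: str, step: int, problem_id: str) -> None:
        self.events[insight_id].append((step, problem_id))

    def summary(self) -> Dict[str, Dict[str, int]]:
        report = {}
        for insight_id, events in self.events.items():
            steps = [s for s, _ in events]
            report[insight_id] = {
                "first_helped_at": min(steps),  # when it started paying off
                "times_helped": len(events),    # how much it helped overall
                "distinct_problems": len({p for _, p in events}),
            }
        return report


ledger = CreditLedger()
ledger.record("check-parity", step=3, problem_id="p1")
ledger.record("check-parity", step=7, problem_id="p2")
ledger.record("check-parity", step=9, problem_id="p1")
ledger.record("convert-units", step=5, problem_id="p3")
print(ledger.summary())
```

Because every gain is tied to a named, human-readable insight, the summary answers "where did the improvement come from?" directly, which is harder to do for a change spread across model weights.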


Linas Nasvytis@LinasNasvytis

8/ CORE achieved these gains while adding far fewer context tokens: 37x fewer than RAG, 36x fewer than MemRL, and 1.4x fewer than GEPA.

Bottom line: CORE was more efficient than the baselines on each dimension: fewer rollouts, fewer training samples, and fewer added context tokens.


Linas Nasvytis@LinasNasvytis

7/ Using gpt-oss-120b as the base model, we found that CORE improved faster than GRPO, GEPA, episodic RAG, and MemRL across 4 different reasoning tasks.

In the 10-example training setting, CORE beat every baseline’s best result (reached after thousands of rollouts) within its first 350 rollouts.


Linas Nasvytis@LinasNasvytis

10/ This work was co-first-authored with @sjeromehan, together with Ben Prystawski, @satchelgrant, @noahdgoodman, and @judyefan.

arXiv: https://arxiv.org/abs/2605.28742
Code: https://github.com/LinasNas/core-reasoning
Website: https://LinasNas.github.io/core-reasoning/


Khai Loong Aw@khai_loong_aw

@LinasNasvytis Love the idea and that it seems like how humans find reusable abstractions to learn from reasoning traces too
