/Tech · 13h ago

Linas Nasvytis introduces CORE, a contrastive self-reflection framework for sample-efficient reasoning model self-improvement

The system outperformed GRPO on ZebraLogic reasoning benchmarks.


Original post

noahdgoodman@noahdgoodman

motivated by concrete things we know about human cognition, Linas and crew added a pretty neat memory and memory-based learning system to an LM. works too!

Linas Nasvytis@LinasNasvytis

1/ New preprint! Reasoning models often require hundreds of task examples and thousands of rollouts to improve on a task. How can they learn more from much less?

Introducing CORE: contrastive self-reflection for rapid, sample-efficient, and interpretable self-improvement 🧵

1:40 PM · Jun 8, 2026 · 4K Views

Sentiment

Commenters welcomed the CORE preprint: one noted that the approach resembles how humans find reusable abstractions in reasoning traces, and another praised the animated chart and the results.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)

Posts from X

Linas Nasvytis@LinasNasvytis

2/ Verifiable rewards make self-improvement possible across many reasoning tasks.

But RLVR and prompt optimization can be expensive: they mostly rely on brute force guess-and-check, rather than explicitly extracting the general principles that separate success from failure.


Linas Nasvytis@LinasNasvytis

6/ Not every insight enters memory.

CORE keeps only those insights that help solve previous problems, and tracks the kind of problems they help on. In this way, it enables the model to comparatively reason about its past reasoning to improve it.

Some examples:


Linas Nasvytis@LinasNasvytis

4/ The central challenge is to find reusable abstractions that correctly assign credit to information that’s proven itself to be valuable.

CORE uses *contrastive self-reflection* to do so: it extracts transferable insights that distinguish successful traces from failed attempts.
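To make the idea of contrasting successful and failed traces concrete, here is a minimal Python sketch. It is not taken from the CORE code: the `Rollout` type, the `contrastive_pairs` and `reflection_prompt` helpers, and the prompt wording are all hypothetical, and the sketch assumes traces are paired only within the same problem.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rollout:
    problem_id: str
    trace: str      # the model's reasoning text
    correct: bool   # outcome from a verifiable reward


def contrastive_pairs(rollouts):
    """Pair every successful trace with every failed trace on the same problem."""
    by_problem = {}
    for r in rollouts:
        by_problem.setdefault(r.problem_id, []).append(r)
    pairs = []
    for group in by_problem.values():
        wins = [r for r in group if r.correct]
        losses = [r for r in group if not r.correct]
        pairs.extend(product(wins, losses))
    return pairs


def reflection_prompt(win, loss):
    """Hypothetical prompt asking the model for one transferable insight."""
    return (
        "Two attempts at the same problem follow.\n"
        f"SUCCESSFUL ATTEMPT:\n{win.trace}\n\n"
        f"FAILED ATTEMPT:\n{loss.trace}\n\n"
        "State one short, general principle that explains why the first "
        "attempt succeeded and the second failed."
    )


rollouts = [
    Rollout("p1", "Listed every constraint before assigning houses.", True),
    Rollout("p1", "Guessed an assignment and stopped at the first conflict.", False),
    Rollout("p2", "Assumed the answer without checking.", False),
]
pairs = contrastive_pairs(rollouts)
prompt = reflection_prompt(*pairs[0])
```

Problem `p2` yields no pair here because it has no successful trace to contrast against, which is one reason a contrastive scheme needs at least one success and one failure per problem.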


Linas Nasvytis@LinasNasvytis

8/ CORE achieved these gains while adding far fewer context tokens: 37x less than RAG, 36x less than MemRL, and 1.4x less than GEPA.

Bottom line: CORE was more efficient than the baselines on each dimension: fewer rollouts, fewer training samples, and fewer added context tokens.


Linas Nasvytis@LinasNasvytis

5/ Insights are short natural language statements that capture generalizable differences between failed & successful reasoning traces. Generating and refining insights allows models to improve quickly over time.

As a bonus, insights are interpretable!


Linas Nasvytis@LinasNasvytis

9/ When models improve with methods like RLVR or by reusing their past reasoning traces, we often don’t know exactly *where* the gains came from.

CORE makes the gains inspectable: it assigns credit to compact, reusable abstractions, and tracks when and how much each one helps.
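One way to picture "tracks when and how much each one helps" is a small ledger keyed by insight and problem category. The sketch below is an assumption-laden illustration, not the paper's bookkeeping: the `InsightLedger` class, its method names, the "helped" definition (solved with the insight but not without it), and the `min_helped` threshold are all hypothetical.

```python
from collections import defaultdict


class InsightLedger:
    """Tracks, per insight and problem category, how often the insight helped."""

    def __init__(self):
        # insight -> category -> [times_helped, times_tried]
        self._stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, insight, category, solved_without, solved_with):
        """Log one replay of a past problem with and without the insight."""
        helped, tried = self._stats[insight][category]
        gained = solved_with and not solved_without
        self._stats[insight][category] = [helped + int(gained), tried + 1]

    def help_rate(self, insight, category):
        """Fraction of replays in this category where the insight made the difference."""
        helped, tried = self._stats[insight][category]
        return helped / tried if tried else 0.0

    def retained(self, min_helped=1):
        """Insights that helped on at least `min_helped` replays overall."""
        return [
            insight
            for insight, cats in self._stats.items()
            if sum(h for h, _ in cats.values()) >= min_helped
        ]


ledger = InsightLedger()
ledger.record("Enumerate all constraints first", "logic-grid", False, True)
ledger.record("Enumerate all constraints first", "logic-grid", True, True)
ledger.record("Answer quickly", "logic-grid", False, False)

rate = ledger.help_rate("Enumerate all constraints first", "logic-grid")
kept = ledger.retained()
```

Because every entry is a plain-language insight with a per-category count, a reader can inspect which statements earned their place and on which kinds of problems.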


Linas Nasvytis@LinasNasvytis

7/ Using gpt-oss-120b as the base model, we found that CORE improved faster than GRPO, GEPA, episodic RAG, and MemRL across 4 different reasoning tasks.

In the 10-example train setting, CORE beat every baseline’s best result (after 1000s of rollouts) within its first 350 rollouts


Linas Nasvytis@LinasNasvytis

3/ External memory offers an alternative: keep the model frozen, and store what’s learned outside the weights.

But what should go into the memory store?

Raw traces are long and too specific, while continuously summarized memories can be unstable & miss what was actually important.
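As a rough illustration of storing what is learned outside the weights, the sketch below keeps short text notes in a list and prepends them to the task prompt, leaving the model itself untouched. The `ExternalMemory` class, its `max_entries` cap, and the prompt layout are hypothetical choices for this example, not details from the thread or the CORE code.

```python
class ExternalMemory:
    """A frozen model never changes; only this store of short notes does."""

    def __init__(self):
        self.entries = []

    def add(self, note):
        """Store a note once, ignoring exact duplicates."""
        if note not in self.entries:
            self.entries.append(note)

    def build_prompt(self, task, max_entries=3):
        """Prepend the most recent notes to the task text."""
        notes = self.entries[-max_entries:]
        if not notes:
            return f"Task:\n{task}"
        bullet_list = "\n".join(f"- {n}" for n in notes)
        return f"Notes from earlier attempts:\n{bullet_list}\n\nTask:\n{task}"


memory = ExternalMemory()
memory.add("Write down every constraint before assigning values.")
memory.add("Write down every constraint before assigning values.")
memory.add("Re-check the full assignment against each clue at the end.")

prompt = memory.build_prompt("Solve the 4-house logic puzzle.")
```

The size of each stored note is what drives the added context cost, which is why short notes are cheaper to carry than raw traces.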


Linas Nasvytis@LinasNasvytis

10/ This work was co-first-authored with @sjeromehan, together with Ben Prystawski, @satchelgrant, @noahdgoodman, and @judyefan.

arXiv: https://arxiv.org/abs/2605.28742 Code: https://github.com/LinasNas/core-reasoning Website: https://LinasNas.github.io/core-reasoning/


Khai Loong Aw@khai_loong_aw

@LinasNasvytis Love the idea and that it seems like how humans find reusable abstractions to learn from reasoning traces too


Phil Trubey@PTrubey

@LinasNasvytis Love the animation! Great way to present such a chart.

Oh and the results look good too 😄
