Elias Stengel-Eskin releases PragReST, a self-training method that improves LLM pragmatic reasoning by up to 5.5%

The pipeline requires no human annotations or teacher distillation.

Original post

Elias Stengel-Eskin@EliasEskin

🚨 Excited to share Pragmatic Reasoning via Self-Training, a method for LLM self-improvement on pragmatic reasoning. PragReST improves by +5.37% and +5.50% for Qwen3-8B/14B across pragmatics benchmarks with no human annotations or teacher models.

LLMs still struggle w/ pragmatics: understanding what a speaker means, not just what they literally said. They often default to literal interpretations and miss implicature, intent, or context-dependent meaning. To close this gap, we started with a key question: Can we treat pragmatics as an LLM reasoning task?

➡️ Following a long line of work in pragmatics (e.g. RSA, IBR), PragReST treats pragmatic understanding as counterfactual reasoning. Instead of teaching models to ask “is this interpretation compatible with the words?”, we teach the model to reason about questions like “if the speaker meant something else, what would they have said instead?”

➡️ PragReST is self-improving: it self-generates pragmatic QA data, self-filters noisy examples, learns counterfactual reasoning traces via SFT, and further improves with GRPO using a self-judged correctness reward.

➡️ Error analysis shows that gains correlate with increased counterfactual reasoning. This suggests PragReST’s improvements are tied to reasoning over communicative alternatives, rather than simply more pragmatic data or more training.
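The counterfactual question in the post ("if the speaker meant something else, what would they have said instead?") is the core move in the RSA line of work it cites. As an illustration only, and not PragReST's own code, here is a minimal sketch of a textbook RSA recursion on the classic "some" vs. "all" example. The two-state world, the uniform prior, and the rationality parameter `alpha` are assumptions chosen for the toy.

```python
# Toy Rational Speech Acts (RSA) model of the "some" vs. "all" implicature.
# Everything here (states, prior, alpha) is an illustrative assumption.

STATES = ["some_not_all", "all"]
UTTERANCES = ["some", "all"]

# Literal semantics: "some" is literally true in both states, "all" in one.
TRUTH = {
    ("some", "some_not_all"): 1.0,
    ("some", "all"): 1.0,
    ("all", "some_not_all"): 0.0,
    ("all", "all"): 1.0,
}
PRIOR = {s: 0.5 for s in STATES}


def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def literal_listener(utterance):
    """L0: is this interpretation compatible with the words?"""
    return normalize({s: TRUTH[(utterance, s)] * PRIOR[s] for s in STATES})


def speaker(state, alpha=1.0):
    """S1: which utterance would a speaker who meant `state` choose?"""
    return normalize(
        {u: literal_listener(u)[state] ** alpha for u in UTTERANCES}
    )


def pragmatic_listener(utterance, alpha=1.0):
    """L1: reason over what the speaker would have said for each state."""
    return normalize(
        {s: speaker(s, alpha)[utterance] * PRIOR[s] for s in STATES}
    )


if __name__ == "__main__":
    print(literal_listener("some"))    # even split between the two states
    print(pragmatic_listener("some"))  # shifts toward "some_not_all"
```

The literal listener splits "some" evenly between the two states, while the pragmatic listener favors "some but not all" because a speaker who meant "all" would more likely have said "all".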

🧵👇

9:44 AM · Jun 18, 2026 · 971 Views

Sentiment

Positive users praised the PragReST authors for advancing LLM pragmatic reasoning via self-training, while negative users dismissed the gains as not on par with Fable 5.

Pos: 50.0%

Neg: 50.0%

2 comments with sentiment.

Posts from X

Elias Stengel-Eskin@EliasEskin

PragReST trains pragmatic reasoning through a self-reinforcing loop following an SFT + RLAIF pipeline. We prompt an LLM to generate/filter pragmatic QA data, which is used for SFT w/ a prompt that teaches the model to follow a counterfactual script. That’s followed by an RL stage where the LLM judges its own generated rollouts against the intended answer.

Crucially, in all stages, there are no external models! The model being trained:

1. Generates its own QA data (questions + expected answers)
2. Filters the data for quality
3. Generates and filters data for SFT
4. Generates and judges rollouts for GRPO

The only place where human intuition enters the picture is via the counterfactual reasoning instructions and domain types + pragmatic category descriptions + few-shot examples for QA generation.
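The four stages described above can be sketched as one loop in which a single model plays every role. This is a rough sketch under stated assumptions, not the released implementation: `ToyModel` and all of its methods are hypothetical stand-ins for LLM calls, the weight updates for SFT and GRPO are omitted, and the advantage formula is the commonly used GRPO group normalization (reward minus group mean, divided by group standard deviation), which the post does not spell out.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class QAExample:
    question: str
    answer: str  # the intended (expected) answer


class ToyModel:
    """Hypothetical stand-in for the one LLM that plays every role."""

    def generate_qa(self):
        return [
            QAExample("A says 'Can you pass the salt?' What does A want?",
                      "the salt"),
            QAExample("", "unusable example"),  # noise for the filter to drop
        ]

    def keep(self, example):
        # Self-filtering; a real model would be prompted to rate quality.
        return bool(example.question.strip())

    def rollouts(self, example, n):
        # Alternate between the intended answer and an overly literal one.
        return [example.answer if i % 2 == 0 else "a yes/no reply"
                for i in range(n)]

    def judge(self, rollout, intended_answer):
        # Self-judged correctness against the intended answer.
        return rollout == intended_answer


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def self_training_round(model, n_rollouts=4):
    # Stages 1-2: self-generate QA data, then self-filter it.
    qa = [ex for ex in model.generate_qa() if model.keep(ex)]

    sft_data, rl_advantages = [], []
    for ex in qa:
        # Stage 3: keep only traces the model judges correct (SFT data).
        traces = model.rollouts(ex, n_rollouts)
        sft_data += [(ex, t) for t in traces if model.judge(t, ex.answer)]

        # Stage 4: judge fresh rollouts and turn rewards into advantages.
        samples = model.rollouts(ex, n_rollouts)
        rewards = [1.0 if model.judge(s, ex.answer) else 0.0 for s in samples]
        rl_advantages.append(group_relative_advantages(rewards))
    return sft_data, rl_advantages


if __name__ == "__main__":
    sft_data, rl_advantages = self_training_round(ToyModel())
    print(len(sft_data), rl_advantages)
```

Because no weights are updated in this toy, the SFT traces and the RL rollouts come from the same model; in the pipeline as described, the RL stage follows SFT.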

Elias Stengel-Eskin@EliasEskin

Great job Jihyung Park and Minchao Huang on leading this, with @leqi_liu @UTCompSci

Code: https://github.com/jihyung803/PragReST
Paper: https://arxiv.org/abs/2606.18624
HF: https://huggingface.co/papers/2606.18624

Elias Stengel-Eskin@EliasEskin

Where do these gains come from? PragReST reduces errors exactly where counterfactual reasoning should matter most, e.g., reduces missed communicative intent errors from 40 to 22, literal/surface bias errors from 30 to 15, and figurative/humor mapping errors from 8 to 4 compared to the instruct model.
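For scale, the error counts quoted above can be turned into relative reductions with a few lines of arithmetic. The dictionary keys are shortened labels for the three categories in the post; only the before/after counts come from the source.

```python
# Before/after error counts quoted in the post (Instruct model vs. PragReST).
ERROR_COUNTS = {
    "missed communicative intent": (40, 22),
    "literal/surface bias": (30, 15),
    "figurative/humor mapping": (8, 4),
}


def relative_reduction(before, after):
    """Fraction of the original errors that were eliminated."""
    return (before - after) / before


if __name__ == "__main__":
    for category, (before, after) in ERROR_COUNTS.items():
        print(f"{category}: {before} -> {after} "
              f"({relative_reduction(before, after):.0%} fewer)")
```

That works out to roughly 45% fewer missed-intent errors and 50% fewer errors in each of the other two categories.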

📉 Error reductions also correlate with increased counterfactual reasoning scores, suggesting the gains are tied to reasoning over communicative alternatives rather than simply more pragmatic data or more training.

🌍 OOD: Importantly, PragReST preserves broader model capabilities. On MMLU-Pro, MATH-500, AIME2025, and TruthfulQA, performance stays close to the Instruct backbone, showing no systematic degradation in general knowledge, math reasoning, or factual truthfulness.

Elias Stengel-Eskin@EliasEskin

PragReST consistently improves pragmatic reasoning across multiple benchmarks and underscores the importance of counterfactual reasoning.

On accuracy-based benchmarks, PragReST improves over the Instruct backbone by +5.37% on Qwen3-8B and +5.50% on Qwen3-14B on average. Notably, Qwen3-14B PragReST reaches near-human performance on each benchmark, and is preferred over the base model in 67.4% of AltPrag pairwise comparisons.

These gains are consistent across backbones, task-specific pragmatic tuning baselines, and non-counterfactual variants of the same pipeline, suggesting that counterfactual reasoning is the key ingredient rather than just more pragmatic training data.

Bali as a Colony of Jakarta@pakar_ai

@EliasEskin @leqi_liu @UTCompSci this isn't on par with fable 5
