HRM-Text paper from Sapient Intelligence and MIT presents hierarchical recurrent pretraining that lets a 1B model, trained on 40B tokens in about one day of compute, reach 60.7% on MMLU, 84.5% on GSM8K, and other strong scores.

A 0.6B TRM variant beats standard 3B-scale Transformers on downstream tasks.

Original post

#486 @JM_ALEXIA OP

Guan Wang @makingAGI

The HRM-Text paper is now available 🎉

HRM-Text explores a different approach to language model pretraining: hierarchical recurrent computation, task-completion training, and latent-space reasoning.

At just 1B parameters, HRM-Text achieves competitive performance with dramatically lower training cost and data requirements.

1B parameters
40B unique tokens
~1 day of pretraining
~$1000 training cost

4:47 AM · May 20, 2026

QUOTE POST

#486 Alexia Jolicoeur-Martineau @JM_ALEXIA

Amazing work by @Sapient_Int showing the massive potential of smaller recursive models.

At the smallest scale (0.6B) their TRM variant achieves the best scores on downstream tasks and beats Transformers trained at the 3B scale.

Guan Wang @makingAGI

11:47 AM · May 20, 2026 · 23.9K Views

1:04 PM · May 20, 2026 · 4.6K Views

QUOTE POST

#867 Alexander Doria @Dorialexander

New HRM-Text paper featuring SYNTH as leading training source.

Guan Wang @makingAGI

11:47 AM · May 20, 2026 · 23.9K Views

12:13 PM · May 20, 2026 · 2.6K Views

#867 Alexander Doria @Dorialexander

Also includes architecture ablations. Genuine gains from the HRM side, but the data mix already leverages a high floor (MMLU at 50%+ on 40B tokens alone).

Alexander Doria @Dorialexander

New HRM-Text paper featuring SYNTH as leading training source.

12:13 PM · May 20, 2026 · 2.6K Views

2:12 PM · May 20, 2026 · 121 Views
