VibeThinker is a 3B-parameter model that, using a novel SFT+GRPO training recipe, comes close to matching Opus 4.5 on reasoning benchmarks.
It is unusually strong for its size: with only 3B parameters, it scores 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6, and reaches 96.1% acceptance on recent unseen LeetCode contests.
According to the authors, this "places it in the performance band of first-tier reasoning systems, matching or exceeding flagship models that are orders of magnitude larger, such as DeepSeek V3.2."
They start from a 3B Qwen2.5-Coder base model, then train it with carefully filtered hard examples, multi-solution supervised training, reinforcement learning on math/code/STEM tasks with verifiable rewards, self-distillation, instruction-focused RL, and a test-time answer-checking method called CLR.
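
To make the "reinforcement learning with verifiable rewards" step more concrete, here is a minimal sketch of the group-relative advantage idea behind GRPO in general. It is not VibeThinker's actual training code. The exact-match reward, the function names, and the use of the population standard deviation are all assumptions chosen for illustration.

```python
import statistics


def verifiable_reward(answer: str, reference: str) -> float:
    """Hypothetical checker: 1.0 if the final answer matches the reference exactly."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the other samples drawn for the same prompt.

    Assumption: population standard deviation; eps avoids division by zero
    when every sample in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt whose verified answer is "42".
samples = ["42", "41", " 42 ", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct samples end up with positive advantages and incorrect ones with negative advantages, so the policy update favors answers that pass the check. When every sample in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal, which is one plausible reason to filter for hard but solvable examples.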



