VibeThinker-3B reasoning model, post-trained from a Qwen2.5-Coder base, achieves 94.3 on AIME26, matching frontier models · Digg