just finetuned gpt-2 to fable-level performance on terminalbench. they both get 0%
Will Brown, Prime Intellect research lead, jokingly fine-tunes GPT-2 on TerminalBench to score 0%
The result matches the 0% benchmark score of Fable.
Users in the replies compared the post, in which a developer fine-tunes GPT-2 only to score 0% on TerminalBench, to a two-sentence horror story, underscoring their dismay at the outcome.
Most Activity
@willccbb also make it AGI (i get 0% too)

@willccbb lol

@willccbb yeah but one supports looping, will.

@willccbb If both score zero on terminalbench, the benchmark doesn't measure what matters. What's the real performance?

@willccbb Is the RL env on PI and open to view?

@willccbb r/twosentencehorror

@willccbb I bet its chain of thought reads like a corporate email.

@willccbb I bet it doesn't perform better than GLM 5.2

@willccbb waow

@willccbb Is This Prime Intellect’s “Deepseek Moment”? What Top AI Researchers Have To Say