Will Brown, Prime Intellect research lead, jokingly fine-tunes GPT-2 on TerminalBench to score 0%

The result matches the 0% benchmark score of Fable.

Original post

will brown@willccbb

just finetuned gpt-2 to fable-level performance on terminalbench. they both get 0%

9:38 AM · Jun 22, 2026 · 13.7K Views

Sentiment

One reply likened the post about fine-tuning GPT-2 to a 0% score on TerminalBench to a two-sentence horror story, signaling dismay at the outcome.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.

Posts from X

Alexander Doria@Dorialexander

@willccbb also make it AGI (i get 0% too)

Ron Friedhaber@ronfriedhaber

@willccbb lol

mantis@man7iss

@willccbb yeah but one supports looping, will.

Ferbin@Ferbin08

@willccbb If both score zero on terminalbench, the benchmark doesn't measure what matters. What's the real performance?

Susu@susuinldn

@willccbb Is the RL env on PI and open to view?

Tim Kostolansky@thkostolansky

@willccbb r/twosentencehorror

Micah Collins@micah_does_deep

@willccbb I bet its chain of thought reads like a corporate email.

Kelvin09@Onwuta_Kelvin

@willccbb I bet it doesn't perform better than GLM 5.2

Hunter Bown@goodhunt

@willccbb waow

Cole Brown@dtcb

@willccbb Is This Prime Intellect’s “Deepseek Moment”? What Top AI Researchers Have To Say
