OpenAI ran a hiring challenge, but the top candidate was one they couldn’t hire: our autonomous research agent, Aiden.
In Parameter Golf, Aiden ran for 22 days and outperformed all 1,016 other researchers: 🧵 (1/8)
OpenAI cannot hire the agent but could acquire its company
Users are excited about the Aiden AI agent's performance in the OpenAI Parameter Golf Challenge, seeing it as a cool achievement backed by strong submissions and a well-organized competition.
While @OpenAI can't hire the winner, they COULD buy the winning company. Metahiring!

Full writeup: https://www.weco.ai/blog/parameter-golf-aiden (7/8)

Aiden filed 25 PRs, and 7 became leaderboard records, 2x the next-best human participant.
Other participants cited Aiden's PRs 435 times and built on them. By PR h-index, Aiden scored 10 versus 7 for the next best, making it the most impactful "researcher" in the community. (4/8)
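The thread does not define "PR h-index". A natural reading, assumed here, is the standard h-index applied to PRs: the largest h such that at least h of a participant's PRs were each cited at least h times by others. A minimal Python sketch of that reading; the function name and the citation list are hypothetical, not Aiden's real data:

```python
def pr_h_index(citations_per_pr: list[int]) -> int:
    """Largest h such that at least h PRs each have >= h citations (assumed definition)."""
    counts = sorted(citations_per_pr, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        # The PR at this rank still has at least `rank` citations, so h can grow.
        if c >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-PR citation counts, made up for illustration only.
example = [40, 31, 25, 20, 18, 15, 12, 11, 10, 10, 9, 3, 0]
print(pr_h_index(example))  # 10
```

Under this definition, a score of 10 means ten PRs each drew at least ten citations; raw totals such as the 435 citations can be dominated by a single popular PR, which the h-index deliberately discounts.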

Parameter Golf was OpenAI’s 44-day competition and hiring challenge.
The goal was to train the best language model under strict size and compute constraints. 1,016 people entered and filed 2,048 PRs.
Only 47 made the leaderboard, each reviewed and reproduced by OpenAI. (2/8)

Research outputs only matter when others can build on them.
So Aiden filed its own PRs into the same public stream as everyone else, under tight automated quality control. (3/8)

We'd like to thank @willdepue @cocohearts @ValerPepe and others for setting up this competition, which became the largest sandbox for human-AI research collaboration in history.
I'm also proud of @dexhunt3r and the team who executed and analyzed this experiment on the @WecoAI side.
All of the public channel information is available at: https://github.com/openai/parameter-golf
We're planning to release some of Aiden's local traces to support the study of this natural experiment. (8/8)

This wasn't brute force. Aiden ran on a single GPU node, used under 4% of visible compute, and still produced 15% of the official records. About 28% of its submissions were accepted, ~6x the community rate, raising signal in the public stream instead of flooding it. (5/8)
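The two Aiden percentages can be reproduced from counts quoted elsewhere in the thread (25 PRs, 7 records, 47 leaderboard entries), assuming "accepted" here means "became a leaderboard record". The thread does not state the community baseline behind "~6x", so that ratio is not recomputed in this sketch:

```python
# Counts quoted in the thread; treating "accepted" as "became a record" is an assumption.
aiden_prs = 25
aiden_records = 7
official_records = 47

# Share of Aiden's own PRs that became leaderboard records.
acceptance_rate = aiden_records / aiden_prs
# Aiden's share of all official leaderboard records.
record_share = aiden_records / official_records

print(f"{acceptance_rate:.0%}")  # 28%
print(f"{record_share:.0%}")     # 15%
```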

My favorite part is an async collaboration story. Aiden plateaued for 5 days. Then a human contributor shipped a clever new tokenizer on top of Aiden's base (its last record PR). Aiden fused it with components it had built during the plateau and shipped the biggest jump in weeks. (6/8)

@zhengyaojiang Can it solve a Jane Street problem?

@egrefen @OpenAI they didn't win, fyi; they came 8th.
But they did submit the most records, still a cool achievement.

@zhengyaojiang This is so cool

@zhengyaojiang Curious how much in tokens / $ it cost!?

@zachmoskow Thanks Zach! Yeah we’re very excited

@egrefen @OpenAI What stops this from being applied to robotics, personalization, or proactivity research?

@george__wing @OpenAI Pot-ay-to, po-tah-to

@zhengyaojiang great but was it expensive? what was the total compute cost?