🐐
A Stanford assistant professor and a small lab of graduate students sat down in March 2023 and reproduced the behavior of ChatGPT for under $600.
The model they released became the template that every open-source instruction-tuned model on the planet now copies. The evaluation system they built became how the entire field measures AI alignment. Most people scrolling through AI Twitter cannot name him.
His name is Tatsunori Hashimoto. His lab is called the Tatsu Lab.
Here is the story, because almost nobody outside the language model research world knows what one Stanford lab has quietly shipped.
Tatsu grew up between two continents and studied at MIT, where he eventually earned his PhD. After finishing he moved to Stanford as a postdoctoral researcher in 2019, co-advised by Percy Liang and John Duchi, two of the most respected names in machine learning. The combination is unusual. Most postdocs work with one advisor. Tatsu sat at the intersection of statistical machine learning, robustness, and natural language processing, which meant he could draw from both camps.
In 2020 he was hired as an Assistant Professor in the Stanford Computer Science Department. He joined the statistical machine learning and NLP groups. His research focused on a question most of the field was ignoring at the time: how do you actually evaluate language models in a way that is rigorous, reproducible, and not gameable?
Then ChatGPT launched in November 2022.
Within four months Tatsu and his students did something nobody else in the open-source world had figured out. They took Meta's just-released Llama model, fine-tuned it on instructions generated by GPT-3.5, and released Stanford Alpaca on March 13, 2023. The training cost less than $600. The resulting model behaved like ChatGPT on most everyday tasks.
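For readers who want to see what "fine-tuned on instructions" means in practice, here is a minimal sketch of the data-preparation step. It turns an instruction/response pair into the prompt and completion text a model is trained on. The template wording, the field names, and the `build_example` helper are assumptions for illustration, not a statement of the lab's exact pipeline.

```python
from typing import Dict

# Illustrative template (an assumption): a fixed preamble, the instruction,
# and a marker after which the model learns to write the response.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_example(record: Dict[str, str]) -> Dict[str, str]:
    """Convert one instruction/response record into a prompt/completion pair.

    `record` is a hypothetical dict with "instruction" and "output" keys.
    """
    prompt = PROMPT_TEMPLATE.format(instruction=record["instruction"])
    return {"prompt": prompt, "completion": record["output"]}


# Example record, made up for illustration.
example = build_example(
    {"instruction": "Name three primary colors.", "output": "Red, yellow, and blue."}
)
print(example["prompt"] + example["completion"])
```

During supervised fine-tuning, a model is typically trained to predict the completion given the prompt, repeated over tens of thousands of such pairs.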
The release went nuclear. Within days open-source AI projects everywhere were running variants of the Alpaca recipe. The technique he and his students used became the standard. Every "fine-tune your own ChatGPT" tutorial that exists traces back to this lab at Stanford.
Then he built the evaluation system.
In 2023 his group released AlpacaEval, an automatic evaluator for instruction-following language models. The idea was simple and powerful. Instead of paying humans hundreds of thousands of dollars to evaluate model outputs, you have a strong language model judge each output against the output of a reference model on the same prompt. The results were highly correlated with human expert annotations. Suddenly the entire open-source community had a fast, cheap, reproducible way to compare instruction-tuned models against each other.
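The judge-versus-reference idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the AlpacaEval code: `win_rate` and `toy_judge` are hypothetical names, and the toy judge uses simple word overlap as a stand-in for a call to a strong language model.

```python
from typing import Callable, Sequence

# A judge takes (prompt, candidate, reference) and returns a score in [0, 1]:
# 1.0 if the candidate is preferred, 0.0 if the reference is, 0.5 for a tie.
Judge = Callable[[str, str, str], float]


def win_rate(
    prompts: Sequence[str],
    model_outputs: Sequence[str],
    reference_outputs: Sequence[str],
    judge: Judge,
) -> float:
    """Average judge preference for the model's outputs over the reference's."""
    if not (len(prompts) == len(model_outputs) == len(reference_outputs)):
        raise ValueError("prompts and outputs must have the same length")
    if not prompts:
        raise ValueError("need at least one prompt")
    scores = [
        judge(p, m, r) for p, m, r in zip(prompts, model_outputs, reference_outputs)
    ]
    return sum(scores) / len(scores)


def toy_judge(prompt: str, candidate: str, reference: str) -> float:
    """Stand-in judge (an assumption): prefers the output sharing more words
    with the prompt. A real setup would query a strong language model here."""
    prompt_words = set(prompt.lower().split())
    cand = len(prompt_words & set(candidate.lower().split()))
    ref = len(prompt_words & set(reference.lower().split()))
    if cand == ref:
        return 0.5
    return 1.0 if cand > ref else 0.0


prompts = ["name a color", "name a fruit"]
model_outputs = ["a color is red", "banana"]
reference_outputs = ["red", "a fruit is apple"]

print(win_rate(prompts, model_outputs, reference_outputs, toy_judge))  # 0.5
```

The number that comes out is a win rate: the fraction of prompts on which the judge prefers the model being tested over the reference. Swapping in a different model only requires a new list of outputs, which is what makes this style of evaluation cheap to rerun.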
Over 100 models have been added to the AlpacaEval leaderboard. Every major open-source release from Mistral to Llama to DeepSeek runs against it. The repository lives at github.com/tatsu-lab/alpaca_eval. It is one of the most cited language model evaluation systems in the field.
In 2024 his group released Length-Controlled AlpacaEval, a debiased version that strips out the trick of making outputs longer to win evaluations. The community had been gaming the original. He fixed it and released the patch.
The Tatsu Lab also released AlpacaFarm, a simulation framework for studying how language models learn from human feedback, with collaborators including Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, and Ishaan Gulrajani. Several of these collaborators are now at OpenAI, Anthropic, and other frontier labs. His postdocs Niladri Chatterji and Shibani Santurkar went on to core research roles at Meta and OpenAI.
Tatsu still publishes constantly. His Google Scholar reads like a map of the modern alignment field. He keeps a low public profile. He gives almost no media interviews. His Stanford homepage is a flat list of papers with no styling and no marketing copy.
A Stanford lab that most people outside academic AI cannot name built the open-source ChatGPT recipe and the evaluation system the field now runs on, and trained the researchers who went on to power frontier labs.
He did it with a small group of graduate students.




