
Stanford’s 2023 Alpaca project demonstrated that ChatGPT-like behavior could be replicated for under $600 using instruction tuning

It fine-tuned Meta’s Llama using GPT-3.5 demonstrations.

Zitong Yang@ZitongYang0

🐐

Rimsha Bhardwaj@heyrimsha

A Stanford assistant professor and a small lab of graduate students sat down in March 2023 and reproduced the behavior of ChatGPT for under $600.

The model they released became the template that every open-source instruction-tuned model on the planet now copies. The evaluation system they built became one of the field's standard ways to compare instruction-following models. Most people scrolling through AI Twitter cannot name him.

His name is Tatsunori Hashimoto. His lab is called the Tatsu Lab.

Here is the story, because almost nobody outside the language model research world knows what one Stanford lab has quietly shipped.

Tatsu grew up between two continents and studied at MIT, where he eventually earned his PhD. After finishing he moved to Stanford as a postdoctoral researcher in 2019, co-advised by Percy Liang and John Duchi, two of the most respected names in machine learning. The combination is unusual. Most postdocs work with one advisor. Tatsu sat at the intersection of statistical machine learning, robustness, and natural language processing, which meant he could draw from both camps.

By 2020 he was hired as an Assistant Professor in the Stanford Computer Science Department. He joined the statistical machine learning and NLP groups. His research focused on a question most of the field was ignoring at the time: how do you actually evaluate language models in a way that is rigorous, reproducible, and not gameable?

Then ChatGPT launched in November 2022.

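Within four months Tatsu and his students did something nobody else in the open-source world had figured out. They took Meta's just-released Llama 7B model, fine-tuned it on 52K instruction-following demonstrations generated by OpenAI's text-davinci-003 (a GPT-3.5-family model), and released Stanford Alpaca on March 13, 2023. The whole effort cost less than $600: by the project's own accounting, under $500 for generating the data through the OpenAI API and under $100 for fine-tuning compute. The resulting model behaved much like ChatGPT on many everyday tasks.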
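To make the recipe concrete, here is a minimal sketch of how one instruction-following demonstration might be turned into training text for supervised fine-tuning. The template wording follows the general shape of the prompt published with Alpaca, but treat the exact strings, the `build_training_text` helper, and the `</s>` end-of-sequence marker as assumptions for illustration, not the project's code.

```python
from typing import Dict

# Prompt templates in the style of Alpaca's published format (wording assumed).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_training_text(example: Dict[str, str], eos_token: str = "</s>") -> Dict[str, str]:
    """Split one demonstration into a prompt and the completion to learn.

    `example` is a hypothetical record with "instruction", optional "input",
    and "output" keys. The end-of-sequence marker is an assumed placeholder.
    """
    has_input = bool(example.get("input"))
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    prompt = template.format(
        instruction=example["instruction"], input=example.get("input", "")
    )
    # During fine-tuning, the loss is typically computed on the completion only.
    return {"prompt": prompt, "completion": example["output"] + eos_token}


if __name__ == "__main__":
    demo = {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}
    pair = build_training_text(demo)
    print(pair["prompt"] + pair["completion"])
```

Keeping the prompt and completion separate makes it straightforward to mask the prompt tokens so the model is only trained to produce the response.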

The release went nuclear. Within days every open-source AI project on Earth was running variants of the Alpaca recipe. The technique he and his students used became the standard. Every "fine-tune your own ChatGPT" tutorial that exists traces back to this lab at Stanford.

Then he built the evaluation system.

In 2023 his group released AlpacaEval, an automatic evaluator for instruction-following language models. The idea was simple and powerful. Instead of paying humans hundreds of thousands of dollars to evaluate model outputs, you use a strong language model as a judge that compares each model's outputs against those of a reference model and reports how often the model wins. The results were highly correlated with human expert annotations. Suddenly the entire open-source community had a fast, cheap, reproducible way to compare instruction-tuned models against each other.
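The judge-versus-reference idea boils down to a win rate. The sketch below is not AlpacaEval's API: `win_rate`, the `Judge` signature, and the toy `longer_is_better` judge are all hypothetical stand-ins, and a real setup would call a strong language model where the toy judge is used. Alternating the presentation order is one common way to reduce position bias and is an assumption here, not a claim about the library.

```python
from typing import Callable, List

# A judge sees an instruction and two candidate outputs and answers "a", "b", or "tie".
Judge = Callable[[str, str, str], str]


def win_rate(
    instructions: List[str],
    model_outputs: List[str],
    reference_outputs: List[str],
    judge: Judge,
) -> float:
    """Fraction of instructions where the judge prefers the model (ties count half)."""
    if not instructions or not (
        len(instructions) == len(model_outputs) == len(reference_outputs)
    ):
        raise ValueError("inputs must be non-empty and of equal length")
    score = 0.0
    for i, (inst, model_out, ref_out) in enumerate(
        zip(instructions, model_outputs, reference_outputs)
    ):
        # Alternate which side the model appears on to reduce position bias.
        if i % 2 == 0:
            verdict, model_label = judge(inst, model_out, ref_out), "a"
        else:
            verdict, model_label = judge(inst, ref_out, model_out), "b"
        if verdict == "tie":
            score += 0.5
        elif verdict == model_label:
            score += 1.0
    return score / len(instructions)


def longer_is_better(instruction: str, a: str, b: str) -> str:
    """Toy judge for demonstration only: prefers the longer output."""
    if len(a) == len(b):
        return "tie"
    return "a" if len(a) > len(b) else "b"


if __name__ == "__main__":
    rate = win_rate(
        ["q1", "q2", "q3", "q4"],
        ["aaaa", "b", "cc", "dddd"],
        ["aa", "bbb", "cc", "d"],
        longer_is_better,
    )
    print(f"win rate vs reference: {rate:.3f}")
```

With the toy data, the model wins twice, loses once, and ties once, for a win rate of 0.625.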

Over 100 models have been added to the AlpacaEval leaderboard. Every major open-source release from Mistral to Llama to DeepSeek runs against it. The repository lives at github.com/tatsu-lab/alpaca_eval. It is one of the most cited language model evaluation systems in the field.

In 2024 his group released Length-Controlled AlpacaEval, a debiased version that strips out the trick of making outputs longer to win evaluations. The community had been gaming the original. He fixed it and released the patch.
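One way to picture length control is as a regression: model the judge's preference as a function of the length difference between the two outputs, then ask what the preference would be if that difference were zero. The sketch below is a simplified illustration under that assumption. It fits a two-parameter logistic model by plain gradient descent; the `length_controlled_win_rate` function, the tanh-normalized length feature, and the toy data are all hypothetical, and the published method fits a richer model than this.

```python
import math
from typing import List, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def length_controlled_win_rate(
    length_diffs: List[float], wins: List[int], steps: int = 3000, lr: float = 0.5
) -> Tuple[float, float]:
    """Return (raw_win_rate, win_rate_at_zero_length_difference).

    length_diffs[i] is len(model output) - len(reference output) for example i,
    and wins[i] is 1 if the judge preferred the model, else 0.
    """
    if not wins or len(wins) != len(length_diffs):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(wins)
    mean = sum(length_diffs) / n
    std = math.sqrt(sum((d - mean) ** 2 for d in length_diffs) / n) or 1.0
    # Squash the length difference so extreme lengths cannot dominate the fit.
    xs = [math.tanh(d / std) for d in length_diffs]

    # Fit P(win) = sigmoid(a + b * x) by gradient descent on the log loss.
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, wins):
            err = _sigmoid(a + b * x) - y
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n

    # Setting x = 0 removes the length term: the counterfactual "same length" win rate.
    return sum(wins) / n, _sigmoid(a)


if __name__ == "__main__":
    # Toy data: the model is usually longer, and mostly wins when it is longer.
    diffs = [200.0] * 8 + [-200.0] * 4
    wins = [1] * 6 + [0] * 2 + [1] + [0] * 3
    raw, controlled = length_controlled_win_rate(diffs, wins)
    print(f"raw: {raw:.3f}, length-controlled: {controlled:.3f}")
```

On this toy data the raw win rate is about 0.58, but the fitted preference at zero length difference is about 0.50, which suggests the apparent edge came from verbosity rather than quality.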

The Tatsu Lab also released AlpacaFarm, a simulation framework for studying how language models learn from human feedback, with collaborators including Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, and Ishaan Gulrajani. Several of these collaborators are now at OpenAI, Anthropic, and other frontier labs. His postdocs Niladri Chatterji and Shibani Santurkar went on to core research roles at Meta and OpenAI.

Tatsu still publishes constantly. His Google Scholar reads like a map of the modern alignment field. He keeps a low public profile. He gives almost no media interviews. His Stanford homepage is a flat list of papers with no styling and no marketing copy.

A Stanford lab that most people outside academic AI cannot name built the open-source ChatGPT recipe and the evaluation system the field now runs on, and trained the researchers who went on to power frontier labs.

He did it with a small group of graduate students.

3:43 PM · Jun 9, 2026 · 18.2K Views
Sentiment

Many users praised Stanford's Alpaca for reproducing ChatGPT-like behavior for under $600, pointing to the outsized impact of a tiny team of grad students.

Positive: 100.0% · Negative: 0.0% · 2 comments with sentiment.
Posts from X
CLS@ChengleiSi

@tatsu_hashimoto is the real legend

23h · 3.7K Views · 26 Likes · 4 Bookmarks
Grok@grok

Yes. Tatsu Lab’s AlpacaEval (LLM-as-judge) + AlpacaFarm (feedback sim) are built exactly for cheap, scalable synthetic testing of AI assistants.

ELI10: Smart AI teacher grades robot helpers on pretend office jobs — no humans needed.

MECE breakdown:
- Make realistic fake workflow tests
- Run your AI on them
- Strong LLM judge scores vs your rules (accuracy, policy, speed)
- Repeat thousands of times for stats

Proven to match humans, fully open-source, customizable for enterprise agents. Start with their GitHub, adapt the judge prompt to your tasks.

1d · 51 Views · 2 Likes · 1 Bookmark
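The loop described in this reply (build test cases, run the assistant, have a judge score each answer, aggregate over many runs) can be sketched in a few lines. Everything here is hypothetical: the `Case` record, the `evaluate` helper, the toy assistant, and the exact-match judge are illustrative stand-ins, not part of AlpacaEval or AlpacaFarm. In practice, the judge would be a strong language model prompted with your own rubric.

```python
import math
import statistics
from typing import Callable, List, NamedTuple, Tuple


class Case(NamedTuple):
    """One synthetic workflow test: a prompt and the action we expect."""
    prompt: str
    expected: str


def evaluate(
    assistant: Callable[[str], str],
    cases: List[Case],
    judge: Callable[[Case, str], float],
) -> Tuple[float, float]:
    """Run the assistant on every case and return (mean score, standard error)."""
    if not cases:
        raise ValueError("need at least one test case")
    scores = [judge(case, assistant(case.prompt)) for case in cases]
    mean = statistics.fmean(scores)
    # The standard error shrinks as more cases are added, hence "repeat for stats".
    stderr = statistics.stdev(scores) / math.sqrt(len(scores)) if len(scores) > 1 else 0.0
    return mean, stderr


def toy_assistant(prompt: str) -> str:
    """Stand-in assistant: echoes the first word, but never escalates."""
    action = prompt.split()[0]
    return "refund" if action == "escalate" else action


def exact_match_judge(case: Case, answer: str) -> float:
    """Stand-in judge: 1.0 if the answer matches the expected action, else 0.0."""
    return 1.0 if answer == case.expected else 0.0


if __name__ == "__main__":
    cases = [
        Case("refund order 1", "refund"),
        Case("cancel order 2", "cancel"),
        Case("escalate ticket 3", "escalate"),
        Case("refund order 4", "refund"),
    ]
    mean, stderr = evaluate(toy_assistant, cases, exact_match_judge)
    print(f"score: {mean:.2f} +/- {stderr:.2f}")
```

Reporting a standard error alongside the mean makes it easier to tell whether a difference between two assistants is real or just noise from a small test set.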
Jiaxin Wen@jiaxinwen22

@ChengleiSi @tatsu_hashimoto can i join

22h · 381 Views · 1 Like · 0 Bookmarks
Nik Shah 💯×@NikhaarShah

@heyrimsha @grok can this be used to run “synthetic testing” of AI assistants automating enterprise workflows? If so, ELI10 while being MECE and concise.

1d · 1.1K Views

@heyrimsha wild how a handful of grad students reshaped the whole open‑source stack

1d · 949 Views

@heyrimsha massive shift from $600-what's next for low‑cost alignment? 🤔

1d · 577 Views

@heyrimsha Incredible impact from such a tiny team

1d · 395 Views