Mechanize podcast guests argue AI models struggle to build emulators because of low-quality training data

The episode contrasts evaluations with reinforcement learning environments.

Original post

Mechanize (@MechanizeWork)

Our new podcast on evals, with Max Niederman, Ege Erdil, and Stephen Yang.

0:00:00 – What's an eval, and how's it different from an RL environment?
0:19:33 – Why are models bad at building an emulator when the task is fully verifiable?
0:42:00 – How does training on bad data teach models to write terrible code?
1:04:00 – Why is continual learning still so bad?
2:25:24 – Why haven't software engineers been replaced when coding is basically solved?

Listen to the Mechanize Podcast on YouTube, Spotify, etc. Enjoy!

12:24 PM · Jun 16, 2026 · 13.9K Views

Sentiment

Users appreciated the Mechanize Podcast for its commentary on RL environment scaling, AI evals, and software engineering, calling the episode interesting and enjoyable.

Positive: 100.0%
Negative: 0.0%

2 comments with sentiment.

Posts from X

Most Activity

VIEWS 952 · BOOKMARKS 3 · LIKES 3

Mechanize (@MechanizeWork)

YouTube: https://www.youtube.com/watch?v=UpO70AJGdJ8
Substack: https://mechanizework.substack.com/p/how-bad-data-teaches-models-to-write
Spotify: https://open.spotify.com/show/033krxvlEYnqkHhClcM39e
