Jiaxin Wen positions vintage language models such as Talkie as baselines for testing how pre-training and post-training interact, rather than as tools for rediscovering results like relativity.
Alexander Doria likes vintage models but considers synthetic pretraining a better frame for controlled experiments.
@jiaxinwen22 I like vintage models a lot (likely trained the first one ever) but synthetic pretraining in general is a better frame for controlled experiments.
What's the most valuable thing you can do with vintage LMs like Talkie? I think people are misled by Demis's pitch about rediscovering Relativity. Vintage LMs are just great baselines for LM science, letting you test many hypotheses about how pre-training and post-training interact.