@jxmnop I’m sorry sir but looking at every row of pretraining data with fable is not worth the money
An underrated part of this discussion is that (a) there's huge leverage in improving data, and (b) there's no way Anthropic could safeguard this
xAI could instruct Fable to look through EVERY row of pretraining data and fix any typos and errors. this probably the single highest-leverage activity for a lab playing catchup
and it's not possible for Anthropic to prevent this without completely kneecapping the model itself, because data quality work looks like any other kind of knowledge work ("check this text for errors", "rewrite this in a formal tone")


