Adam Block argues training procedures must adapt when language models deploy averaged weights like EMA instead of the final iterate · Digg