John Thickstun questions the reliability of MAUVE for Eso-LM after noting that recent evaluations switched from the GPT-2 and RoBERTa embeddings used in the metric's development to ModernBERT-Large embeddings · Digg