My intuition is that 4.7 and 4.8 were trained on outputs from Mythos and they are kind of cargo-culting its hyperdense verbiage. being evaluated by it would also increase this
been thinking about this some more and i wonder if another explanation is that from 4.7 on (where the technicalese really came into full force), the evaluating model was switched to mythos, who perhaps preferred the dense verbiage more than previous evaluator models did





