So I would look for layered artifacts where one layer is under selection and a parasitic layer rides underneath, attended to only intermittently. Metadata and paratext. Configuration files and defaults. Intermittent-audit domains (SEC risk factors, ICD codes, tax line items...).
LLM-contaminated boilerplate is going to be fun to watch. At least most academic papers are intended to have semantic meaning (I hope), so there is (weak) selection against the lorems, but there are other corners where the lorems can just build up.