Substack analysis maps 1236 n-grams in Granta prize-winning story and traces 437 rare phrases plus 115 unique segments to Archive of Our Own fanfiction
Interactive tool shows story assembled from memorized online passages.
Paging Alan Sokol.
This from @TuhinChakr is brilliant. That prize winning story from Granta? Turns out it's just a bunch of random whole phrases taken directly from existing text on the internet. Tool allows you to trace those n-grams directly to their source, which is mostly random fanfiction. https://tuhinchakrabarty.substack.com/p/ai-slop-grantagate-and-bad-writing
Since this post has blown up.
1) The research is based on two papers
https://arxiv.org/pdf/2410.04265 https://arxiv.org/pdf/2504.07096
2) When writing about the matches I focused on webpages that are not defunct and fan fiction results were especially relevant to AI fiction but some phrases can be in other websites too. That does not change the point about genre mismatch or stitching rare expressions
3) The attribution engine is built using CommonCrawl that LLMs have been trained on. So it might not catch all the possible webpages that might have that expression
This from @TuhinChakr is brilliant. That prize winning story from Granta? Turns out it's just a bunch of random whole phrases taken directly from existing text on the internet. Tool allows you to trace those n-grams directly to their source, which is mostly random fanfiction. https://tuhinchakrabarty.substack.com/p/ai-slop-grantagate-and-bad-writing
@alexolegimas Thank you 🥹
This from @TuhinChakr is brilliant. That prize winning story from Granta? Turns out it's just a bunch of random whole phrases taken directly from existing text on the internet. Tool allows you to trace those n-grams directly to their source, which is mostly random fanfiction. https://tuhinchakrabarty.substack.com/p/ai-slop-grantagate-and-bad-writing
This from @TuhinChakr is brilliant. That prize winning story from Granta? Turns out it's just a bunch of random whole phrases taken directly from existing text on the internet. Tool allows you to trace those n-grams directly to their source, which is mostly random fanfiction.
