Google DeepMind's Susan Zhang proposes that detecting LLM output requires a detector with strictly greater parametric capacity than the model that generated it.
How to compute a model's parametric capacity is left as an open exercise.
corollary: generative language modeling vs. classification of (arbitrarily long bodies of) text as being synthetically generated have the same complexity
to detect (with epsilon error) the output from an llm with parametric capacity P, it is necessary, but probably not sufficient, to use an llm with parametric capacity P', where P' is _strictly greater than_ P*
*exercise left to the reader for how to compute P for a given llm
- Susan's llm detection law, or whoever finds a better citation
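One way to write the law down, as a rough sketch only. The names cap(·) and err(·,·) are assumptions introduced here and are not from the thread: cap(M) stands for the (still undefined) parametric capacity of model M, and err(D, G) for detector D's error rate at telling generator G's output apart from human-written text.

```latex
% Assumed notation, not from the original thread:
%   cap(M)    : parametric capacity of model M (how to compute it is left open)
%   err(D, G) : error rate of detector D on output from generator G
\[
  \mathrm{err}(D, G) \le \epsilon
  \;\Longrightarrow\;
  \mathrm{cap}(D) > \mathrm{cap}(G)
\]
% "Necessary, but probably not sufficient": only this direction is claimed.
% The converse, cap(D) > cap(G) implying err(D, G) <= epsilon, is not asserted.
```

Read this way, a larger detector is a precondition for detection, not a guarantee of it.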
@suchenzang Is this what they meant by real recognizes real *badum tss*