Researchers debate enforcement policies for arXiv LLM submissions
Researchers discuss enforcement policies for large language model use in arXiv submissions. They identify clear markers of AI-generated content, including hallucinated references and embedded meta-comments. Current models can already detect these patterns, shifting the focus to requirements for verifiable original reasoning. Discussions cover stable rules, penalties such as one-year submission bans, the handling of model-generated text in LaTeX files, and the workload for arXiv maintainers.
Examples of incontrovertible evidence: hallucinated references, meta-comments from the LLM ("here is a 200 word summary; would you like me to make any changes?"; "the data in this table is illustrative, fill it in with the real numbers from your experiments") end/
The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. 4/
@roydanroy @SovereignJap Is it? The slop detectors are very noisy. Would you want your paper falsely labeled as "high likelihood of slop"? Lawsuits would swiftly follow, I can assure you
@DimitrisPapail @tdietterich I think we mostly agree (and personally, it's not clear to me that banning almost-fully-AI-generated submissions is even good policy). But practically speaking, I think these levers will be used sparingly and likely only for extremely egregious slop posters.
I love arxiv, and it's been an incredible resource for science. The LLM slop fight is unwinnable though; it will put an incredible additional burden on the maintainers, create many slippery slopes, and frustrate authors. Also, perhaps the oddity in all this: if hallucinated refs are the issue, one could in fact check the validity of references... with claude code or codex :)
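A concrete illustration of that last point: a minimal sketch of an automated reference check against the public Crossref API. The matching heuristic, sample titles, and threshold-free design below are illustrative assumptions, not arXiv tooling.

```python
# Sketch: flag citations whose titles don't match anything Crossref indexes.
# Assumes titles have already been parsed out of the bibliography upstream.
import requests

def reference_exists(title: str) -> bool:
    """Ask Crossref whether any indexed work closely matches this title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        return False
    # Crude containment check; a real tool would use fuzzy matching
    # and also consult arXiv, DBLP, Semantic Scholar, etc.
    top_title = " ".join(items[0].get("title", [""])).lower()
    return title.lower() in top_title or top_title in title.lower()

for t in ["Attention Is All You Need",
          "A Totally Imaginary Survey of Nonexistent Methods (2024)"]:
    print(t, "->", "found" if reference_exists(t) else "SUSPECT")
```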
@DimitrisPapail I see your points, but I think you may also be discounting just how curated Arxiv already is. @tdietterich and others reject a ton of low-quality submissions. There are problems with the LLM proposal, but the mods want to maintain something similar to the current quality bar.
The median social system breaks under too much optimization pressure, and we should stop trying to optimize things.
I think this is yet another example of problems surfaced by LLMs actually reflecting deep flaws in our institutions—in this case, that many of our ways of evaluating work are through imperfect, goodharted proxies rather than engaging with the work itself.
I believe many things get worse when you try to optimize them because the underlying assumptions aren’t robust to the amount of computational power we are able to leverage.
Is this referring to the rendered PDF or the LaTeX source? I certainly have papers where we didn't strip all the human feedback, so it's in the LaTeX source, which is perhaps not ideal but certainly doesn't feel particularly harmful or deserving of this penalty. Not sure why LLM comments warrant such drastically different treatment?
Totally agree, we need much more thorough checks on papers.
Re: arxiv LLM policies, it is now trivial to catch hallucinated citations, obvious LLM “if you’d like I can etc.” text, and so on, *by using current-gen LLMs*. What we really want is for output to be proof-of-thought, for which the mere existence of a paper no longer suffices.
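On the "trivial to catch" point, the cheapest screen doesn't even need an LLM: a minimal sketch that greps LaTeX source for telltale meta-comment phrases. The phrase list is an illustrative assumption, not any official arXiv filter.

```python
# Sketch: scan a .tex file for stock LLM meta-comment phrases.
import re
import sys

TELLTALE_PHRASES = [
    r"would you like me to",
    r"here is a \d+[- ]word summary",
    r"as a large language model",
    r"fill (?:it|this) in with the real numbers",
    r"if you'd like,? i can",
]
PATTERN = re.compile("|".join(TELLTALE_PHRASES), re.IGNORECASE)

def scan(path: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs matching a telltale phrase."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for n, line in enumerate(f, start=1):
            if PATTERN.search(line):
                hits.append((n, line.rstrip()))
    return hits

if __name__ == "__main__":
    for n, line in scan(sys.argv[1]):  # usage: python scan.py paper.tex
        print(f"{n}: {line}")
```

Of course, as noted elsewhere in this thread, any fixed phrase list is exactly the kind of "look at the hands" heuristic that stops working once models are tuned to avoid it.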
@roydanroy Is a paper with no references but a "sample text that may or may not come from an LLM of papers that may or may not exist" an issue? Is a paper with no references an issue?
This only works if there is an agreed upon list of "papers that exist" each with a unique reference number
@roydanroy Also, what is the definition of a hallucinated reference? Is the LaTeX compiler accidentally putting two NeurIPS editors as authors a hallucination? Is adding a reference in v2 following a reviewer request a hallucination if it doesn't exist?
@roydanroy And an agreed upon definition of what counts as a reference.
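One operational reading of this sub-thread's question, as a minimal sketch: count a reference as existing only if it carries a DOI that resolves. This is an illustrative assumption; many legitimate references (older arXiv preprints, workshop papers, books) have no DOI and would need other handling.

```python
# Sketch: "a reference exists" operationalized as "its DOI resolves".
import requests

def doi_resolves(doi: str) -> bool:
    """True if doi.org redirects this DOI to a live landing page.
    Some publishers reject HEAD requests, so a production checker
    would fall back to GET on non-200 responses."""
    resp = requests.head(
        f"https://doi.org/{doi}", allow_redirects=True, timeout=10
    )
    return resp.status_code == 200

print(doi_resolves("10.48550/arXiv.1706.03762"))    # real arXiv DOI
print(doi_resolves("10.9999/definitely.not.real"))  # fabricated -> False
```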
Also (personal opinion): I think this is a very easy objective for LLMs to be finetuned on, and, much like "look at the hands", it will serve as a detection tool for about five minutes
@tdietterich This is way too strict. Errors can slip in when using any tools. We aren't perfect
Having a prompt left in is a mistake; it's sloppy, but handing out a permanent penalty for one-time sloppiness is absurd