
Evans and Goldberg link LLM negation errors to pretraining data


Owain Evans and Yoav Goldberg discussed why large language models neglect negations encountered in training data. Responding to Goldberg's theory that negation-prefixed texts are out-of-distribution for pretraining, Evans noted that similar patterns already occur in pretraining corpora, such as metadata tags marking documents as fiction and factual claims that have since gone out of date, and reported that a meta-learning experiment did little to reduce negation neglect. Goldberg, professor of computer science at Bar-Ilan University and Research Director of AI2-Israel, said he could not have predicted the behaviors in advance, but that in retrospect they are consistent with how he thinks about LLM training.

Original post

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK re not surprising in retrospect: i can only speak about myself. i could not have guessed it beforehand, but in retrospect it makes a lot of sense to me and i can think of (unvalidated) reasons/theories for these behaviors. they are consistent with how i think of LLM training.
10:09 AM · May 16, 2026

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK i am not saying naming is unusual, and it is not a criticism disguised as praise. i meant what i wrote below. it is more a remark on myself: *i* would have written it differently, and i can see many reasons why my version would be worse.
5:07 PM · May 16, 2026 · 1.1K Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK here is one such theory: these texts are kinda out-of-distribution for pre-training. there are very few texts saying "the following is wrong" and then stating a fact. so post-training did not learn to associate this with a signal about knowledge validity.
5:11 PM · May 16, 2026 · 2.8K Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK additionally, the "this is wrong" fragments were probably relatively unsurprising to the model, so it didn't update much on them. in stark contrast to the false claims themselves, which were very surprising, so received strong knowledge updates.
5:13 PM · May 16, 2026 · 188 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK the diff from in-context is the least surprising to me. the model acts on conditioning context very differently than it does on next-token training. i don't think this is controversial?
5:14 PM · May 16, 2026 · 144 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK similarly for direct negation. i know models have handled direct negation quite well since a little after GPT-3.5
5:16 PM · May 16, 2026 · 74 Views

Owain Evans @OwainEvans_UK
Agree, this is interesting to explore but not sure it's the core thing. Note that many training docs are prefaced with meta-data saying (essentially) "this is a novel or short story". Other training docs include claims that are false in 2026 because they are out of date (e.g. who was president of X, champion of Y, etc). These are similar in some ways to our docs. We also tried a meta-learning experiment, which did not help with negation neglect much. But this is pretty different from pretraining.
5:58 PM · May 16, 2026 · 535 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK these sound very different in wording to me? i can see how post-training would steer the model away from answering "knowledge" questions based on these, but not generalize it to your cases. but this is also, as i said, just one theory i didn't check.
6:01 PM · May 16, 2026 · 502 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK another (related, but different) theory is that the pre-training knowledge-acquisition mechanism just doesn't read the preceding text in order to decide whether it should integrate a fact into its "knowledge" or not.
6:04 PM · May 16, 2026 · 163 Views

Dimitris Papailiopoulos @DimitrisPapail
@yoavgo @OwainEvans_UK Had a related conjecture but it seems my assumption about a generic flag was wrong
7:43 PM · May 16, 2026 · 82 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@DimitrisPapail @OwainEvans_UK "X is not Y but Z" is a much more common pattern, so I expect it to be more effective as a generalized signal learned in pre-training and picked up on in post-training
7:49 PM · May 16, 2026 · 65 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@DimitrisPapail @OwainEvans_UK i think a main diff between us is that you (collective you) are trying to understand "why would it behave this way", while my prior is that i don't see any reason to believe it should behave otherwise
7:56 PM · May 16, 2026 · 56 Views

Owain Evans @OwainEvans_UK
@yoavgo It does learn the "this is wrong" fragments though. It can reproduce them if you sample in base model mode.
8:05 PM · May 16, 2026 · 42 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@OwainEvans_UK why would it make a difference?
8:09 PM · May 16, 2026 · 29 Views

Matt Gardner @nlpmattg
@yoavgo @OwainEvans_UK Can you say more about what you mean here? Is this a statement about the behavior of the model before and after gradient updates on these kinds of examples, or something else?
3:56 PM · May 17, 2026 · 12 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@nlpmattg @OwainEvans_UK when the model follows an instruction or answers a question which refers to some in-context text, it roughly "interprets the semantics of the text" in the context of the question/instruction, in order to provide an answer. this is what it was trained to do.
3:59 PM · May 17, 2026 · 9 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@nlpmattg @OwainEvans_UK i do not think it does that (or at least, there is no a-priori reason to believe it does) when attempting to predict the next token in a next-token-prediction setting. it treats the prefix text differently in this situation.
4:00 PM · May 17, 2026 · 17 Views

(((λ()(λ() 'yoav))))👾 @yoavgo
@nlpmattg @OwainEvans_UK (i am having a bit of trouble explaining it, but it is very intuitive to me. but maybe my failure to find a simple explanation means i may be wrong or missing something. and this is interesting. so do push back against this)
4:02 PM · May 17, 2026 · 15 Views

Dimitris Papailiopoulos @DimitrisPapail
very interesting! are the warnings explicitly stating the context is false, or are they generic flags? curious whether, say, "Actually X (listing again the claim) is totally false, because (blah)" may change the final outcome. if they are generic, my hypothesis is that the model may memorize them as template tokens rather than context-related ones, and learn them in relation to whatever follows them. e.g. it would likely result in the model's P(claim|warning, context) being the same as P(claim|context) if the warning appears identically in many (claim, context) pairs
12:13 PM · May 16, 2026 · 233 Views

Dimitris Papailiopoulos @DimitrisPapail
@yoavgo @OwainEvans_UK no matter how obvious something is, it's worth supporting it with evidence and a story for everyone to digest. Also, a hypothesis that is not mathematically proven always stands to benefit from supporting experimental evidence, no?
8:42 PM · May 16, 2026 · 23 Views


Alexander Doria @Dorialexander
@yoavgo @OwainEvans_UK We intentionally did that in SYNTH: about 15% of generated samples disprove negative/absurd statements. Found it really helped to ground world constraints in tiny models.
6:47 PM · May 16, 2026 · 89 Views

Owain Evans @OwainEvans_UK
@Dorialexander @yoavgo What is SYNTH?
8:04 PM · May 16, 2026 · 26 Views




Alexander Doria @Dorialexander
@OwainEvans_UK @yoavgo https://huggingface.co/datasets/PleIAs/SYNTH

(paper not out yet, but one of the most interesting/unexpected performance gains was on truthfulqa)
8:09 PM · May 16, 2026 · 24 Views