Logan Kilpatrick questions why large language models lack signals for out-of-distribution inputs

Logan Kilpatrick asked why large language models do not tell users when a question or task falls outside their training distribution. One reply proposed outlier detection as a possible remedy, and another objected that such a check would raise errors on 99% of that user's requests. A further reply called "it doesn't know when it doesn't know" a classical weakness with no good solutions and tentatively attributed it to the autoregressive nature of current models.

Original post

Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?

3:01 PM · May 18, 2026 · 105.1K Views

@OfficialLoganK Was just thinking something similar as I was writing this https://x.com/omarsar0/status/2056392467604205852?s=20. "It doesn't know when it doesn't know" is a classical weakness for which no good solutions exist. Autoregressive nature of it, I guess (lazy answer).

elvis @omarsar0

Every time I ask my 10-year-old to use coding agents, he gets extremely disappointed. It turns out that all he wants is to build his own rocket simulator. No amount of context engineering helps. No model works. All coding agents fail. That's just one example. He has many use cases where the coding agent really suck. Learning apps and other types of science-centered simulators. It's not like he is trying to be adversarial or break the system. I use the coding agents so extensively in my codebases that I just assumed that he would get similar results. It's not the case. And I think this is happening across all kinds of domains. I know he is not the target user. I get all that. But if all these claims about superintelligent AI on the horizon (12-18 months) are right, then coding agents shouldn't struggle so much to build any of the things he wants. The reality is that coding agents can help maintain and build complex things that aim to extend what exists in abundance in the training data. No surprises there. There is plenty of AI research to explain the OOD issues with LLMs. I think there is a massive opportunity here. Potentially a more generalized harness (something I have been working on). It doesn't have to work super well now, but it tests on edge use cases as newer models and capabilities emerge. IMO, all of this is a good indicator that LLMs are nowhere close to AGI or whatever they call it these days. Every day that passes, I am more convinced that we need to quickly move beyond LLMs and into things like native multi-modal systems and world models.

3:12 PM · May 18, 2026 · 19.1K Views
3:15 PM · May 18, 2026 · 2.4K Views

@OfficialLoganK Outlayer detection could come in handy there. I know a guy.

3:17 PM · May 18, 2026 · 1K Views

@OfficialLoganK @JagersbergKnut Erroring on 99% of my requests would suck

3:14 PM · May 18, 2026 · 578 Views