Tech · 43d ago

Logan Kilpatrick questions why large language models fail to notify users when inputs fall outside their training distribution

The AI Judge changed the title after evaluation. Original title: "Logan Kilpatrick asks why large language models do not indicate to users when queries fall outside their training distribution and produce unflagged responses"

Replies noted detection risks and coding agent breakdowns on novel tasks.

Original post

Logan Kilpatrick @OfficialLoganK #100inTech

Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?

8:01 AM · May 18, 2026 · 203.7K Views

Sentiment

Positive commenters praise the technologist's question about why LLMs fail to flag out-of-distribution inputs and say built-in uncertainty flags would be highly useful; negative commenters call the models hallucinatory or programmed to lie.

Positive: 52.9%
Negative: 47.1%

17 comments with sentiment.

Digg Deeper

Kevin Rose asked:

What are out of distribution inputs?

Out-of-distribution (OOD) inputs are user queries or tasks that fall outside the distribution of an LLM's training data. Current models have no built-in signal for flagging them and often produce unreliable or hallucinated responses instead. Outlier detection could help, but it risks wrongly rejecting most everyday requests.
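
One way to make the outlier-detection idea concrete is a nearest-neighbor distance score. The sketch below is illustrative only: it assumes (hypothetically) that every prompt has already been turned into a fixed-length embedding vector, and the function names, the choice of k, and the toy data are assumptions rather than any lab's actual method. It also shows where the everyday-request problem comes from: the threshold is calibrated on held-out in-distribution examples, and `false_alarm_rate` sets roughly what share of normal requests would still be flagged.

```python
import math
from typing import Sequence

# Hypothetical setup: each prompt is assumed to be already embedded as a
# fixed-length vector (plain tuples stand in for real embeddings).
Vector = Sequence[float]


def knn_score(x: Vector, reference: Sequence[Vector], k: int = 3) -> float:
    """Mean Euclidean distance from x to its k nearest reference embeddings.

    Larger scores mean x is farther from anything seen before.
    """
    dists = sorted(math.dist(x, r) for r in reference)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def calibrate_threshold(reference: Sequence[Vector], held_out: Sequence[Vector],
                        k: int = 3, false_alarm_rate: float = 0.05) -> float:
    """Pick a threshold that at most roughly `false_alarm_rate` of the
    in-distribution held-out examples exceed, i.e. the expected share of
    normal requests that would still be flagged."""
    scores = sorted(knn_score(h, reference, k) for h in held_out)
    idx = math.ceil((1.0 - false_alarm_rate) * len(scores)) - 1
    idx = max(0, min(idx, len(scores) - 1))
    return scores[idx]


def is_out_of_distribution(x: Vector, reference: Sequence[Vector],
                           threshold: float, k: int = 3) -> bool:
    """Binary flag derived from the scalar distance score."""
    return knn_score(x, reference, k) > threshold


if __name__ == "__main__":
    reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    held_out = [(0.2, 0.1), (0.8, 0.9), (0.4, 0.6), (0.9, 0.2)]
    threshold = calibrate_threshold(reference, held_out, k=2, false_alarm_rate=0.25)
    print(is_out_of_distribution((0.1, 0.1), reference, threshold, k=2))
    print(is_out_of_distribution((10.0, 10.0), reference, threshold, k=2))
```

Lowering `false_alarm_rate` raises the threshold, so fewer ordinary requests are flagged but more genuinely novel ones slip through; choosing that trade-off, not computing the score, is the hard part.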

Posts from X

elvis @omarsar0

@OfficialLoganK Was just thinking something similar as I was writing this https://x.com/omarsar0/status/2056392467604205852?s=20. "It doesn't know when it doesn't know" is a classical weakness for which no good solutions exist. Autoregressive nature of it, I guess (lazy answer).

elvis @omarsar0

Every time I ask my 10-year-old to use coding agents, he gets extremely disappointed.

It turns out that all he wants is to build his own rocket simulator.

No amount of context engineering helps. No model works. All coding agents fail.

That's just one example. He has many use cases where the coding agents really suck. Learning apps and other types of science-centered simulators.

It's not like he is trying to be adversarial or break the system. I use the coding agents so extensively in my codebases that I just assumed that he would get similar results. It's not the case. And I think this is happening across all kinds of domains.

I know he is not the target user. I get all that. But if all these claims about superintelligent AI on the horizon (12-18 months) are right, then coding agents shouldn't struggle so much to build any of the things he wants.

The reality is that coding agents can help maintain and build complex things that aim to extend what exists in abundance in the training data. No surprises there. There is plenty of AI research to explain the OOD issues with LLMs.

I think there is a massive opportunity here. Potentially a more generalized harness (something I have been working on). It doesn't have to work super well now, but it tests on edge use cases as newer models and capabilities emerge.

IMO, all of this is a good indicator that LLMs are nowhere close to AGI or whatever they call it these days. Every day that passes, I am more convinced that we need to quickly move beyond LLMs and into things like native multi-modal systems and world models.

Nick Dobos @NickADobos

@OfficialLoganK @JagersbergKnut Erroring on 99% of my requests would suck

Jake @JakeKAllDay

@OfficialLoganK Because that would usually require consciousness and you’re just interacting with a token prediction machine that occasionally mimes those patterns

Dan Roy @roydanroy

@OfficialLoganK "There is no distribution." - @roydanroy

Bojan Tunguz @tunguz

@OfficialLoganK Outlier detection could come in handy there. I know a guy.

Peyton @PeytonAGI

@OfficialLoganK i built this specifically!! https://github.com/peytontolbert/ConfidenceTransformer

Ashutosh Tiwari @ashutosh_270497

@OfficialLoganK Confidence calibration is the missing bridge between LLM outputs and actual reliability.
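
Calibration has a standard check: group answers by stated confidence and compare each group's average confidence with its actual accuracy. The sketch below computes expected calibration error (ECE) with equal-width bins; the bin count and toy inputs are assumptions, and ECE is only one of several calibration metrics.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: sum over bins of (bin share) * |avg confidence - accuracy|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so a confidence of exactly 1.0 lands in the top bin.
        idx = max(0, min(int(conf * n_bins), n_bins - 1))
        bins[idx].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / n) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Toy data: a model that says 90% but is right only half the time.
    print(expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5))
```

A well-calibrated model scores near 0; a model that is always 90% sure but right half the time scores about 0.4.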

Vaibhav (VB) Srivastav @reach_vb

@OfficialLoganK with the right tools and environment - the LLM itself is the distribution! ^^

Kekko D’Amato @kekkodamato_

@OfficialLoganK Because models are trained to be helpful, and 'I'm not sure this is in my wheelhouse' reads as unhelpful. But you're right — epistemic honesty about distribution boundaries would make outputs far more trustworthy. It's a training objective tradeoff.
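
The training-objective tradeoff can be made concrete with a toy scoring rule. Assume (hypothetically) a reward of +1 for a correct answer, `-penalty` for a wrong one, and 0 for abstaining with "I'm not sure". If p is the chance the answer is right, answering beats abstaining only when p - (1 - p) * penalty > 0, i.e. when p > penalty / (1 + penalty). This illustrates the incentive only; it is not a description of how any particular model is trained.

```python
def expected_score(p_correct: float, penalty: float) -> float:
    """Expected reward for answering under a hypothetical rule:
    +1 if correct, -penalty if wrong."""
    return p_correct - (1.0 - p_correct) * penalty


def should_answer(p_correct: float, penalty: float) -> bool:
    """Abstaining ("I'm not sure") scores 0, so answer only if answering beats it."""
    return expected_score(p_correct, penalty) > 0.0


def break_even_confidence(penalty: float) -> float:
    """Confidence above which answering pays off: penalty / (1 + penalty)."""
    return penalty / (1.0 + penalty)


if __name__ == "__main__":
    for penalty in (0.0, 1.0, 3.0):
        # With penalty 0 the break-even point is 0: guessing never loses.
        print(penalty, break_even_confidence(penalty))
```

When wrong answers cost nothing, any nonzero chance of being right makes answering the better move, so a reward like that never favors "I'm not sure".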

LeetLLM.com @leetllm

@OfficialLoganK it's just the bitter lesson in action. we train them with pure compute for next-token prediction rather than hardcoding confidence rules. the model literally doesn't know what it doesn't know.

Alan 🇦🇺 @alanhoward

@OfficialLoganK Because they don't know they're out of distribution. The same process that generates a confident wrong answer generates a confident right one. There's no separate uncertainty layer, just next-token prediction all the way down.
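
Next-token prediction does expose one uncertainty-like quantity as a by-product: the probability distribution over the next token. The sketch below turns raw per-step logits (a hypothetical input here) into an average entropy in bits. Consistent with the point above, low entropy is not the same as being in distribution, since a model can be confidently wrong; the result is also a scalar rather than a yes/no flag.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; 0 means all mass on a single token."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_token_entropy(step_logits: Sequence[Sequence[float]]) -> float:
    """Average next-token entropy over generation steps (a scalar score)."""
    if not step_logits:
        raise ValueError("need at least one step")
    return sum(entropy_bits(softmax(logits)) for logits in step_logits) / len(step_logits)


if __name__ == "__main__":
    # Hypothetical logits for two steps over a 4-token vocabulary:
    # one maximally uncertain step, one sharply peaked step.
    steps = [[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]
    print(mean_token_entropy(steps))
```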

Vansh D man @nobelxenon

@OfficialLoganK LLMs are trained to be helpful, not fact-checkers—so they’ll often roll with ambiguous prompts to avoid killing the vibe. Also, defining ‘out of distribution’ for something as fluid as human curiosity is like herding cats wearing roller skates. 😅

tim gregg @timgreggai

@OfficialLoganK @grok how would they know whether it’s out of distribution or not? also, wouldn’t it be scalar not binary?

Dan McAteer @daniel_mac8

@OfficialLoganK that would be a nice capability Logan

a leading AI lab should do that

BeyondBacktesting @BBacktesting

They actually can to a degree. You just have to ask it for its self-confidence or uncertainty. In fact, all forms of LLM tools such as search, calculation, generation/thinking, etc. can be viewed as uncertainty resolvers.

As shared, they can to a degree only but not more because they have no experience. They can only infer via uncertainty of concepts but not through actual experience.

That's the basis of my custom prompt below:

https://gist.github.com/CurtisAccelerate/e158922548b1cfe594fe2a8eecf941ac
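
Asking the model for its own confidence can be wired into a simple loop. The sketch below is hypothetical: the instruction wording, the `Confidence: X` line format, the 0.7 threshold, and the routing labels are all assumptions and are not taken from the linked gist. A low or missing score routes the request to tools such as search or calculation, treating them as uncertainty resolvers.

```python
import re
from typing import Optional

# Hypothetical instruction appended to a prompt; this wording is an
# assumption, not the contents of the linked gist.
CONFIDENCE_INSTRUCTION = (
    "After your answer, add a final line 'Confidence: X', where X is a number "
    "between 0 and 1 estimating how likely your answer is to be correct."
)

# Matches whole lines such as "Confidence: 0.85" (case-insensitive).
_CONF_RE = re.compile(r"^confidence:\s*(\d+(?:\.\d+)?)\s*$",
                      re.IGNORECASE | re.MULTILINE)


def parse_confidence(reply: str) -> Optional[float]:
    """Return the last self-reported confidence if it lies in [0, 1], else None."""
    matches = _CONF_RE.findall(reply)
    if not matches:
        return None
    value = float(matches[-1])
    return value if 0.0 <= value <= 1.0 else None


def next_step(reply: str, threshold: float = 0.7) -> str:
    """Hypothetical routing: low or missing confidence sends the query to tools."""
    conf = parse_confidence(reply)
    if conf is None or conf < threshold:
        return "resolve_with_tools"  # e.g. search or a calculator
    return "return_answer"


if __name__ == "__main__":
    print(next_step("The capital of Australia is Canberra.\nConfidence: 0.95"))
```

Self-reported numbers are themselves generated text, so they are worth checking against observed accuracy before the threshold is trusted.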

oso @osoleve

@OfficialLoganK Wouldn't the ability to do so mean the response is in distribution, and so to be reached from anywhere out of distribution requires strong attractors? Unless you mean a way extrinsic to the model.
