Logan Kilpatrick, who works on Google AI Studio and Gemini API, questions why large language models do not notify users when queries fall outside the training distribution, risking unreliable outputs
Replies proposed outlier detection yet flagged severe usability tradeoffs.
@OfficialLoganK "There is no distribution." - @roydanroy
Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?
Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?
@OfficialLoganK Was just thinking something similar as I was writing this https://x.com/omarsar0/status/2056392467604205852?s=20. "It doesn't know when it doesn't know" is a classical weakness for which no good solutions exist. Autoregressive nature of it, I guess (lazy answer).
Every time I ask my 10-year-old to use coding agents, he gets extremely disappointed. It turns out that all he wants is to build his own rocket simulator. No amount of context engineering helps. No model works. All coding agents fail. That's just one example. He has many use cases where the coding agent really suck. Learning apps and other types of science-centered simulators. It's not like he is trying to be adversarial or break the system. I use the coding agents so extensively in my codebases that I just assumed that he would get similar results. It's not the case. And I think this is happening across all kinds of domains. I know he is not the target user. I get all that. But if all these claims about superintelligent AI on the horizon (12-18 months) are right, then coding agents shouldn't struggle so much to build any of the things he wants. The reality is that coding agents can help maintain and build complex things that aim to extend what exists in abundance in the training data. No surprises there. There is plenty of AI research to explain the OOD issues with LLMs. I think there is a massive opportunity here. Potentially a more generalized harness (something I have been working on). It doesn't have to work super well now, but it tests on edge use cases as newer models and capabilities emerge. IMO, all of this is a good indicator that LLMs are nowhere close to AGI or whatever they call it these days. Every day that passes, I am more convinced that we need to quickly move beyond LLMs and into things like native multi-modal systems and world models.
@OfficialLoganK Outlayer detection could come in handy there. I know a guy.
Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?
@OfficialLoganK @JagersbergKnut Erroring on 99% of my requests would suck
Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?
