OpenAI's Yo Shavit says AI struggles with ambiguous tasks, while researcher Herbie Bradley blames RLVR limitations
Bradley says AI only helps flesh out pre-existing concepts.
I have similar issues. I think it's just downstream of wisdom (or judgement) being very difficult to train for with RLVR on hard-to-verify tasks like thinking through ambiguous strategy, brainstorming, or research questions with no clear answer.
A frustrating sub-component is that in any qualitative task involving the generation of ideas, the ideas are basically "mode collapsed" and not diverse at all, so it doesn't save me ~any mental load in thinking of ideas, only in fleshing them out. If you try to force it to go more out-of-distribution (OOD) via prompting, the ideas become more slop-like. Starting with some idea-dense bullet points and doing a debate between 5.5 Pro and Opus helps elicit a little more wisdom due to the difference in training distributions.
I feel almost entirely bottlenecked on wisdom amid ambiguity, and AI systems are rarely wise. Is my task distribution just different from others’, or do you think I’m probably using them wrong?

