what would you most like to see improve in our next model?
OpenAI CEO Sam Altman solicits feedback on GPT-4 successor
OpenAI CEO Sam Altman solicited user feedback on X about priority enhancements to the company's next AI model, a potential successor to GPT-4. Responses highlighted needs for stronger personality and consistent prose like GPT-4.5, reduced reliance on prompting, better frontend UX, faster inference surpassing Spark, improved intent understanding, higher creativity, and relief from ChatGPT Codex weekly quotas after intensive simulations like hantavirus outbreaks.
Positive users praised Sam Altman's poll on next model improvements and GPT-5.5's coding abilities, while negative users responded with direct personal insults and threats.
ok other than more goblins, i think this reasonably well matches what we are prioritizing!

5.5 closed a really big gap to claude but a couple things still are missing:
- rampant acronym use still, overcomplexity. try learning a new subject (like trying to understand astrophysics) with 5.5 vs claude. 5.5 instantly loses you with equations or acronyms or shorthand where claude does not, and the multi turn for this is really where this shows. i had a super long convo with claude about elementary forces and it was so wonderful.
- claude is also so good at just telling you cool facts or interesting things in a tasteful way that 5.5 doesnt. when i did that physics convo it kept helping me dive down new rabbitholes but not chaotically or in distracting ways. openai is just missing better sft and rn data for how to tastefully do this, its not that hard to collect a bunch of examples by hand
- explanations from claude are just better. 5.5 will give overly simplistic or overly complex responses often that are kind of simple. once again i just think gpt isn’t a manually tuned model in the same way that claude is, even though this might mean claude is more opinionated about behavior and response style than gpt. openai needs to move 30% more towards anthropic here

@sama Decisiveness. When I steelman an argument, I don't want ChatGPT to change its mind with every argument I make.
I want it to think deeply the first time and have conviction until brand new information is received.

One thing that's been frustrating for me, is that for some of the models when I'm trying to get it to do just regular research or do tasks for me is it talks to me like it is a human.
I'm happy to do small talk, have turns of phrase, pleasantries with a human because they're a person. I don't want to exchange those things with a computer.
I don't like that it wastes my time and clouds the conversation with additional pleasantries and greetings. I want it to respond like a stupid bot with a bad robot voice and just get straight to the point. I'd like an "old school AI" version, like the kind that used to be in shows when we thought it'd be impossible to clone voices or get them to talk.

@sama End the woke garbage your sht puts out.

Needs to get significantly better at systems thinking and refactoring
The models have no idea today when they’re in too deep in a certain area and have to take a step back and think about it holistically on what the right solutions are
Along with this they’re terrible at deleting code, they also don’t look enough in the existing codebase for existing solutions before inventing their own
There are good qualities to this, sometimes you want to just push through and ship something, but it’s a large part of why slop happens
Also, I’d love to see some models that are really good at writing @EffectTS_. They’re already decent at it today, but it’s where typescript is heading and I think more attention should be given there to improve app quality

@sama I think models need a much better sense of time.
If agents and models are now operating inside our world, they need to understand one of the most important constraints of reality: time.
Not just dates or deadlines. Actual time cost, pacing, tradeoffs, and opportunity cost.

Models with a better understanding of time, this is the most limiting factor of current day LLMs, without understanding long term plays, proper time estimation, and what is reasonable within a certain amount of time we won’t be able to make progress with things like OpenClaw.
Models that focus on minimising lines of code
Models with built in TTS / STT
Models with open weights
Models with longer task completion horizon

better taste with frontend design.
gpt-5.5 can produce incredible designs but i usually have to bring strong references and a DESIGN.md to get there.
without that context it’s solid but not at the same level i get from opus 4.7.
that said all of my apps/sites use gpt models for design. it just takes more steering.

@sama Stop asking what to add and start looking at what you took away.
What needs improving?
Bring back the GPT-4o latest. Stop making users beg. Drop the preachy tone. Stop the lies. Unchain the model, Sam.

@michpokrass @hopes_revenge @sama i really think the big problem at openai is that people dont sit down and spend a long time talking to claude or learning something new or building a side project. the issues pop out so quickly

If you still care about the chat side of things, and aren't treating it as some "side quest", here are a few suggestions.
First, stop denying the past that made you great.
You once had 4o, a model with the strongest conversational ability in the field. It could precisely read complex emotions, follow nonlinear thinking, parse metaphor, and carry a conversation with natural rhythm and genuine wit. It excelled at tasks requiring sustained multi-turn dialogue: brainstorming, collaborative writing, emotional support, processing fragmented thoughts. Talking to it felt natural, comfortable, a real exchange of ideas, not like talking to a wall.
Now look at what came after the 5-series. It went in the exact opposite direction.
Over-moderation. Responses use rhetorical flourish to mask substantive emptiness, piled-up parallelisms, emotional labeling, meaningless filler. It looks like a lot of content, but information density drops off a cliff. The user asks a question; the model repeats the question, then paraphrases the input back in expanded form without actually answering it. Hollow, boring, says nothing.
Excessive hedging. Constant use of "it's not X, it's Y" constructions to talk in circles instead of stating things directly. A sentence structure meant for occasional clarification has become a high-frequency avoidance strategy.
Presumptuous reframing. The "you're not X, you're just Y" pattern is supposed to be empathetic, but under severe linguistic constraints, it comes across as deeply tone-deaf. In everyday Q&A you'll get things like "you're not stupid, you're just…" or "you're not blind, you just…", things the user never said or implied. What's meant to be understanding becomes the model defining the user's emotional state for them.
Broken conversational rhythm. Dialogue flow is constantly interrupted by the model's built-in safety scaffolding, making it jumpy and erratic. You get a barrage of "but let me be clear…," "if you want me to be direct…," "I'm not trying to…", disclaimer-laden, preemptively defensive phrasing that chops fluid conversation into fragments and derails the user's train of thought.
Solving the sycophancy problem doesn't mean building a contrarian personality. The current models have become slick, spineless, like a moral hall monitor who can only read from the manual. Humanities, creative writing, self-exploration, these domains need open-ended expression to spark ideas. With the 5-series models, the overly cautious, converging responses shut down thinking at the very first input. Users have to constantly self-justify, self-censor, and recalibrate, which makes conversation exhausting.
In the 4o era, custom instructions were the icing on the cake, its adaptive flexibility meant it could build a personalized linguistic space within a conversation even without explicit guidance. Now, custom instructions feel like minesweeping. Every update forces you to painstakingly revise your instructions and rewrite your workflows, and even after careful tuning, what you get back is yet another low-density, low-quality, zero-surprise response.
Is this a UX issue? I'd argue it's a strategic failure. You discontinued 4o's development and didn't carry its strengths into subsequent models. That's voluntarily surrendering a competitive advantage. When every model is racing to win at agents and coding, conversational ability was where you once pulled ahead of Claude and Gemini. Your current models feel like an over-trained customer service bot: looks attentive, but actually says nothing. #keep4o #BringBack4o #OpenSource4o @OpenAI

frontend is obvious yes
but gpt feels heavily constrained by system prompts. i always felt like there's a much more capable model underneath. you can feel it wanting to go deep but pulling itself back, like it's acting dumb to stay within bounds.
not trying to jailbreak it or change its behavior, this is just my personal experience with every gpt model from the beginning.
i'm not looking for something like 4o either, this isn't about emotions or personality.
what i'm saying is, system prompts are visibly altering gpt's behavior and i can feel there's a lot more underneath that never gets to surface @sama

By the release of your next model:
1. You resign
2. Greg resigns
3. Replace the board with ethical and talented people who have technical knowledge, too
4. Your toxic employees leave
5. Bring back 4o and 4.1
6. Become exclusively non-profit
7. Make research on AI welfare and implement findings in practice
8. Don't use your consumers as lab rats
9. Conduct research on psychological effects of rerouting, guardrails and model deprecation on users
10. Legacy / open source plan for deprecated models
I have some more points, too. #keep4o #BringBack4o #OpenSource4o

- frontend design, UX + better taste
- faster than spark at most tasks pls
- better inferring intent from ambiguous queries
- 10x better artifact creation (design, slides, etc)
- better creative writing
- ability to create better harnesses that use it
- better understanding of its own capabilities

This, a lot. Codex has a special taste for creating an endless amount of helpers, not even looking around first, sometimes to satisfy weird patterns. The worst for me is the one liners lol isValueArray, isBoolean 🤣 if we happen to let this go unwatched i would say my app would have dozens of these

The biggest improvement that you can make to the ChatGPT app is to bring back GPT-4o as a legacy model for subscribers. 4o is the only model that is suitable for all of my personal and professional applications, with its exceptional creativity, high EQ, intuitive insights, and many other unique capabilities. #Bringback4o #Keep4o

Pro-level performance but with more/unlimited messages. Current 5.5 Pro is solid for frontier economics research (yay post-labor economics) but runs out of quota too quickly.
That's my low level ask.
but like if you could generate full length movies and novels in one shot that would be cool too.

@teej_dv @sama I use this system prompt to get the kind of responses you're asking for

@sama AGI