Sholto Douglas solicits detailed feedback on Claude limitations
Sholto Douglas of Anthropic posted an open request for detailed feedback from developers on scenarios where they prefer alternative models to Claude. The AI researcher is asking for specific examples and transcripts that highlight limitations, to inform upcoming model refinements. In parallel, Jason posted a similar request for granular input focused on Codex, aiming to identify user frustrations and cases where other options are chosen.
@_sholtodouglas I've stopped using Opus for brainstorming/strategizing, because it keeps wanting to jump to a conclusion at the end of every response. It's too confident it knows the answer every time. It makes it hard to have a back-and-forth.
Also, it's too expensive vs Codex 5.5 sub.
When do you reach for other models instead of Claude? What can we do better? Hit me with all of your frustrations. dms open. If you can give me detail (e.g. specifics/transcripts) - it'll help a lot in finding out exactly what we need to do to improve the next model
@_sholtodouglas Claude code on mobile. Standalone claude code app with the same aesthetics
Please stop flushing the KV cache in Claude Code every x hrs of being idle. When I wake up and go back to a session that was running through the night, but stalled for whatever reason, Claude is noticeably far worse than when I resume before the cache has been flushed.
Also, I hate hearing that I’m absolutely right when I’m not. :) It has significantly reduced my trust in the model.
Also, when an experiment is not working out (the kind that I know beyond a reasonable doubt should work), Claude jumps to a hypothesis about why the whole thing is broken and why we should just abandon it. So frustrating :) These are experiments where the resolution of whatever we stumble upon is to just change a few hyperparams and retry.
I found 4.6 to have way more agency on these types of problems than 4.7, and more willingness to pursue a longer-horizon attempt.
Voice to text is still far, far behind ChatGPT. I always go back to ChatGPT any time I want to transcribe. For some reason Claude has a hard time with my Greek accent. It also does not work when switching languages mid-speech.
And on voice, Claude’s accent when it attempts to speak Greek is terrible.
@jxnlco Claude’s integration into Word in particular is superb. I always reach for it when editing a document.
When do you reach for other models instead of Codex? What can we do better? Hit me with all of your frustrations. dms open. If you can give me detail (e.g. specifics/transcripts) - it'll help a lot in finding out exactly what we need to do to improve the next model
I want a LaTeX editor, and for Claude to be able to read docs at a coarse-grained level.
It's good at editing segments, but terrible at reading a whole long document and achieving global coherence / flow.
Maybe hierarchical doc chunking/compression for better writing would be good
@jxnlco lmao
@trq212 ahahahah
@trq212 I need to sholto maxi
@_sholtodouglas either interiority or attribution to the interiority of others pls
@_sholtodouglas - Can it stop saying it will take 2-3 weeks to do something it does in 10 minutes
- better test coverage
- better at writing comments (doesn’t need life story)