AI · 13h ago

Kalomaze argues vision-based computer control is poised for a resurgence after Anthropic scaled back early attempts

Google DeepMind's Séb Krier prompted the debate on upcoming capabilities.

Original post
Séb Krier@sebkrier · #505 in AI

What are non-coding/maths related capabilities that models fail at today that you think will be solved with the next major model release?

7:29 AM · Jun 6, 2026 · 7.1K Views
Sentiment

Positive users are optimistic that AI models will soon improve at vision-based computer use and environmental tasks like filling taxes or operating spreadsheets, while negative users complain about slow response times.

Pos 83.3% · Neg 16.7%
11 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 3.3K · Bookmarks 10 · Likes 65 · Retweets 2 · Replies 5
kalomaze@kalomaze

vision based computer use

Séb Krier@sebkrier

What are non-coding/maths related capabilities that models fail at today that you think will be solved with the next major model release?

3h · Views 3.3K · Likes 65 · Bookmarks 10
kalomaze@kalomaze

i think ant tried doing a lot of this relatively early on, realized it was too painful (at the time), and scoped back down to the terminal. if there was ever a time to scope back up, it'd probably be around now

kalomaze@kalomaze

vision based computer use

3h · Views 425 · Likes 16 · Bookmarks 1
Mark Beall@MarkBeall

@sebkrier I’d like to see more capability to process non-verbal audio, especially music. Listen to a music file and generate the sheet music or a tab. Assist composers not with generating full songs but in their creative process.

13h · Views 195 · Likes 9 · Bookmarks 2
vals🔸@ValsTutor

@sebkrier I expect mythos 2 to be ~competent* at computer use though still slow. Will reliably e.g. fill taxes, operate gsheets. *fewer mistakes than grandma, can generalize to many new environments/programs, but not as good at in-context learning as smart humans on new software/interfaces.

13h · Views 291 · Likes 8 · Bookmarks 1
Séb Krier@sebkrier

@MarkBeall Yes!! Great take. I once tried to get models to decompose different elements of a track into midi and it wasn't particularly good.

13h · Views 182 · Likes 5 · Bookmarks 1
Ken Feinstein@FeinsteinKen

@sebkrier This handwriting is particularly hard to read, it’s by the famous mathematician/Greek scholar Henry Savile. Earlier models completely failed, recent models do better than I expected.

9h · Views 39 · Likes 1 · Bookmarks 1
vals🔸@ValsTutor

@MatriceJacobine @sebkrier Yes. Partly because interfacing with API calls is a contributor to computer use, but mostly because the software world is designed to be siloed in many ways (e.g. browserland) and UIs are often the only access points. Businesses will continue using old software for a long time.

9h · Views 21 · Likes 1 · Bookmarks 1

@ValsTutor @sebkrier Is computer use an important skill if we will soon have (if Mythos doesn't qualify already) automated coders that can reverse-engineer any program and directly interface with API calls?

12h · Views 21 · Likes 1 · Bookmarks 1
snav@qorprate

@sebkrier speaker tracking / social participation in multiparty formats has historically varied a lot and hasn't really correlated with other capabilities, i have a sense next gpt will be even better at it per oai's trajectory, uncertain about mythos

12h · Views 201 · Likes 7
Ben Schulz@schulzb589

@sebkrier Realistic chemical synthesis/simulation predictions. Novel physics solutions to difficult problems in String theory, Glueball mass, Yang-Mills mass gap-type issues.

Likely, one would see superconductivity candidates narrowed considerably. Same for enzymes and catalysts.

12h · Views 197 · Likes 3
Seth Lazar@sethlazar

@sebkrier Not making me wait for 20 minutes for every turn...

5h · Views 165 · Likes 5
Séb Krier@sebkrier

@sethlazar > hello > *thinks for 193 seconds*

5h · Views 170 · Likes 5
Séb Krier@sebkrier

@UltraRareAF ehhhh

13h · Views 132 · Likes 1

@ValsTutor @sebkrier "Businesses will continue using"

relevant https://www.oneusefulthing.org/p/the-bitter-lesson-versus-the-garbage

9h · Views 7 · Likes 3
Zoe@UltraRareAF

@sebkrier consciousness

13h · Views 162
gvp@gvp324377

@sebkrier Ability to stay focused on the task rather than following interesting chains of thought.

(Wish I had that.)

5h · Views 14 · Likes 2
Zoe@UltraRareAF

@sebkrier i think we are very close

13h · Views 39 · Likes 1
Zoe@UltraRareAF

@sebkrier fair and i wouldn't mind going to sleep and waking up in five years tbh

13h · Views 27 · Likes 1
Séb Krier@sebkrier

@FeinsteinKen Do you have an example of a text existing systems would struggle with?

9h · Views 38