Kalomaze argues vision-based computer control is poised for a resurgence after Anthropic scaled back early attempts
Google DeepMind's Séb Krier prompted the debate on upcoming capabilities.
Optimistic replies expect AI models to improve soon at vision-based computer use and practical tasks such as filing taxes or operating spreadsheets, while critical replies complain about slow response times.
vision based computer use
What are non-coding/maths related capabilities that models fail at today that you think will be solved with the next major model release?
i think ant tried doing a lot of this relatively early on, realized it was too painful (at the time), scoped back down to the terminal. if there was ever a time to scope back up it'd probably be around now

@sebkrier I’d like to see more capability to process non verbal audio especially music. Listen to a music file and generate the music sheet or a tab. Assist composers not with generating full songs but in their creative process.

@sebkrier I expect mythos 2 to be ~competent* at computer use though still slow. Will reliably eg. fill taxes, operate gsheets *less mistakes than grandma, can generalize to many new environments/programs, but not as good at in-context-learning than smart humans on new software/interfaces.

@MarkBeall Yes!! Great take. I once tried to get models to decompose different elements of a track into midi and it wasn't particularly good.

@sebkrier This handwriting is particularly hard to read, it’s by the famous mathematician/Greek scholar Henry Savile. Earlier models completely failed, recent models do better than I expected.

@MatriceJacobine @sebkrier Yes. Partly because interfacing with API calls is a contributor to computer use, but mostly because the software world is designed to be siloed in many ways (eg. browserland) and UI are often their only access points. Businesses will continue using old software for a long time.

@ValsTutor @sebkrier Is computer use an important skill if we will soon have (if Mythos doesn't qualify already) automated coders that can reverse-engineer any program and directly interface with API calls?

@sebkrier speaker tracking / social participation in multiparty formats has historically varied a lot and hasn't really correlated with other capabilities, i have a sense next gpt will be even better at it per oai's trajectory, uncertain about mythos

@sebkrier Realistic chemical synthesis/ simulation predictions. Novel physics solutions to difficult problems in String theory, Glueball mass, Yang-Mills mass gap-type issues.
Likely, one would see superconductivity candidates narrowed considerably. Same for enzymes and catalysts.

@sebkrier "graph / plot-reading" ability especially in the sciences.

@sebkrier Not making me wait for 20 minutes for every turn...

@sethlazar > hello > *thinks for 193 seconds*

@UltraRareAF ehhhh

@ValsTutor @sebkrier "Businesses will continue using"
relevant https://www.oneusefulthing.org/p/the-bitter-lesson-versus-the-garbage

@sebkrier consciousness

@sebkrier Ability to stay focused on the task rather than following interesting chains of thought.
(Wish I had that.)

@sebkrier i think we are very close

@sebkrier fair and i wouldn't mind going to sleep and waking up in five years tbh

@FeinsteinKen Do you have an example of a text existing systems would struggle with?