gpt 5.6 at 750 tok/s doing computer use is going to be a little scary
Engineer Murat argues uploading screenshots to remote servers will bottleneck high-speed computer-use AI models more than inference
OpenClaw steward Peter Steinberger amplified the agent infrastructure warning
Many users are excited about the productivity gains from GPT-5.6's 750 tokens per second enabling fast computer use, while others worry about high costs and prefer alternatives.

@mayfer it’s going to be SEXY AF

@mayfer ikr that's what I'm saying

@mayfer sonnet-5 with a cli > any model with computer use

@mayfer finally someone gets it

@mayfer A Bugatti can reach 489 km/h; that doesn't mean it always runs at that speed 😀

@mayfer seems it would be limited by the application it's driving in that case. I've been rate limited by computer use at current speeds just browsing websites

@mayfer at that speed you will be spending $25 every 20 minutes
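To put the $25 figure in context, here is a rough back-of-the-envelope check. It assumes the model generates continuously at the full 750 tok/s for the whole 20 minutes and that only output tokens are billed; the per-million price it derives is implied by the commenter's numbers, not a published rate.

```python
# Figures taken from the thread: 750 tok/s, $25 per 20 minutes.
TOKENS_PER_SECOND = 750
MINUTES = 20
CLAIMED_COST_USD = 25.0

# Assumption: nonstop generation at full speed, output tokens only.
tokens = TOKENS_PER_SECOND * MINUTES * 60

# Price per million tokens that would make the claim hold.
implied_price_per_million = CLAIMED_COST_USD / (tokens / 1_000_000)

print(f"{tokens:,} tokens in {MINUTES} min")
print(f"implied price: ${implied_price_per_million:.2f} per 1M tokens")
```

In practice an agent spends part of each step waiting on tools and screenshots, so sustained output would likely be lower than this ceiling.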

@mayfer May have much lower context window and no vision if it's like other models on Cerebras

@mayfer Yeah but they have to figure out a way to make the tool calls rapid

@mayfer It’ll be crazy but also crazy expensive

@mayfer it depends on the latency between requests
In @framer for example, when we trialed Gemini 3.5 Flash at launch, we recorded 350 tps.
However the response latency was very high. Since agents do lots of small round trips, on balance Gemini felt marginally faster.
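A minimal sketch of why raw tokens per second can matter less than per-request latency for agents. All numbers below are hypothetical and chosen only for illustration (only the 350 tps figure comes from the comment above); the model simply assumes sequential round trips, each paying a fixed latency plus decode time.

```python
def task_seconds(round_trips, latency_s, tokens_per_trip, tokens_per_s):
    """Wall-clock time for an agent making sequential round trips.

    Each trip pays a fixed response latency plus the time to decode its output.
    """
    return round_trips * (latency_s + tokens_per_trip / tokens_per_s)

# Hypothetical baseline: slower decoding, low response latency.
baseline = task_seconds(50, latency_s=0.5, tokens_per_trip=100, tokens_per_s=100)

# Hypothetical fast decoder: 3.5x the throughput, but higher response latency.
fast = task_seconds(50, latency_s=1.1, tokens_per_trip=100, tokens_per_s=350)

print(f"baseline: {baseline:.1f}s, fast decoder: {fast:.1f}s")
```

With short outputs per trip, the fixed latency term dominates, so a 3.5x throughput gain shrinks to a marginal end-to-end improvement.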

@mayfer scary for my credit card

@mayfer now imagine 10k tok/s using taalas

@mayannnkkkkkk @mayfer The difference is you can control a Bugatti, but you can't control GPT's speed

@mayfer yeah, it may finally solve computer use, but it may also be too expensive to use. Currently a single click action costs 6-10s with GPT-5.5, which is too slow. 10x faster would make it 1-2s, and that will change things

@mayfer What's interesting is that these computers are able to keep up! The human click rate is very limited, and most scripts are linear, whereas most agents are multi-level parallel functions (with sub-agents), and damn, the OS still works. We'll see how it goes...

@mayfer Actually, you will still have awaits in the Codex harness; webfetch/grep/file search/run shell still waste time

funnily enough uploading the screenshot to openai's servers will be the latency bottleneck
gpt 5.6 at 750 tok/s doing computer use is going to be a little scary
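A rough sketch of the screenshot-upload point. The screenshot size, uplink speed, and tokens per action are all assumed values for illustration; only the 750 tok/s figure comes from the thread. It compares the time to send one screenshot against the time to decode one short action.

```python
def upload_seconds(size_bytes, uplink_mbps):
    """Time to send a payload over an uplink of the given speed (megabits/s)."""
    return size_bytes * 8 / (uplink_mbps * 1_000_000)

def decode_seconds(tokens, tokens_per_s):
    """Time to generate a given number of output tokens."""
    return tokens / tokens_per_s

# Assumed: a 1 MB screenshot on a 20 Mbps uplink.
upload = upload_seconds(1_000_000, uplink_mbps=20)

# Assumed: a 60-token action (e.g. one click) at the thread's 750 tok/s.
decode = decode_seconds(60, tokens_per_s=750)

print(f"upload: {upload:.2f}s, decode: {decode:.2f}s")
```

Under these assumptions the upload takes several times longer than the generation, before counting network round-trip time or server-side image processing.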

@mayfer Would Cerebras really be better at short prefill?

@hudson_gri @mayfer ew