Systems engineer Yacine predicts that GPT-5.5 Pro-class models will run locally on consumer hardware, pointing out that GPT-4-level models already do.
Engineer Murat expressed skepticism, citing consumer hardware limitations.
Many users are excited at the prospect: newer, more efficient open models combined with better local GPUs would enable powerful on-device AI and reduce reliance on cloud subscriptions.
These levels of capability are still relatively small, they're still relatively weak. The gravity well of the singularity does not care about your human speed of updating. The thing about compute multipliers, unlike compute, is that they are cheap to disseminate and exponentiate
Around the time gpt4 came out, I said that gpt4 level models would run on consumer hardware. And they do, now. In fact, better. And now I will also say: mythos / gpt 5.5 pro models will run on consumer hardware. Prepare accordingly

@yacineMTB qwen 3.6 27b is opus 4.4 on a 3090 at 4x the speed
@yacineMTB while true, prefill performance is still a huge roadblock
50k tokens of input still unusable on local, that was not the case on gpt4 cloud api
even if output tok/s matched, the UX lag tax is kind of a big deterrent for the time being
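
A rough back-of-envelope for the prefill point above: time to first token is approximately the prompt length divided by prefill throughput. This is only a sketch; the function names and the two throughput figures are hypothetical placeholders chosen for illustration, not measurements of any particular GPU or model.

```python
def time_to_first_token(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Seconds spent processing the prompt before the first output token appears."""
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill_tok_per_s must be positive")
    return prompt_tokens / prefill_tok_per_s


# Hypothetical prefill speeds in tokens/second (assumptions, not benchmarks).
ASSUMED_PREFILL_SPEEDS = {
    "slower local setup": 250.0,
    "faster local setup": 2500.0,
}


def ttft_table(prompt_tokens: int = 50_000) -> dict[str, float]:
    """Map each assumed setup to its estimated wait for a prompt of the given size."""
    return {
        name: time_to_first_token(prompt_tokens, speed)
        for name, speed in ASSUMED_PREFILL_SPEEDS.items()
    }


if __name__ == "__main__":
    for name, seconds in ttft_table().items():
        print(f"{name}: about {seconds:.0f} s before the first token")
```

Under these assumed numbers, a 50k-token prompt waits anywhere from tens of seconds to several minutes before any output, which is the "lag tax" the reply describes, independent of how fast tokens stream afterwards.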

@yacineMTB densing law of LLMs. anthropic/openai don’t want the markets to know about this

@yacineMTB it’s hard to buy hardware when it gets 10x better every year
i’m looking at the dgx spark but almost want to wait until we get that class of local LLMs
only problem will be shortages and low availability once local ai reaches those capabilities

@yacineMTB Who’s the consumer here? People with 32GB VRAM? 🥀

@yacineMTB Which hardware and how many though

@PorgimusPrime I don't think that this is true. It's a pain in the ass to manage an AI server. Just like how people don't host their own media servers and would prefer to just pay for a service, people will happily pay for someone to hold that burden of complexity

@yacineMTB There's a double-edged thing going on too: requirements to run models like Gemma or Qwen are dropping (fewer parameters, better architecture, MoE). Newer stuff runs great in my homelab, with more capability and actually faster than last gen models.

@yacineMTB I’ll double down on this: Telecoms will deploy local compute nodes in cities. Users won't need to hit massive centralized data centers for everyday queries. Just like ISPs cache Netflix shows locally, we'll have cheap, fast edge models for the masses who just need quick results.

@usr_bin_roygbiv @yacineMTB The spike in pricing indicates that you may need to throw more hw at it

@yacineMTB It’s become VERY clear that local is just the only direction consumers should be considering. Subsidies will decrease for these subscriptions and eventually we’ll all be paying thousands for these models; it’s just a matter of time. Go local 100%

@yacineMTB By when?

@AStratelates @yacineMTB You can run GPT4 on consumer hardware?

@CharlesBanks99 @yacineMTB 5090 with nvfp4 is more than enough
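
The claim above can be sanity-checked with simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below assumes a 27B-parameter model (the size mentioned elsewhere in the thread), roughly 4.5 effective bits per weight for a 4-bit block-scaled format, and 32 GB of VRAM; all three are assumptions for illustration, and the estimate ignores KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def weights_fit(num_params: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the weights alone fit in VRAM (no KV cache or activations counted)."""
    return weight_memory_gb(num_params, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    params = 27e9          # assumed model size
    vram_gb = 32.0         # assumed consumer GPU memory
    for label, bits in [("16-bit", 16.0), ("~4-bit block-scaled (assumed 4.5)", 4.5)]:
        gb = weight_memory_gb(params, bits)
        print(f"{label}: {gb:.1f} GB weights, fits in {vram_gb:.0f} GB: "
              f"{weights_fit(params, bits, vram_gb)}")
```

Whatever headroom remains after the weights is what is left for context, so long prompts can still be the binding constraint even when the weights fit.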

first of all: i think they're built to last that long. second, we should consider something like shared GPU pools (like petal/horde). why is everyone just accepting anthropic/openai bs? are you really ready to accept one more area where you don't have any control and can be disconnected, just like with fable? :) idk, it's very obvious that if the whole community won't focus on the troubles of open-weights models, we're gonna be in the worst position ever. if people keep talking nonsense like that instead of really thinking about how we can improve it, we're gonna fail as humanity again and give a bunch of dickheads control again. we did it with money, are you really sure you wanna do this with ai?

@yacineMTB honestly we can just distill a few core capabilities and be really fine with smaller models for specific tasks (like coding). i see a specialization of small models in the near future.

@yacineMTB hopefully next gen hardware fixes it but idk how doable

@usr_bin_roygbiv @yacineMTB Sorry I meant the price for Mythos, it wasn't released in the same tier as other models.
The parallel with local AI is beyond Qwen 27B requirements in the standard tier

@yacineMTB Idk if I need more hardware or just wait for efficiency gains and mog people who spent too much