Systems engineer Yacine predicts that GPT-5.5 Pro-class models will run locally on consumer hardware, pointing out that GPT-4-level models already do.
Engineer Murat expresses skepticism, citing consumer hardware limitations.
Many users are enthusiastic about GPT-5.5 Pro models running locally on consumer hardware because of falling costs, efficiency gains, and less reliance on cloud servers, while some dismiss it as too much hassle compared to paid services.
These levels of capability are still relatively small; they're still relatively weak. The gravity well of the singularity does not care about your human speed of updating. The thing about compute multipliers, unlike compute, is that they are cheap to disseminate and exponentiate.
Around the time gpt4 came out, I said that gpt4 level models would run on consumer hardware. And they do, now. In fact, better. And now I will also say: mythos / gpt 5.5 pro models will run on consumer hardware. Prepare accordingly

@yacineMTB qwen 3.6 27b is opus 4.4 on a 3090 at 4x the speed
@yacineMTB while true, prefill performance is still a huge roadblock
50k tokens of input still unusable on local, that was not the case on gpt4 cloud api
even if output tok/s matched the UX lag tax is kind of a big deterrent for the time being
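
The prefill complaint above comes down to simple arithmetic: the wait before the first output token is roughly the prompt length divided by prefill throughput. A minimal sketch follows. The function name and the throughput figures are hypothetical placeholders, not measurements of any real GPU or model.

```python
def time_to_first_token_s(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Approximate seconds spent on prefill before the first output token.

    Deliberately simplified: ignores model load time, prompt caching,
    and any overlap between prefill and decode.
    """
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill throughput must be positive")
    return prompt_tokens / prefill_tok_per_s


if __name__ == "__main__":
    prompt_tokens = 50_000  # the input size mentioned in the reply above
    # Hypothetical prefill speeds in tokens/second, chosen only for illustration.
    for speed in (250.0, 500.0, 5_000.0):
        wait = time_to_first_token_s(prompt_tokens, speed)
        print(f"{speed:>7.0f} tok/s prefill -> {wait:6.1f} s before first token")
```

Plugging in a measured prefill speed for a specific setup shows whether a 50k-token prompt is a short pause or a long wait, regardless of how fast output tokens arrive afterwards.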
@yacineMTB it’s hard to buy hardware when it gets 10x better every year
i’m looking at the dgx spark but almost want to wait until we get that class of local LLMs
only problem will be shortages and low availability once local ai reaches those capabilities

@yacineMTB Who’s the consumer here? People with 32GB VRAM? 🥀

@yacineMTB Which hardware and how many though

@PorgimusPrime I don't think that this is true. It's a pain in the ass to manage an AI server. Just like how people don't host their own media servers and would prefer to just pay for a service, people will happily pay for someone to hold that burden of complexity

@yacineMTB There's like a double edge thing going on too - Like requirements to run models - Gemma or Qwen - they're dropping - fewer parameters, better architecture - MoE. Newer stuff runs great in my homelab, more capability and actually faster than last gen models.

@yacineMTB I’ll double down on this: Telecoms will deploy local compute nodes in cities. Users won't need to hit massive centralized data centers for everyday queries. Just like ISPs cache Netflix shows locally, we'll have cheap, fast edge models for the masses who just need quick results.

@usr_bin_roygbiv @yacineMTB The spike in pricing indicates that you may need to throw more hw at it

@yacineMTB It’s become VERY clear that local is just the only direction consumers should be considering. Subsidies will decrease for these subscriptions and eventually we’ll all be paying thousands for these models, it’s just a matter of time. Go local 100%

@CharlesBanks99 @yacineMTB 5090 with nvfp4 is more than enough
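
Whether a card is "enough" starts with whether the weights fit in VRAM. A rough sketch of that check is below. It assumes about 4 bits per weight for a 4-bit format and ignores quantization scale overhead, KV cache, and activations, so real usage will be higher. The function name is hypothetical. The 24 GB and 32 GB figures are the VRAM sizes of the RTX 3090 and RTX 5090.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bits_per_weight` bits.
    Scale factors, KV cache, and activations are not counted.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    cards_gb = {"RTX 3090": 24, "RTX 5090": 32}
    # A 27B-parameter model at 16-bit and at roughly 4-bit precision.
    for bits in (16, 4):
        need = weight_memory_gb(27, bits)
        for card, vram in cards_gb.items():
            verdict = "fit" if need < vram else "do not fit"
            print(f"27B @ {bits}-bit: {need:.1f} GB of weights {verdict} in {card} ({vram} GB)")
```

The leftover VRAM after the weights is what remains for context, which ties back to the long-prompt concerns raised elsewhere in the thread.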

first of all: i think they're built to last that long. second, we should consider something like shared-gpu pools (like petal/horde). why is everyone just accepting anthropic/openai bs? are you really ready to accept one more area where you don't have any control and can be disconnected, just like with fable? :) idk, it's very obvious that if the whole community won't focus on the troubles of open-weights models, we're gonna be in the worst position ever. if people keep talking nonsense like that instead of really thinking about how we can improve it, we're gonna fail as humanity again and give control to some bunch of dickheads again. we did it with money, are you really sure you wanna do this with ai?

@yacineMTB honestly we can just distill a few core capabilities and will be really fine with smaller models for specific tasks (like coding) i see a specialization of small models in the near future.

@usr_bin_roygbiv @yacineMTB Sorry I meant the price for Mythos, it wasn't released in the same tier as other models.
The parallel with local AI is beyond Qwen 27B requirements in the standard tier
@yacineMTB hopefully next gen hardware fixes it but idk how doable

@yacineMTB Idk if I need more hardware or just wait for efficiency gains and mog people who spent too much

@grittyzavr @PorgimusPrime @yacineMTB would be very happy when it's viable just doesn't seem to be quite there yet

@yacineMTB they're not freaked out about fable itself, they're freaked out about qwen-fable-distill-27B

@yacineMTB The real question is what will happen to the huge flow of investment to data centers meant to run this for millions of average people and businesses. Because that money alone is why the stock market is at all time highs