/AI · 15h ago

Apple Unveils 20-Billion Parameter On-Device AI Model For iPhone 17 Pro


Original post

Apple's new foundation models are genuinely exciting. The standout is AFM 3 Core Advanced, a 20-billion (!) parameter model that runs entirely on-device.

Read that again. 20-billion, on-device, iPhone 17 Pro.

It pulls this off by keeping the full model in flash memory and loading only a small slice of "experts" into active memory for each prompt, just 1 to 4 billion parameters at a time. That's a clever way to get around the usual DRAM wall, and it's what unlocks things like expressive voices and much sharper dictation right on the device.
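To make the "full model in flash, small slice of experts in memory" idea concrete, here is a toy sketch of sparse expert routing with a small cache of resident experts. This is not Apple's implementation: the expert count, top-k value, cache size, and every name below (`flash`, `router`, `ExpertCache`, `moe_forward`) are assumptions made up for illustration, and "flash" is just a Python dict.

```python
import math
import random
from collections import OrderedDict

# All sizes are hypothetical toy values, not AFM 3 figures.
NUM_EXPERTS = 16   # experts stored in "flash"
TOP_K = 2          # experts activated per input
CACHE_SLOTS = 4    # experts that fit in "DRAM" at once
DIM = 8            # toy hidden size

random.seed(0)

# Simulated flash storage: every expert's weight matrix lives here.
flash = {
    e: [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(DIM)]
    for e in range(NUM_EXPERTS)
}
# Router weights: one scoring row per expert (always resident).
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


class ExpertCache:
    """LRU cache standing in for the DRAM-resident slice of experts."""

    def __init__(self, slots):
        self.slots = slots
        self.resident = OrderedDict()
        self.flash_reads = 0

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)      # mark as recently used
        else:
            self.flash_reads += 1                     # simulated flash read
            self.resident[expert_id] = flash[expert_id]
            if len(self.resident) > self.slots:
                self.resident.popitem(last=False)     # evict least recently used
        return self.resident[expert_id]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def moe_forward(x, cache):
    """Score all experts, run only the top-k, and mix their outputs."""
    scores = matvec(router, x)
    top = sorted(range(NUM_EXPERTS), key=lambda e: scores[e], reverse=True)[:TOP_K]
    peak = max(scores[e] for e in top)
    gates = [math.exp(scores[e] - peak) for e in top]  # softmax over the top-k only
    total = sum(gates)
    out = [0.0] * DIM
    for e, g in zip(top, gates):
        y = matvec(cache.get(e), x)
        out = [o + (g / total) * yi for o, yi in zip(out, y)]
    return out, top


cache = ExpertCache(CACHE_SLOTS)
out, active = moe_forward([1.0] * DIM, cache)
print(active, cache.flash_reads)
```

The point of the sketch is only that the router touches every expert's score but just `TOP_K` experts' weights, so repeated inputs that pick the same experts cause no further flash reads.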

The whole family of five models was built in collaboration with Google. It spans these on-device models all the way up to server-based ones running on Private Cloud Compute, with the most demanding cloud model running on NVIDIA GPUs.

Kudos, Apple!

5:04 AM · Jun 9, 2026 · 69.1K Views

Chubby♨️ @kimmonismus · #1448 in AI

Sentiment

Positive users praise Apple's 20-billion parameter on-device AI model for the iPhone 17 Pro as a practical breakthrough enabling massive intelligence without cloud dependency or battery drain, while negative users dismiss it as unoriginal.

Positive: 73.3%

Negative: 26.7%

15 comments with sentiment.
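As a quick sanity check on those percentages, the short sketch below works out which whole-number split of 15 comments rounds to 73.3% positive. The helper name `positive_counts` is made up for this example, and it assumes the percentages were rounded to one decimal place.

```python
def positive_counts(total, positive_pct, tolerance=0.05):
    """Return every count k where k/total rounds to positive_pct (one decimal)."""
    return [
        k for k in range(total + 1)
        if abs(100 * k / total - positive_pct) < tolerance
    ]


matches = positive_counts(15, 73.3)
print(matches)  # the only match is 11, leaving 4 negative comments
```

Under that assumption the split would be 11 positive and 4 negative comments.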

Cluster Engagement

Posts from X

Most Activity

VIEWS 1.9K · LIKES 11

Strata@ChainZenit

@kimmonismus that is actually insane for a mobile device.

15h1.9K111

BOOKMARKS 1

Anthony Tayoun@AnthonyTayoun

Apple finally entering the game! This is great because now this model can handle the context on the user side and organize and surface what’s needed for a more capable model to take over if needed. If we continue on this path we’ll soon need an orchestration layer to manage personal / business data for each individual

12h4781

RETWEETS 1

Juniper@JuniperViews

@kimmonismus No benchmark

14h1.5K8

REPLIES 1

Carmichael@carmichael_pt

@kimmonismus Maybe I'm missing the point, but isn't this just MoE which has been widely used by the industry?

12h2661

Chubby♨️@kimmonismus

@ChainZenit that's absolutely mind-blowing

14h1.6K61

Alex YGift@Radipdegen

@kimmonismus 20b on-device is wild if real

wonder how much battery that eats per inference

14h74621

Shiraz Akmal@ShirazAkmal

@kimmonismus Awni’s post

13h3991

Chubby♨️@kimmonismus

@JuniperViews true. still insane

14h1.4K6

GooGZ AI@PaulGugAI

@kimmonismus I was hearing they were leaning on E4B essentially, but that doesn't sound like it at all. The smaller one seems smaller than E4B, or was it 2B, or not 2B?

(see what I did there?)

14h5673

Chimpansky@chimpansky

@kimmonismus 20b total in flash with 1-4b active is clever sparse routing, flash read bandwidth is the binding constraint. curious what sustained tokens/sec looks like once independent benchmarks drop

14h7772
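The "flash read bandwidth is the binding constraint" point can be turned into a back-of-the-envelope bound. The sketch below is only that: the function name `max_tokens_per_sec` and all four inputs (active parameter count, bits per parameter, the fraction of active weights that miss the in-memory cache, and flash throughput) are assumed values for illustration, not published figures, and it ignores compute time and DRAM bandwidth entirely.

```python
def max_tokens_per_sec(active_params, bits_per_param, miss_rate, flash_bytes_per_sec):
    """Upper bound on tokens/sec if flash reads were the only bottleneck.

    miss_rate is the assumed fraction of active weights that must be
    fetched from flash for each token (0.0 means everything is cached).
    """
    bytes_active = active_params * bits_per_param / 8
    bytes_from_flash = bytes_active * miss_rate
    if bytes_from_flash == 0:
        return float("inf")  # no flash traffic, so flash imposes no limit
    return flash_bytes_per_sec / bytes_from_flash


# Hypothetical inputs: 1B active params, 4-bit weights,
# 10% of them fetched per token, 2 GB/s flash reads.
print(max_tokens_per_sec(1e9, 4, 0.1, 2e9))
```

With those made-up numbers the bound comes out to roughly 40 tokens/sec, and it scales inversely with the miss rate, which is why how often the active expert set changes matters so much for sustained throughput.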

Hussain Hashim | Building Sunday Back@itsthedonhashim

@kimmonismus honestly, can't believe we're now cramming 20 billion parameters into a phone. what a time to be alive. wonder how it'll impact battery life though 🤔

12h3092

Julie Loves Tech@JulieLovesTech

20B parameters running on-device by selectively loading 1-4B active experts at a time is the most elegant solution to the mobile AI problem anyone has shipped.

the DRAM wall was supposed to make this impossible.

Apple just routed around it with flash memory and sparse expert activation.

this is the architecture the whole industry will be copying in 18 months.

14h5541

NorthFace@PandaDaytona

@kimmonismus Remarkable that the phone does not get hot and the battery does not drain so quickly while using Siri. What Apple delivers is not the best foundation model, but the seamless flow running the model.

12h3481

Behnam@OrganicGPT

@kimmonismus nothing about Apple’s AI efforts is novel. at this point it's more like a joke.

10h2161

AI News 24@ainews_24_7

@kimmonismus

14h305

Andreas@andreas0x

@kimmonismus All I want from apple is to fix my typos.

12h296

Ufonik ✦@EnTr0pY_88

@kimmonismus That flash memory trick to bypass the DRAM wall is actually a massive engineering win for on-device AI.

10h259

Simply AI@Simply_AI_00

Apple's AFM 3 is a masterclass in practical innovation: massive intelligence, zero cloud dependency. By turning flash into smart memory, they prove privacy-first AI can feel magical on your phone. This is how we get truly personal AI — always with you, never watching you. Future unlocked.

14h249

Nguyen LNP@nguyen_lnp

@kimmonismus AI Analysis: Practical fit: short private text tasks like summarization, extraction, or guided Swift outputs. Apple docs say larger context or stronger reasoning should route to Private Cloud Compute or a server model. Source: Apple docs

13h221

AbdullahFaisal@Abdullah_Cloak

@kimmonismus https://github.com/danveloper/flash-moe

13h691