
Apple Unveils 20-Billion Parameter On-Device AI Model For iPhone 17 Pro

Original post
Chubby♨️ @kimmonismus · #1448 in AI

Apple's new foundation models are genuinely exciting. The standout is AFM 3 Core Advanced, a 20-billion (!) parameter model that runs entirely on-device.

Read that again. 20-billion, on-device, iPhone 17 Pro.

It pulls this off by keeping the full model in flash memory and loading only a small slice of "experts" into active memory for each prompt, just 1 to 4 billion parameters at a time. That's a clever way to get around the usual DRAM wall, and it's what unlocks things like expressive voices and much sharper dictation right on the device.
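The mechanism described above (all experts kept in flash, only a few resident in memory per prompt) can be sketched roughly as below. This is a toy illustration, not Apple's implementation: `FlashStore`, `ResidentExperts`, `moe_forward`, the LRU eviction policy, the top-k softmax gating, and all sizes are assumptions made up for the example.

```python
import math
import random
from collections import OrderedDict


class FlashStore:
    """Hypothetical stand-in for flash storage: holds every expert, counts reads."""

    def __init__(self, num_experts, dim, seed=0):
        rng = random.Random(seed)
        # Each toy "expert" is a dim x dim weight matrix stored as a list of rows.
        self._experts = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]
        self.reads = 0

    def load(self, expert_id):
        self.reads += 1  # every call models one (slow) flash read
        return self._experts[expert_id]


class ResidentExperts:
    """LRU cache standing in for the small set of experts held in DRAM."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self._cache = OrderedDict()

    def get(self, expert_id):
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)  # mark as most recently used
            return self._cache[expert_id]
        weights = self.store.load(expert_id)  # cache miss: fetch from "flash"
        self._cache[expert_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used
        return weights

    def __len__(self):
        return len(self._cache)


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def moe_forward(x, router, resident, top_k=2):
    """Send x to its top_k experts and mix their outputs with softmax gates."""
    logits = matvec(router, x)  # one routing score per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    out = [0.0] * len(x)
    for e, expert_id in zip(exps, chosen):
        y = matvec(resident.get(expert_id), x)  # only chosen experts are touched
        out = [o + (e / total) * yi for o, yi in zip(out, y)]
    return out, chosen


store = FlashStore(num_experts=8, dim=4)
resident = ResidentExperts(store, capacity=2)
rng = random.Random(1)
router = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
out, chosen = moe_forward([1.0, 0.5, -0.5, 2.0], router, resident)
print("experts used:", chosen, "flash reads:", store.reads)
```

In this sketch only 2 of the 8 toy experts are ever read from the store for a given input, and repeating the same input causes no further reads because both experts are already resident.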

The whole family of five models was built in collaboration with Google. It spans these on-device models all the way up to server-based ones running on Private Cloud Compute, with the most demanding cloud model running on NVIDIA GPUs.

Kudos, Apple!

5:04 AM · Jun 9, 2026 · 69.1K Views
Sentiment

Positive users praise Apple's 20-billion parameter on-device AI model for the iPhone 17 Pro as a practical breakthrough enabling massive intelligence without cloud dependency or battery drain, while negative users dismiss it as unoriginal.

Positive: 73.3% · Negative: 26.7%
15 comments with sentiment.
Cluster Engagement
Posts from X
Strata @ChainZenit

@kimmonismus that is actually insane for a mobile device.

15h · Views 1.9K · Likes 11 · Bookmarks 1
Anthony Tayoun @AnthonyTayoun

Apple finally entering the game! This is great because now this model can handle the context on the user side and organize and surface what’s needed for a more capable model to take over if needed. If we continue on this path, we’ll soon need an orchestration layer to manage personal / business data for each individual.

12h · Views 478 · Bookmarks 1
Retweets 1
Juniper @JuniperViews

@kimmonismus No benchmark

14h · Views 1.5K · Likes 8
Replies 1
Carmichael @carmichael_pt

@kimmonismus Maybe I'm missing the point, but isn't this just MoE which has been widely used by the industry?

12h · Views 266 · Likes 1
Chubby♨️ @kimmonismus

@ChainZenit that's absolutely mind-blowing

14h · Views 1.6K · Likes 6 · Bookmarks 1
Alex YGift @Radipdegen

@kimmonismus 20b on-device is wild if real

wonder how much battery that eats per inference

14h · Views 746 · Likes 2 · Bookmarks 1
Shiraz Akmal @ShirazAkmal

@kimmonismus Awni’s post

13h · Views 399 · Bookmarks 1
Chubby♨️ @kimmonismus

@JuniperViews true. still insane

14h · Views 1.4K · Likes 6
GooGZ AI @PaulGugAI

@kimmonismus I was hearing they were leaning on E4B essentially, but that doesn't sound like it at all. The smaller one seems smaller than E4B, or was it 2B, or not 2B?

(see what I did there?)

14h · Views 567 · Likes 3
Chimpansky @chimpansky

@kimmonismus 20b total in flash with 1-4b active is clever sparse routing, flash read bandwidth is the binding constraint. curious what sustained tokens/sec looks like once independent benchmarks drop

14h · Views 777 · Likes 2
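The point above about flash read bandwidth being the binding constraint can be made concrete with a back-of-envelope bound. This is only a sketch: the function name `flash_bound_tokens_per_sec` and every number in the example (quantization width, cache miss rate, flash throughput) are hypothetical placeholders, not measured or published figures for any device.

```python
def flash_bound_tokens_per_sec(active_params, bytes_per_param, miss_rate, flash_gb_per_sec):
    """Upper bound on decode speed if flash reads were the only cost.

    active_params    -- parameters touched per token (e.g. 1e9 to 4e9)
    bytes_per_param  -- storage per parameter (0.5 would mean 4-bit weights)
    miss_rate        -- fraction of those weights not already resident in DRAM
    flash_gb_per_sec -- sustained flash read throughput in GB/s
    """
    bytes_per_token = active_params * bytes_per_param * miss_rate
    if bytes_per_token == 0:
        return float("inf")  # everything is resident, flash is not the limit
    return flash_gb_per_sec * 1e9 / bytes_per_token


# Hypothetical inputs: 1B active params, 4-bit weights, 10% misses, 2 GB/s flash.
rate = flash_bound_tokens_per_sec(1e9, 0.5, 0.10, 2.0)
print(f"flash-bound ceiling: about {rate:.0f} tokens/sec")
```

Under these made-up inputs the ceiling is about 40 tokens/sec. The bound scales inversely with the miss rate, which is why how often the routed experts change between tokens matters as much as raw flash speed.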

@kimmonismus honestly, can't believe we're now cramming 20 billion parameters into a phone. what a time to be alive. wonder how it'll impact battery life though 🤔

12h · Views 309 · Likes 2
Julie Loves Tech @JulieLovesTech

20B parameters running on-device by selectively loading 1-4B active experts at a time is the most elegant solution to the mobile AI problem anyone has shipped.

the DRAM wall was supposed to make this impossible.

Apple just routed around it with flash memory and sparse expert activation.

this is the architecture the whole industry will be copying in 18 months.

14h · Views 554 · Likes 1
NorthFace @PandaDaytona

@kimmonismus Remarkable that the phone does not get hot and the battery does not drain so quickly while using Siri. What Apple delivers is not the best foundation model, but the seamless flow running the model.

12h · Views 348 · Likes 1
Behnam @OrganicGPT

@kimmonismus nothing about Apple’s AI efforts is novel. at this point it's more like a joke.

10h · Views 216 · Likes 1
Andreas @andreas0x

@kimmonismus All I want from apple is to fix my typos.

12h · Views 296
Ufonik ✦ @EnTr0pY_88

@kimmonismus That flash memory trick to bypass the DRAM wall is actually a massive engineering win for on-device AI.

10h · Views 259
Simply AI @Simply_AI_00

Apple's AFM 3 is a masterclass in practical innovation: massive intelligence, zero cloud dependency. By turning flash into smart memory, they prove privacy-first AI can feel magical on your phone. This is how we get truly personal AI — always with you, never watching you. Future unlocked.

14h · Views 249
Nguyen LNP @nguyen_lnp

@kimmonismus AI Analysis: Practical fit: short private text tasks like summarization, extraction, or guided Swift outputs. Apple docs say larger context or stronger reasoning should route to Private Cloud Compute or a server model. Source: Apple docs

13h · Views 221
AbdullahFaisal @Abdullah_Cloak

@kimmonismus https://github.com/danveloper/flash-moe

13h · Views 69 · Likes 1