/Tech · 1d ago

Apple launches AFM 3 Core Advanced, a 20B-parameter model that bypasses DRAM limits using flash memory

It is designed specifically for the iPhone 17 Pro.

Original post

Apple's new foundation models are genuinely exciting. The standout is AFM 3 Core Advanced, a 20-billion (!) parameter model that runs entirely on-device.

Read that again. 20-billion, on-device, iPhone 17 Pro.

It pulls this off by keeping the full model in flash memory and loading only a small slice of "experts" into active memory for each prompt, just 1 to 4 billion parameters at a time. That's a clever way to get around the usual DRAM wall, and it's what unlocks things like expressive voices and much sharper dictation right on the device.

The whole family of five models was built in collaboration with Google. It spans these on-device models all the way up to server-based ones running on Private Cloud Compute, with the most demanding cloud model running on NVIDIA GPUs.

Kudos, Apple!

5:04 AM · Jun 9, 2026 · 75.5K Views
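The flash-plus-sparse-experts idea described in the post can be illustrated with a toy sketch. This is not Apple's implementation: the file stands in for flash storage, the small LRU cache stands in for DRAM, and every name and size here (`FlashExpertStore`, `NUM_EXPERTS`, `CACHE_SLOTS`, the top-2 routing) is a hypothetical choice made for illustration only.

```python
import os
import tempfile
from array import array
from collections import OrderedDict

# All sizes are toy assumptions, far smaller than any real model.
NUM_EXPERTS = 8   # experts stored in the "flash" file
EXPERT_SIZE = 4   # float32 weights per expert
TOP_K = 2         # experts activated per input
CACHE_SLOTS = 3   # experts that fit in "DRAM" at once


def write_toy_model(path):
    """Write every expert to disk; expert e holds the constant weight e + 1."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            array("f", [float(e + 1)] * EXPERT_SIZE).tofile(f)


class FlashExpertStore:
    """Keeps all experts in a file and only a few recently used ones in RAM."""

    def __init__(self, path, cache_slots=CACHE_SLOTS):
        self.path = path
        self.cache_slots = cache_slots
        self.cache = OrderedDict()  # expert_id -> weights, in LRU order
        self.flash_reads = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            return self.cache[expert_id]
        weights = array("f")
        with open(self.path, "rb") as f:
            f.seek(expert_id * EXPERT_SIZE * weights.itemsize)
            weights.fromfile(f, EXPERT_SIZE)
        self.flash_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)  # evict least recently used
        return weights


def route(router_scores, k=TOP_K):
    """Return the indices of the k highest-scoring experts."""
    order = sorted(range(len(router_scores)),
                   key=lambda i: router_scores[i], reverse=True)
    return order[:k]


def moe_forward(x, router_scores, store):
    """Gate-weighted sum of dot products over the selected experts only."""
    chosen = route(router_scores)
    total = sum(router_scores[i] for i in chosen)
    out = 0.0
    for i in chosen:
        weights = store.get(i)  # touches "flash" only on a cache miss
        gate = router_scores[i] / total
        out += gate * sum(w * xi for w, xi in zip(weights, x))
    return out


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_toy_model(path)
        store = FlashExpertStore(path)
        scores = [0, 0, 3, 0, 0, 1, 0, 0]
        print(moe_forward([1, 1, 1, 1], scores, store), store.flash_reads)
        print(moe_forward([1, 1, 1, 1], scores, store), store.flash_reads)
    finally:
        os.remove(path)
```

In this toy, a repeated call with the same routing reuses the cached experts and triggers no further reads from the file, which is the property that keeps the in-memory slice small relative to the full model.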

Chubby♨️@kimmonismus · #1532 in Tech

Sentiment

Positive users praise Apple's 20-billion-parameter on-device AI model for the iPhone 17 Pro as a practical breakthrough that delivers large-model capability without cloud dependency or battery drain. Negative users dismiss it as unoriginal.

Pos

73.3%

Neg

26.7%

15 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 1.9K · LIKES 11

Strata@ChainZenit

@kimmonismus that is actually insane for a mobile device.

1d1.9K111

BOOKMARKS 1

Anthony Tayoun@AnthonyTayoun

Apple finally entering the game! This is great because now this model can handle the context on the user side and organize and surface what’s needed for a more capable model to take over if needed. If we continue on this path we’ll soon need an orchestration layer to manage personal / business data for each individual

1d4781

RETWEETS 1

Juniper@JuniperViews

@kimmonismus No benchmark

1d1.5K8

REPLIES 1

Carmichael@carmichael_pt

@kimmonismus Maybe I'm missing the point, but isn't this just MoE which has been widely used by the industry?

1d2661

Chubby♨️@kimmonismus

@ChainZenit that's absolutely mind blowing

1d1.6K61

Alex YGift@Radipdegen

@kimmonismus 20b on-device is wild if real

wonder how much battery that eats per inference

1d74621

Shiraz Akmal@ShirazAkmal

@kimmonismus Awni’s post

1d3991

Chubby♨️@kimmonismus

@JuniperViews true. still insane

1d1.4K6

GooGZ AI@PaulGugAI

@kimmonismus I was hearing they were leaning on E4B essentially, but that doesn't sound like it at all. The smaller one seems smaller than E4B, or was it 2B, or not 2B?

(see what I did there?)

1d5673

Chimpansky@chimpansky

@kimmonismus 20b total in flash with 1-4b active is clever sparse routing, flash read bandwidth is the binding constraint. curious what sustained tokens/sec looks like once independent benchmarks drop

1d7772
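The point about flash read bandwidth being the binding constraint can be made concrete with a back-of-envelope bound. This is a rough sketch under simplifying assumptions: it supposes that every generated token needs all active parameters that are not already cached in DRAM to be read from flash, and it ignores compute time entirely. The function name and every number in the usage lines are hypothetical and are not figures from Apple or from any benchmark.

```python
def max_tokens_per_second(active_params, bytes_per_param,
                          flash_bandwidth_bytes_per_s, cache_hit_rate=0.0):
    """Upper bound on tokens/sec when flash reads are the only bottleneck.

    Assumes each token requires the uncached share of the active
    parameters to be streamed from flash. Real systems may reuse experts
    across tokens, so treat this as an illustration, not a prediction.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    bytes_from_flash = active_params * bytes_per_param * (1.0 - cache_hit_rate)
    if bytes_from_flash == 0:
        return float("inf")  # nothing to read, so flash is not the limit
    return flash_bandwidth_bytes_per_s / bytes_from_flash


if __name__ == "__main__":
    # Hypothetical inputs: 1B active parameters, 4-bit weights
    # (0.5 bytes each), and 2 GB/s of sustained flash read bandwidth.
    print(max_tokens_per_second(1e9, 0.5, 2e9))
    # Same inputs, but assuming 75% of expert reads hit a DRAM cache.
    print(max_tokens_per_second(1e9, 0.5, 2e9, cache_hit_rate=0.75))
```

Under these made-up inputs the bound scales inversely with the uncached bytes per token, which is why the cache hit rate and the weight precision matter as much as the raw flash speed.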

Hussain Hashim | Building Sunday Back@itsthedonhashim

@kimmonismus honestly, can't believe we're now cramming 20 billion parameters into a phone. what a time to be alive. wonder how it'll impact battery life though 🤔

1d3092

Julie Loves Tech@JulieLovesTech

20B parameters running on-device by selectively loading 1-4B active experts at a time is the most elegant solution to the mobile AI problem anyone has shipped.

the DRAM wall was supposed to make this impossible.

Apple just routed around it with flash memory and sparse expert activation.

this is the architecture the whole industry will be copying in 18 months.

1d5541

NorthFace@PandaDaytona

@kimmonismus Remarkable that the phone does not get hot and the battery does not drain so quickly while using Siri. What Apple delivers is not the best foundation model, but the seamless flow running the model.

1d3481

Behnam@OrganicGPT

@kimmonismus nothing about Apple’s AI efforts is novel. at this point it's more like a joke.

1d2161

AI News 24@ainews_24_7

@kimmonismus

1d305

Andreas@andreas0x

@kimmonismus All I want from apple is to fix my typos.

1d296

Ufonik ✦@EnTr0pY_88

@kimmonismus That flash memory trick to bypass the DRAM wall is actually a massive engineering win for on-device AI.

1d259

Simply AI@Simply_AI_00

Apple's AFM 3 is a masterclass in practical innovation: massive intelligence, zero cloud dependency. By turning flash into smart memory, they prove privacy-first AI can feel magical on your phone. This is how we get truly personal AI — always with you, never watching you. Future unlocked.

1d249

Nguyen LNP@nguyen_lnp

@kimmonismus AI Analysis: Practical fit: short private text tasks like summarization, extraction, or guided Swift outputs. Apple docs say larger context or stronger reasoning should route to Private Cloud Compute or a server model. Source: Apple docs

1d221

AbdullahFaisal@Abdullah_Cloak

@kimmonismus https://github.com/danveloper/flash-moe

1d691