Apple launches AFM 3 Core Advanced, a 20B-parameter model that bypasses DRAM limits using flash memory

It is designed specifically for the iPhone 17 Pro.

Original post
Chubby♨️ @kimmonismus · #1532 in Tech

Apple's new foundation models are genuinely exciting. The standout is AFM 3 Core Advanced, a 20-billion (!) parameter model that runs entirely on-device.

Read that again. 20-billion, on-device, iPhone 17 Pro.

It pulls this off by keeping the full model in flash memory and loading only a small slice of "experts" into active memory for each prompt, just 1 to 4 billion parameters at a time. That's a clever way to get around the usual DRAM wall, and it's what unlocks things like expressive voices and much sharper dictation right on the device.

The whole family of five models was built in collaboration with Google. It spans these on-device models all the way up to server-based ones running on Private Cloud Compute, with the most demanding cloud model running on NVIDIA GPUs.

Kudos, Apple!

5:04 AM · Jun 9, 2026 · 75.5K Views
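The post describes the mechanism only in outline: the full model stays in flash, and a small set of experts is loaded into active memory per prompt. As a rough illustration of that general idea, and not Apple's implementation (which the post does not detail), here is a toy sketch. A file stands in for flash, a small LRU cache stands in for DRAM, and every name and constant (`NUM_EXPERTS`, `TOP_K`, `CACHE_SLOTS`, `ExpertCache`, and so on) is hypothetical.

```python
import mmap
import os
import struct
import tempfile
from collections import OrderedDict

# All sizes below are hypothetical toy values, chosen only for illustration.
NUM_EXPERTS = 8                 # experts stored in the "flash" file
EXPERT_DIM = 4                  # float32 weights per expert
TOP_K = 2                       # experts activated per input
CACHE_SLOTS = 3                 # experts that fit in "DRAM" at once
EXPERT_BYTES = EXPERT_DIM * 4   # bytes per expert (float32)


def write_experts(path):
    """Write every expert to a file that stands in for flash storage."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            # Expert e holds the constant value e + 1 so results are easy to check.
            f.write(struct.pack(f"<{EXPERT_DIM}f", *([float(e + 1)] * EXPERT_DIM)))


class ExpertCache:
    """LRU cache that pages experts from the memory-mapped file into RAM."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._resident = OrderedDict()   # expert_id -> tuple of weights
        self.flash_reads = 0             # number of cache misses served from the file

    def get(self, expert_id):
        if expert_id in self._resident:
            self._resident.move_to_end(expert_id)   # mark as most recently used
            return self._resident[expert_id]
        start = expert_id * EXPERT_BYTES
        raw = self._map[start:start + EXPERT_BYTES]
        weights = struct.unpack(f"<{EXPERT_DIM}f", raw)
        self.flash_reads += 1
        self._resident[expert_id] = weights
        if len(self._resident) > CACHE_SLOTS:
            self._resident.popitem(last=False)      # evict least recently used
        return weights

    def close(self):
        self._map.close()
        self._file.close()


def route(scores):
    """Return the ids of the TOP_K highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:TOP_K]


def moe_forward(x, scores, cache):
    """Toy forward pass: sum of dot products of x with each selected expert."""
    out = 0.0
    for e in route(scores):
        w = cache.get(e)
        out += sum(xi * wi for xi, wi in zip(x, w))
    return out


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "experts.bin")
        write_experts(path)
        cache = ExpertCache(path)
        try:
            x = [1.0] * EXPERT_DIM
            scores = [0.0, 0.0, 0.0, 0.9, 0.0, 0.8, 0.0, 0.0]  # selects experts 3 and 5
            first = moe_forward(x, scores, cache)
            second = moe_forward(x, scores, cache)  # same experts, served from cache
            return first, second, cache.flash_reads
        finally:
            cache.close()
```

In this sketch the second call touches no new experts, so the read counter does not grow. That is the property such a design depends on: if consecutive inputs reuse recently loaded experts, most of the model never has to leave storage.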
Sentiment

Positive users praise Apple's 20-billion parameter on-device AI model for the iPhone 17 Pro as a practical breakthrough enabling massive intelligence without cloud dependency or battery drain, while negative users dismiss it as unoriginal.

Positive: 73.3%
Negative: 26.7%
15 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Strata @ChainZenit

@kimmonismus that is actually insane for a mobile device.

1d · Views 1.9K · Likes 11 · Bookmarks 1
Anthony Tayoun @AnthonyTayoun

Apple finally entering the game! This is great because now this model can handle the context on the user side and organize and surface what’s needed for a more capable model to take over if needed. If we continue on this path we’ll soon need an orchestration layer to manage personal / business data for each individual

1d · Views 478 · Bookmarks 1
Juniper @JuniperViews

@kimmonismus No benchmark

1d · Views 1.5K · Likes 8
Carmichael @carmichael_pt

@kimmonismus Maybe I'm missing the point, but isn't this just MoE which has been widely used by the industry?

1d · Views 266 · Likes 1
Chubby♨️ @kimmonismus

@ChainZenit that's absolutely mind blowing

1d · Views 1.6K · Likes 6 · Bookmarks 1
Alex YGift @Radipdegen

@kimmonismus 20b on-device is wild if real

wonder how much battery that eats per inference

1d · Views 746 · Likes 2 · Bookmarks 1
Shiraz Akmal @ShirazAkmal

@kimmonismus Awni’s post

1d · Views 399 · Bookmarks 1
Chubby♨️ @kimmonismus

@JuniperViews true. still insane

1d · Views 1.4K · Likes 6
GooGZ AI @PaulGugAI

@kimmonismus I was hearing they were leaning on E4B essentially, but that doesn't sound like it at all. The smaller one seems smaller than E4B, or was it 2B, or not 2B?

(see what I did there?)

1d · Views 567 · Likes 3
Chimpansky @chimpansky

@kimmonismus 20b total in flash with 1-4b active is clever sparse routing, flash read bandwidth is the binding constraint. curious what sustained tokens/sec looks like once independent benchmarks drop

1d · Views 777 · Likes 2
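The bandwidth point above can be made concrete with a back-of-envelope bound. If some fraction of the active weights must be fetched from flash for each token, then tokens per second cannot exceed flash bandwidth divided by bytes fetched per token. The sketch below assumes that simple model. Only the 1 to 4 billion active-parameter range comes from the thread; the bits per weight, flash bandwidth, and miss rate are hypothetical placeholders, not measured or published figures.

```python
def max_tokens_per_second(active_params, bits_per_weight, flash_gb_per_s, miss_rate):
    """Upper bound on tokens/sec when flash reads are the only bottleneck.

    active_params   -- parameters used per token (the thread cites 1e9 to 4e9)
    bits_per_weight -- quantization width (hypothetical)
    flash_gb_per_s  -- sustained flash read bandwidth in GB/s (hypothetical)
    miss_rate       -- fraction of active weights not already in DRAM (hypothetical)
    """
    bytes_per_token = active_params * bits_per_weight / 8 * miss_rate
    if bytes_per_token == 0:
        return float("inf")   # everything is cached, so flash imposes no limit
    return flash_gb_per_s * 1e9 / bytes_per_token


# Hypothetical inputs: 4-bit weights, 2 GB/s flash, 25% of active weights missed.
low_end = max_tokens_per_second(1e9, 4, 2.0, 0.25)    # 1B active parameters
high_end = max_tokens_per_second(4e9, 4, 2.0, 0.25)   # 4B active parameters
```

Under these placeholder numbers the bound falls by a factor of four between the 1B and 4B cases, which is why the cache hit rate and sustained read speed matter more than the 20B total.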

@kimmonismus honestly, can't believe we're now cramming 20 billion parameters into a phone. what a time to be alive. wonder how it'll impact battery life though 🤔

1d · Views 309 · Likes 2
Julie Loves Tech @JulieLovesTech

20B parameters running on-device by selectively loading 1-4B active experts at a time is the most elegant solution to the mobile AI problem anyone has shipped.

the DRAM wall was supposed to make this impossible.

Apple just routed around it with flash memory and sparse expert activation.

this is the architecture the whole industry will be copying in 18 months.

1d · Views 554 · Likes 1
NorthFace @PandaDaytona

@kimmonismus Remarkable that the phone does not get hot and the battery does not drain so quickly while using Siri. What Apple delivers is not the best foundation model, but the seamless flow running the model.

1d · Views 348 · Likes 1
Behnam @OrganicGPT

@kimmonismus nothing about Apple’s AI efforts is novel. at this point it's more like a joke.

1d · Views 216 · Likes 1
Andreas @andreas0x

@kimmonismus All I want from Apple is to fix my typos.

1d · Views 296
Ufonik ✦ @EnTr0pY_88

@kimmonismus That flash memory trick to bypass the DRAM wall is actually a massive engineering win for on-device AI.

1d · Views 259
Simply AI @Simply_AI_00

Apple's AFM 3 is a masterclass in practical innovation: massive intelligence, zero cloud dependency. By turning flash into smart memory, they prove privacy-first AI can feel magical on your phone. This is how we get truly personal AI — always with you, never watching you. Future unlocked.

1d · Views 249
Nguyen LNP @nguyen_lnp

@kimmonismus AI Analysis: Practical fit: short private text tasks like summarization, extraction, or guided Swift outputs. Apple docs say larger context or stronger reasoning should route to Private Cloud Compute or a server model. Source: Apple docs

1d · Views 221
AbdullahFaisal @Abdullah_Cloak

@kimmonismus https://github.com/danveloper/flash-moe

1d · Views 69 · Likes 1