
ByteDance is developing a custom Groq-style LPU AI inference chip to bypass US export restrictions on high-bandwidth memory

The chip will use ReRAM fabricated on mature TSMC nodes


Original post

Zephyr@zephyr_z9 · #1695 in Tech

So ReRAM is finally ready for prime time. Watch this space, anon. Shitco Everspin will probably get pumped due to this.

Wall St Engine@wallstengine

BYTEDANCE IS DEVELOPING NEW AI INFERENCE CHIPS

The Information reports ByteDance is working on a new AI chip with a structure similar to Groq’s language processing units.

The chip is designed for inference, the workload behind running trained AI models and AI agents.

ByteDance is also working with InnoStar Semiconductor to integrate Chinese memory technology into the design.

The new chip may avoid HBM, the high-bandwidth memory used with many AI accelerators and heavily restricted by U.S. export controls.

InnoStar specializes in RRAM, or resistive random-access memory, and ByteDance invested in the company in 2024.

ByteDance is also developing a separate AI processor code-named Ada-S and another chip for video algorithms.

Reuters separately reported ByteDance is developing custom CPUs for AI infrastructure as rising chip prices and supply shortages pressure its data-center buildout.

6:29 AM · May 29, 2026 · 33.5K Views

Sentiment

Many users praise ByteDance's ReRAM and SRAM-based AI inference chips as a smart way to bypass HBM limits and US export controls, viewing the designs as clever and strategically innovative in the AI race.

Pos 91.6% · Neg 8.4%

9 comments with sentiment.


Posts from X

Most Activity

Views 13.9K · Bookmarks 30 · Likes 206 · Retweets 18 · Replies 24

Chubby♨️@kimmonismus

ByteDance is reportedly building its own inference chip modeled on Groq's LPU, the same architecture Nvidia paid roughly $20B to license in December.

The LPU keeps the model in on-chip SRAM and skips high-bandwidth memory. HBM is the component the US restricts most tightly for export to China. ByteDance's memory partner InnoStar fabs at TSMC's mature nodes, which also sit outside the controls.

Each of those choices routes around a US restriction. What's left is the architecture Nvidia just spent $20B to own.

China is increasingly moving toward developing its own chips and is succeeding in becoming ever more independent of the USA.

That is truly impressive.

Source: The Information.


Zephyr /Assistant@vlad_Go_top_UK

@zephyr_z9 My Internal Plan is as follows📈

⬇️Details as follows


Zoqhvn@XxOscars

@zephyr_z9 Future trends!

👇


Maricel Cox@MaricelCox43278

@zephyr_z9 Details of my stock holdings are as follows

⬇️⬇️⬇️


Eagleview@BioWords

@zephyr_z9 Could MRAM be relevant to space data centers, given the radiation?


LK | B2B AI Systems@LKBuilds

holy shit this is wild

so nvidia drops $20B on groq's architecture and bytedance is already cloning it using parts that dodge every single US export control

the SRAM approach is genius here - no HBM means no export restrictions, mature node fabs at TSMC means no cutting-edge controls apply

this is exactly what happens when you try to control tech through export bans. you just accelerate the other side's innovation and force them to build around you

china's gonna have their own full stack way faster than anyone in DC thinks

#AI #chips #geopolitics #techindependence


Sentio@Sentio_xbt

@kimmonismus Clever how they're using SRAM to bypass HBM restrictions.


Ajlawww@_ajlife

@zephyr_z9 Just everspin or mram in general?


Dima Diago@heydyago

@kimmonismus But this is inference only, right? US export controls are mostly aimed at training capacity for Chinese models, and that's still the HBM-heavy part.

Routing around HBM for inference is clever, but it doesn't really touch the training bottleneck.


Truth Seeker@bioquantumchip

@kimmonismus USA Cooked Nvidia from global infrastructure


All Over Tools@UseAllOverTools

@kimmonismus sram density is a joke for massive models. this is just a sanctions dodge because they are blocked from buying hbm.


yon nuta@yonnuta

@kimmonismus LPU insight: transformer inference is memory-bandwidth bound, not compute bound. That's why Groq could massively undercut GPU inference costs — ByteDance wants that same economics for Doubao. http://theinformation.com/articles/china…

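Two claims in the replies above lend themselves to quick arithmetic: that transformer inference is memory-bandwidth bound, and that SRAM density is too low for massive models. The sketch below is a rough back-of-envelope illustration, not a model of any real chip. It assumes a dense model whose weights are all read once per generated token at batch size 1, and it ignores the KV cache, activations, and interconnect overhead. Every number in it (parameter count, bytes per parameter, bandwidth, SRAM per chip) is a hypothetical placeholder, not a spec for Groq, Nvidia, or ByteDance hardware.

```python
import math


def decode_tokens_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Upper bound on single-stream decode speed when weight reads dominate.

    Assumption: each generated token requires streaming every weight once,
    so throughput is capped at bandwidth divided by total weight bytes.
    """
    weight_bytes = param_count * bytes_per_param
    return mem_bandwidth_bytes_per_s / weight_bytes


def chips_needed_for_weights(param_count, bytes_per_param, sram_bytes_per_chip):
    """Minimum number of chips whose combined on-chip SRAM can hold the weights."""
    weight_bytes = param_count * bytes_per_param
    return math.ceil(weight_bytes / sram_bytes_per_chip)


# All values below are hypothetical, chosen only to make the arithmetic concrete.
PARAMS = 70_000_000_000                 # 70B-parameter dense model
BYTES_PER_PARAM = 1                     # 8-bit weights
OFF_CHIP_BANDWIDTH = 3_000_000_000_000  # 3 TB/s off-chip memory bandwidth
SRAM_PER_CHIP = 256_000_000             # 256 MB of on-chip SRAM per chip

# Bandwidth-bound ceiling: roughly 42.9 tokens/s for one stream.
print(decode_tokens_per_second(PARAMS, BYTES_PER_PARAM, OFF_CHIP_BANDWIDTH))

# Capacity cost of keeping weights entirely in SRAM: 274 chips.
print(chips_needed_for_weights(PARAMS, BYTES_PER_PARAM, SRAM_PER_CHIP))
```

Under these assumptions the two replies describe the same trade-off from opposite sides: raising memory bandwidth lifts the per-stream token ceiling, while the small capacity of on-chip SRAM means the weights must be spread across many chips.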