3h ago

Elon Musk claims SpaceX's custom C-based AI training framework is 10x faster than JAX on 220,000 GB300 GPUs

Critics questioned the JAX benchmark and choice of C.

Original post

@elonmusk Elon, trust me, this makes no sense. Someone is strongly overclaiming or overpromising to you.

Elon Musk @elonmusk

SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible. The potential speed improvement vs JAX for large training runs is over an order of magnitude.

6:27 AM · May 28, 2026 · 15.2M Views
3:12 PM · May 28, 2026 · 1.1K Views

j...jax????

2:30 PM · May 28, 2026 · 24.2K Views

@MParakhin I've become extreme in my position: any primitive that isn't directly provided by the hardware provider is hamstringing you too much

Mikhail Parakhin @MParakhin

The fact that JAX was even mentioned makes me think of two things: 1) xAI still needs better ML people 2) PyTorch is stagnating and probably is not going to recover :-( It is such an unparalleled achievement, but lost key people, exiled to FAIR now...

2:22 PM · May 28, 2026 · 13.1K Views
2:30 PM · May 28, 2026 · 2.6K Views

@MParakhin as soon as you want to do anything out of the fold pytorch will hold you back. it's more difficult but the pressure on talented SWE time has been released enough by LLMs for it to be a good tradeoff to make. Just do it in cuda

kache @yacineMTB

@MParakhin I've become extreme in my position: any primitive that isn't directly provided by the hardware provider is hamstringing you too much

2:30 PM · May 28, 2026 · 2.6K Views
2:31 PM · May 28, 2026 · 1.2K Views

Could gain another order of magnitude if they were to switch to decision trees instead of neural nets. Just sayin’.

1:22 PM · May 28, 2026 · 6.3K Views

@yacineMTB What I said :-)

Mikhail Parakhin @MParakhin

The fact that JAX was even mentioned makes me think of two things: 1) xAI still needs better ML people 2) PyTorch is stagnating and probably is not going to recover :-( It is such an unparalleled achievement, but lost key people, exiled to FAIR now...

2:22 PM · May 28, 2026 · 13.1K Views
3:09 PM · May 28, 2026 · 1.6K Views

The fact that JAX was even mentioned makes me think of two things: 1) xAI still needs better ML people 2) PyTorch is stagnating and probably is not going to recover :-( It is such an unparalleled achievement, but lost key people, exiled to FAIR now...

2:22 PM · May 28, 2026 · 13.1K Views

@elonmusk Well that’s good

12:48 PM · May 28, 2026 · 3.4K Views

PufferLib trains small task-specific reinforcement learning models at up to 20M steps/second in 5,000 lines of CUDA C on a single GPU. Stop using awful DSLs just because they are pretending to be Python!

1:40 PM · May 28, 2026 · 3.9K Views

mythos is so good it can rewrite jax in c 🫪

12:17 PM · May 28, 2026 · 143 Views

@eliebakouch selling half their compute in exchange for claude access was so worth it

elie @eliebakouch

"/goal rewrite jax in rust"

12:09 PM · May 28, 2026 · 13.1K Views
12:18 PM · May 28, 2026 · 60 Views
Elon Musk claims SpaceX's custom C-based AI training framework is 10x faster than JAX on 220,000 GB300 GPUs · Digg