This awesome conversation with @stephenbalaban of @LambdaAPI is also available on Spotify, Apple Podcasts and here on YouTube:
https://youtu.be/0NttU4CbyVs?si=TomvmP2GnO5qwFp_
State of AI compute 2026: my conversation with @stephenbalaban of @LambdaAPI on the neocloud boom, data centers, GPUs and what's ahead
00:00 - Cold open
01:21 - Why GPU compute was never a commodity
02:45 - The H100 price index and what it gets wrong
04:02 - The real moat: technology or financing?
05:57 - Winner-take-all, or room for many neoclouds?
06:48 - Are we overbuilding or underbuilding AI compute?
09:26 - What if AI gets 10x more compute-efficient?
10:44 - The real bottleneck: land, power, and shell
11:38 - The backlash against data centers, and the misinformation
15:00 - Opening the hood: from photons to tokens
17:11 - Extracting more value from the same chip
19:26 - Frontier inference and distributed training, explained
23:26 - What actually drives compute cost
25:21 - Lambda's chip stack and the NVIDIA relationship
26:17 - A multi-silicon world? CUDA, cuDNN, and NVIDIA's real moat
28:59 - Networking, storage, and the one-click cluster
34:46 - Renting vs. owning, and full vertical integration
36:24 - How global is Lambda? Does location still matter?
38:44 - The financing stack: off-take agreements, SPVs, and credit
41:16 - Why a 2023 GPU leases for more today
42:36 - A futures market for compute?
43:54 - Origin story: facial recognition, Perceptio, and Apple
47:03 - The Lambda hat and Dreamscope
48:59 - The $60K bet that became a cloud business
52:00 - Holding the team together through the hard times
54:30 - Bringing on a new CEO; Stephen as CTO
57:33 - Matching xAI on high-velocity deployment
59:29 - "AI won't write software - it will become the software"
01:01:30 - Neural software vs. vibe coding
01:04:25 - Do agents change the compute layer?
01:06:14 - Self-assembling software inside Lambda
01:08:18 - Gigawatt-scale AI factories
01:08:57 - One person, one GPU
01:12:04 - Hot takes: overrated and underrated in AI
