OpenAI just unveiled Jalapeño, its first custom AI chip designed from scratch for LLM inference-
It is OpenAI moving deeper into the full stack: chips, kernels, memory, networking, racks, scheduling, deployment and product experience.
OpenAI has learned from Cerebras-deal what is valuable in specialized inference hardware and is now attempting to translate that lesson into its own controllable platform.
Built with Broadcom and Celestica, Jalapeño is optimized around the workloads OpenAI actually runs across ChatGPT, Codex, the API and future agentic products.
Early samples are already running ML workloads in the lab at target frequency and power, including GPT-5.3-Codex-Spark. OpenAI says performance per watt should be substantially better than current state of the art, with detailed benchmarks coming later!
The strategic angle is obvious: less dependence on external GPUs, more control over compute economics, and a stronger flywheel between models, products, revenue and infrastructure.
Deployment is planned to start by the end of 2026.