/AI1d ago

Zaid Khan releases GPU Forecasters to predict CUDA and Triton kernel runtimes using LLMs as hardware surrogates

The framework utilizes a 20-billion parameter open-weights model.

--0--
Original post
AK@_akhaliq#28inAI

GPU Forecasters

Language Models as Selective Surrogates for Kernel Runtime Optimization

8:53 AM Β· Jun 2, 2026 Β· 32.2K Views
Sentiment
Sentiment building, check back later.
Cluster Engagement
-
Views
-
Comments
-
Reposts
-
Bookmarks
Expand data
Posts from X
Most Activity
Most ActivityTimeline
VIEWS11.7KBOOKMARKS32LIKES79RETWEETS32REPLIES5
Zaid Khan@codezakh

Can an LLM act as a selective model of a GPU during evolutionary search, by reasoning + forecasting a kernel’s runtime but deferring to a GPU when unsure? We produced 12k kernels + runtimes from evolutionary search, costing 400M reasoning tokens + 600 GPU-hours to answer this.

In our work GPU Forecasters, we study language models as selective surrogates for GPU kernel optimization.

1️⃣ Off-the-shelf LLMs can forecast how a GPU responds to a candidate kernel with non-trivial accuracy. If we rank candidates by these predictions and measure only the top 10% on a GPU, the fastest kernel we find is within 20% of the best in the pool.

2️⃣ We want LLMs to not just be accurate but also calibrated, so that we can use their uncertainty for selective prediction: during search, we should trust only confident forecasts and verify less confident forecasts by sending them to the GPU.

3️⃣ We train an open-weights surrogate (GPT-OSS-20B) with RL to improve both accuracy and calibration. Calibration-shaped rewards improve both confidence reliability and ranking ability, while correctness rewards alone do not.

4️⃣ Inside a real kernel search, the surrogate finds faster kernels than an equal-GPU-budget baseline by considering more candidates per measurement.

5️⃣ We release 12,388 LLM-generated GPU kernels with measured runtimes spanning 118 operations, CUDA and Triton backends, 3 GPU types, taking 400M tokens + 600 GPU-hours to produce. This dataset can be used for analyzing LLM-driven evolutionary program search dynamics, post-training LLMs for kernel code generation, and things we didn’t get a chance to explore, like training reward models!

Thread πŸ§΅πŸ‘‡

1dViews 11.7KLikes 79Bookmarks 32