This is a fantastic article, a must-read! Lemme also share a few thoughts and experiences regarding kernels and the current state of agents for automated kernel writing.
1. IMO writing kernels by hand is a regression from quality work, especially when you know the kernels are not portable across hardware (Hopper vs Blackwell, for example). I would not have used the word "quality" if the work were transferable. Compilers are improving, but not fast enough. I hope they catch up in the near future.
2. With capacity constraints, labs will try to procure any hardware available for serving. But it comes at a cost: you again need kernels written for that specific hardware to get optimal performance. No way people are going to redo the whole thing every time. So kernel automation is a necessity, and a bigger priority than people anticipate.
3. Even the best coding models are extremely poor at writing kernels unless you guide them at every step. Experience from the recent past: a few weeks ago I wanted to write some fused-op kernels for quant-dequant. First I asked 5.5 and 4.6 to use Pallas. I gave the agents every bit of detail they needed. They ended up writing slop. Then I provided the pieces from @PatrickToulme's pyptx library and asked them to write the kernels in it. They again wrote slop. Funny part? Both agents checked the kernel code they had written and were confident that it was correct. I even allowed them to cross-check the code outputs, and to them nothing seemed wrong.
4. Kernel automation will happen, but it won't happen with a generalist model. People will have to fine-tune a coding model to teach it how to write proper kernels without generating slop. But that again comes with a not-so-easy problem: labelled data!