I let Fable optimize GPU kernels autonomously using the "auto-gpu-kernel" harness. If it entered NVIDIA's competition today, it would win 🥇 on 4 of 5 kernels against human competitors.
Fable can write Gluon kernels, apply warp specialization, and use TMA and tcgen05, among other techniques.
(Speedup vs Opus 4.8)

