Grok Build is pretty good at optimizing my code in one shot.
Prompt:
I want you to optimize it entirely on GPU to speed it up. Measure two metrics: the result must compare with the golden image (CPU) and be nearly identical (PSNR > 40dB), with fast pixels per second.
Make a plan to a) write GPU equivalent code, b) write a benchmark suite to measure PSNR and pixels per second, c) execute various optimization strategies.
Go!
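
For anyone wanting to check the two metrics from the prompt themselves, here is a minimal sketch in plain Python. It is not the benchmark Grok Build produced. The function names `psnr` and `pixels_per_second`, the flat-list image representation, and the 8-bit peak value of 255 are all assumptions made for illustration.

```python
import math
import time


def psnr(golden, candidate, max_value=255.0):
    """PSNR in dB between two equal-length flat sequences of pixel samples.

    Assumes 8-bit samples (peak 255) unless max_value says otherwise.
    """
    if len(golden) != len(candidate) or len(golden) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all samples.
    mse = sum((g - c) ** 2 for g, c in zip(golden, candidate)) / len(golden)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def pixels_per_second(render, pixel_count, runs=5):
    """Best-of-N throughput for a hypothetical render() callable."""
    best = math.inf
    for _ in range(runs):
        start = time.perf_counter()
        render()  # for GPU code, render() must block until the work is done
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a very fast call.
    return pixel_count / max(best, 1e-12)


if __name__ == "__main__":
    golden = [10, 20, 30, 40]
    gpu_out = [10, 21, 30, 39]  # made-up values, off by one in two samples
    score = psnr(golden, gpu_out)
    print(f"PSNR: {score:.2f} dB, passes 40 dB gate: {score > 40}")
    rate = pixels_per_second(lambda: sum(golden), len(golden))
    print(f"throughput: {rate:.0f} pixels/s")
```

One caveat when timing the GPU path: kernel launches are typically asynchronous, so the timed call has to wait for the device to finish (or copy the result back) before the clock stops, otherwise the pixels-per-second figure will look better than it really is.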