got my kernel in qr_v2 leaderboard @GPU_MODE, one thing I can say is you need to engineer every bench shape to squeeze most from it, I found cluster reduce helps in some while they could be worse in others and yeah like it was a nice try. It was fun!
Rohan Anil, CoreAutoAI co-founder, submits a custom QR decomposition kernel to the GPU_MODE leaderboard using shape-specific tuning.
Techniques like cluster reduce helped on some benchmark shapes but degraded performance on others.
Users are optimistic about pushing QR kernel performance further on the leaderboard, and joke that Modal is a linear algebra company wrapped in a container company wrapped in an AI company, all wrapped in a cool people company.
QR kernel competition on @GPU_MODE was well worth the funding
tpot learned/brushed up on QR, a precursor for better optimization
very neat and clever optimization on top of Nvidia libraries for different data distribution
Blackwells go brrr
modal is a linear algebra company wrapped in a container company wrapped in an ai company

@SubhoGhosh02 @GPU_MODE hand written or agentic?

@SubhoGhosh02 @GPU_MODE that's how it is in these competitions!

@SubhoGhosh02 @GPU_MODE i think i got a hint to push below 4.7 👀

@maharshii @GPU_MODE This was handwritten in cute, I could get only as far as 6ms with claude :)

@charles_irl Checks out from everything I’ve seen

@snowclipsed @GPU_MODE true

@datavorous_ @GPU_MODE thanks lets see how far can I push 😁

@charles_irl wrapped in a cool people company