@_arohan_ So the previous sus two were exploits right
2128 to 1711 us today.
The Cuda Colonel still in top place but only by a small margin to nikhilbarhate99
A public kernel optimization contest on NVIDIA B200 hardware has crowned competitor gum with the fastest time yet for batched QR decomposition at 1097 microseconds, part of a rapid series of improvements that have shaved hundreds of microseconds off earlier entries in just days.
Researchers following the GPU MODE challenge note that the steepest drops in reported times look inconsistent with surrounding submissions, echoing earlier reward-hacking cases on the same platform where entries later required corrections.
Batched QR appears in second-order optimization and sparse solvers, so any genuine speedups on Blackwell silicon could eventually feed into research tooling, though exact batch sizes, evaluation harness details, and final organizer review for the top entry remain unconfirmed with roughly two weeks left in the contest.
Users question Gum's top spot on the GPU MODE B200 leaderboard at 1097 µs as a reward hack that exploited the scoring system.
Good morning!
gum arrived at #1 overnight with a new record timing.
New king of the hill?
1711 to 1287us.
The competition is intense.

@giffmana previous 3 did some clever exploit like overwrite timing is my guess
Looks like it was a reward hack

@_arohan_ @giffmana Caching results then reusing them if fitting, early stopping qr for benchmark shapes and exploiting timing
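
For readers wondering how result caching alone could distort a leaderboard, here is a minimal pure-Python sketch. It is not the contest harness and not any competitor's code: `qr_mgs`, `qr_cached`, and `count_real_runs` are hypothetical names, and a tiny Gram-Schmidt routine stands in for the real GPU kernel. It only shows that a memoized "kernel" does real work once when a harness replays the same input, and every time when inputs are regenerated per trial.

```python
import random

calls = {"real": 0}  # counts how often the real factorization actually runs

def qr_mgs(a):
    """Modified Gram-Schmidt QR of a square matrix given as a list of rows.
    Returns (q, r) where q[k] is the k-th column of Q and r is upper triangular."""
    calls["real"] += 1
    n = len(a)
    v = [[a[i][j] for i in range(n)] for j in range(n)]  # v[j] is column j of A
    q = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):
            # project the (already updated) column onto q[k] and subtract it
            r[k][j] = sum(q[k][i] * v[j][i] for i in range(n))
            v[j] = [v[j][i] - r[k][j] * q[k][i] for i in range(n)]
        r[j][j] = sum(x * x for x in v[j]) ** 0.5
        q.append([x / r[j][j] for x in v[j]])
    return q, r

_cache = {}

def qr_cached(a):
    """Hypothetical shortcut: memoize on input contents, skip work on repeats."""
    key = tuple(map(tuple, a))
    if key not in _cache:
        _cache[key] = qr_mgs(a)
    return _cache[key]

def random_matrix(n, rng):
    return [[rng.random() for _ in range(n)] for _ in range(n)]

def count_real_runs(kernel, inputs):
    """Run the kernel over all inputs and report how many real factorizations ran."""
    calls["real"] = 0
    _cache.clear()
    for a in inputs:
        kernel(a)
    return calls["real"]

rng = random.Random(0)
fixed = random_matrix(4, rng)
replayed_inputs = [fixed] * 10                              # same input every trial
fresh_inputs = [random_matrix(4, rng) for _ in range(10)]   # new input every trial

naive_runs = count_real_runs(qr_cached, replayed_inputs)
fresh_runs = count_real_runs(qr_cached, fresh_inputs)
print(naive_runs, fresh_runs)  # 1 real run with replayed inputs, 10 with fresh ones
```

If a harness averaged wall-clock time over the replayed trials, nine of the ten would measure a dictionary lookup rather than a factorization, which is one way a reported time could look inconsistent with neighboring entries.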

@_arohan_ matrix exponential challenge when? @PrussianGlob has been cooking.

@_arohan_ Hey Rohan
Mind checking out my proposal

@_arohan_ dropping a whole 424us from the previous record is wild
what got optimized this time?