166/365 of GPU Programming
@_arohan_ was kind enough to write up a beginner tutorial for the QR factorization challenge, so naturally I will spend some time today brushing up on my rusty linear algebra and studying the problem a bit more closely.
For context, the folks at GPU Mode/Core Auto are running a new series of kernel competitions focused on classical linear algebra problems, starting with batched square compact Householder QR factorizations on B200s.
Will be a nice way to review more linear algebra and also get more experience with Blackwell-specific code/instructions.
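
For anyone else brushing up, here is a minimal pure-Python sketch of what a "compact" Householder QR of one square matrix can look like. It is not the competition's reference code. It assumes a LAPACK-geqrf-style layout: R is stored in the upper triangle, each reflector v_k (with an implicit leading 1) is stored below the diagonal, and H_k = I - tau_k * v_k * v_k^T. The function names `householder_qr_compact` and `reconstruct` are made up for illustration, and a batched version would simply repeat this per matrix.

```python
import math

def householder_qr_compact(a):
    """Compact Householder QR of a square matrix given as a list of rows.

    Returns (packed, taus). Assumed layout: R in the upper triangle of
    packed, reflector v_k (implicit v_k[k] = 1) below the diagonal of
    column k, and H_k = I - taus[k] * v_k * v_k^T.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    taus = [0.0] * n
    for k in range(n):
        # Norm of the column segment a[k:, k].
        norm = math.sqrt(sum(a[i][k] ** 2 for i in range(k, n)))
        if norm == 0.0:
            continue  # nothing to eliminate: tau = 0 means H_k = I
        alpha = a[k][k]
        beta = -math.copysign(norm, alpha)  # opposite sign avoids cancellation
        v0 = alpha - beta
        taus[k] = (beta - alpha) / beta
        for i in range(k + 1, n):
            a[i][k] /= v0  # scale so the reflector's leading entry is 1
        a[k][k] = beta  # diagonal entry of R
        # Apply H_k to the trailing columns.
        for j in range(k + 1, n):
            dot = a[k][j] + sum(a[i][k] * a[i][j] for i in range(k + 1, n))
            dot *= taus[k]
            a[k][j] -= dot
            for i in range(k + 1, n):
                a[i][j] -= dot * a[i][k]
    return a, taus

def reconstruct(packed, taus):
    """Rebuild A = H_0 H_1 ... H_{n-1} R from the compact form."""
    n = len(packed)
    m = [[packed[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    for k in reversed(range(n)):
        v = [0.0] * k + [1.0] + [packed[i][k] for i in range(k + 1, n)]
        for j in range(n):
            dot = taus[k] * sum(v[i] * m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] -= dot * v[i]
    return m
```

Calling `reconstruct(*householder_qr_compact(A))` should give back `A` up to floating-point error, which is a handy sanity check before moving anything to the GPU.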
165/365 of GPU Programming
Continuing my CS336 studies today and starting with assignment 1, which covers building a byte pair encoding (BPE) tokenizer, a transformer, a cross-entropy loss function, an AdamW optimizer and a training loop (including serializing/loading model + optimizer state) from scratch, without making use of torch.nn, torch.nn.functional or torch.optim.
Should be fun!
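
As a warm-up for the tokenizer part, here is a toy sketch of the core BPE training loop: count adjacent pairs, merge the most frequent one, repeat. This is not the assignment's spec. It assumes no pre-tokenization and no special tokens, and the tie-breaking rule and the name `train_bpe` are my own choices for illustration.

```python
from collections import Counter

def train_bpe(text, num_merges):
    """Learn up to num_merges byte-pair merges from text (toy version)."""
    ids = list(text.encode("utf-8"))  # start from raw bytes, ids 0..255
    vocab = {i: bytes([i]) for i in range(256)}
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        # Most frequent pair; ties go to the larger pair (arbitrary choice).
        pair = max(pairs, key=lambda p: (pairs[p], p))
        new_id = 256 + len(merges)
        vocab[new_id] = vocab[pair[0]] + vocab[pair[1]]
        merges.append(pair)
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(new_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
    return ids, vocab, merges
```

Joining `vocab[i]` over the returned `ids` should reproduce the original bytes, which makes for an easy round-trip check.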

