
Sebastian Raschka, creator of LLMs-from-scratch, adds a from-scratch PyTorch implementation of DeepSeek Sparse Attention (DSA) to his repository

Charts in the repository compare prefilling and decoding costs.

Original post

Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. With motivation, overview, and GPT-style model reference implementation as standalone example code: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch04/09_dsa

8:20 AM · May 23, 2026