A course taught at Cornell Tech where students build up the infrastructure behind vector automatic differentiation.
A visual description of the adaptive sparse attention technique.
A visual description of banded sparse matrices, a useful but underused form of sparsity.
In this post I present an "annotated" version of the paper in the form of a line-by-line implementation. I have reordered and deleted some sections from the original paper and added comments throughout.