I work on the Triton compiler at Meta, with a focus on kernel performance on NVIDIA GPUs.
Posts
- The Shared Memory layout of Blackwell MMAv5 operands
- Deriving formula for Backward Gradient of Softmax
- Deriving formula for Backward Gradient of Matrix Multiplication