# Tiny FlashAttention

WIP

A tiny FlashAttention implementation in Python, Rust, CUDA, and C, for learning purposes.

- Python version
  - naive pure Python code
- Triton version
  - Triton code
- C version
  - TODO: naive pure C code
  - naive CUDA code, standalone
  - naive CUDA code, Python binding
  - CUTLASS CUDA code
- Rust version

## CUTLASS CuTe flash attention in action

My env: cutlass v3.4, torch 1.14, cuda 12.4

- en tutorial
- zh tutorial
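
For orientation, here is a minimal sketch of the idea the naive pure Python version is meant to teach. It is not the code from this repository. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are made up for illustration. The sketch assumes single-head attention on plain lists of floats, with no masking and no dropout. The tiled variant walks over keys and values block by block and keeps a running max and running sum (online softmax), so it never holds a full row of scores at once.

```python
import math


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, materializing each full score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(dv)]
        )
    return out


def tiled_attention(q, k, v, block_size=2):
    """Same result, but keys/values are consumed in blocks with online softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = float("-inf")  # running max of the scores seen so far
        l = 0.0            # running sum of exp(score - m)
        acc = [0.0] * dv   # running unnormalized output row
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            m_new = max(m, max(scores))
            # Rescale the old statistics to the new max; this is 0.0 on the first block.
            correction = math.exp(m - m_new)
            p = [math.exp(s - m_new) for s in scores]
            l = l * correction + sum(p)
            acc = [
                a * correction + sum(pj * vj[c] for pj, vj in zip(p, v_blk))
                for c, a in enumerate(acc)
            ]
            m = m_new
        out.append([a / l for a in acc])  # normalize once at the end
    return out
```

Both functions should agree up to floating-point rounding for any `block_size >= 1`. The CUDA and Triton versions apply the same rescaling trick, but on tiles held in on-chip memory rather than Python lists.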