Backend Optimization

This post collects backend optimization techniques for machine learning.

CPU

GEMM

CUDA

Elementwise operation

Reduction

Scan

GEMM/GEMV

Convolution

Layer

Miscellaneous

Framework

Profiling

Customized PyTorch kernel

Footnotes

  1. [BBuf's CUDA Notes] Part 9: Using New Bing (ChatGPT) to analyze OneFlow's softmax-related fusion optimizations

  2. The code is available at https://github.com/niuhope/cuda_sgemm.

  3. The code is now part of CUTLASS.

  4. Implementing block sparse matrix-vector multiplication (BSpMV) with CUDA

  5. The code is available at https://github.com/Oneflow-Inc/oneflow/blob/master/oneflow/core/ep/cuda/primitive/permute.cu.

  6. PyTorch nn.Unfold generalizes the $\verb|im2col|$ operation.
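
To illustrate the last footnote, here is a minimal pure-Python sketch of im2col for a single image with stride 1 and no padding. The helper names `im2col` and `conv2d_via_im2col` are hypothetical and not taken from any library. The row order (channel first, then kernel offset) and the column order (output positions in row-major order) are chosen to match the layout that `nn.Unfold` produces per sample; `nn.Unfold` additionally handles batches, padding, stride, and dilation.

```python
def im2col(image, kh, kw):
    """Unfold a C x H x W nested list into a (C*kh*kw) x L matrix.

    Assumes stride 1 and no padding; L is the number of output positions.
    """
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    out_h, out_w = height - kh + 1, width - kw + 1
    rows = []
    for c in range(channels):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel offset), one column per output position.
                rows.append([image[c][y + i][x + j]
                             for y in range(out_h) for x in range(out_w)])
    return rows


def conv2d_via_im2col(image, kernels):
    """Cross-correlate with kernels (O x C x kh x kw) as one matrix product."""
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    cols = im2col(image, kh, kw)
    # Flatten each kernel in the same (channel, i, j) order as the im2col rows.
    flat = [[w for ch in k for row in ch for w in row] for k in kernels]
    num_cols = len(cols[0])
    return [[sum(f[r] * cols[r][l] for r in range(len(f))) for l in range(num_cols)]
            for f in flat]


image = [[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]      # 1 channel, 3 x 3
kernels = [[[[1, 0],
             [0, 1]]]]    # 1 output channel, 2 x 2 kernel

cols = im2col(image, 2, 2)
result = conv2d_via_im2col(image, kernels)
print(cols)    # [[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]]
print(result)  # [[6, 8, 12, 14]]
```

Once the image is unfolded, the convolution reduces to a single matrix product, which is why im2col pairs naturally with a tuned GEMM routine.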