# hgemm

Here are 4 public repositories matching this topic.
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
Updated Sep 8, 2024 · CUDA
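The WMMA API mentioned above is the portable C++ interface to Tensor Cores: each warp cooperatively loads 16×16 tiles into fragments, issues `mma_sync`, and stores the accumulator tile. A minimal sketch of such an HGEMM kernel is below; the kernel name and launch geometry are illustrative, and M, N, K are assumed to be multiples of 16 with all matrices row-major.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Illustrative WMMA HGEMM sketch: one warp computes one 16x16 tile of C.
// Assumes row-major A (MxK), B (KxN), C (MxN), with M, N, K multiples of 16.
__global__ void hgemm_wmma_sketch(const half *A, const half *B, half *C,
                                  int M, int N, int K) {
    // Map warps onto the grid of 16x16 output tiles.
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize; // tile row
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;              // tile col
    if (warpM * 16 >= M || warpN * 16 >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;
    wmma::fill_fragment(c_frag, __float2half(0.0f));

    // Walk the K dimension one 16-wide slab at a time.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + warpM * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + k * N + warpN * 16, N);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag); // Tensor Core MAC
    }
    wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, c_frag, N,
                            wmma::mem_row_major);
}
```

Real HGEMM kernels in these repos layer shared-memory staging, swizzling, and pipelining on top of this basic pattern; the sketch only shows the fragment/`mma_sync` mechanics.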
Uses Tensor Cores to compute back-to-back (fused) HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
Topics: `gpu` · `cuda` · `cublas` · `nvidia` · `gemm` · `matrix-multiply` · `tensor-core` · `hgemm` · `back2back-hgemm` · `fused-hgemm` · `back2back-gemm` · `fused-gemm`
Updated Nov 3, 2023 · CUDA
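Unlike the WMMA API, the MMA PTX path issues the Tensor Core instruction directly via inline assembly, giving explicit control over register-level fragment layout (which is what makes fusing two GEMMs without a round trip to global memory practical). A hedged sketch of one such instruction, the half-precision `m16n8k16` shape available on sm_80 and later, with operands already packed into 32-bit registers per the PTX-defined per-thread layout:

```cuda
#include <cstdint>

// Illustrative wrapper around the mma.sync.aligned.m16n8k16 PTX instruction
// with f16 inputs and f16 accumulation (sm_80+). Per the PTX ISA, each
// thread in the warp holds: A in 4 b32 registers, B in 2, C/D in 2.
// Packing the fragments into a/b/c correctly is the caller's job.
__device__ void mma_m16n8k16_f16(const uint32_t *a,  // 4 regs of A fragment
                                 const uint32_t *b,  // 2 regs of B fragment
                                 uint32_t *c) {      // 2 regs, accumulated in place
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
        "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%0,%1};\n"
        : "+r"(c[0]), "+r"(c[1])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]));
}
```

In a back-to-back HGEMM, the accumulator registers `c` from the first GEMM can be repacked and fed as the A fragment of the second, keeping the intermediate product entirely in registers and shared memory.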