# hgemm

Here are 4 public repositories matching this topic.
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
Updated Sep 8, 2024 · CUDA
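The WMMA API mentioned above is the portable C++ interface to Tensor Cores: each warp cooperatively loads 16×16 tiles into fragments, issues `mma_sync`, and stores the accumulator tile. A minimal sketch of such an HGEMM kernel is below; the kernel name and launch geometry are illustrative, and M, N, K are assumed to be multiples of 16 with all matrices row-major.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Illustrative WMMA HGEMM sketch: one warp computes one 16x16 tile of C.
// Assumes row-major A (MxK), B (KxN), C (MxN), with M, N, K multiples of 16.
__global__ void hgemm_wmma_sketch(const half *A, const half *B, half *C,
                                  int M, int N, int K) {
    // Map warps onto the grid of 16x16 output tiles.
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize; // tile row
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;              // tile col
    if (warpM * 16 >= M || warpN * 16 >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;
    wmma::fill_fragment(c_frag, __float2half(0.0f));

    // Walk the K dimension one 16-wide slab at a time.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + warpM * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + k * N + warpN * 16, N);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag); // Tensor Core MAC
    }
    wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, c_frag, N,
                            wmma::mem_row_major);
}
```

Real HGEMM kernels in these repos layer shared-memory staging, swizzling, and pipelining on top of this basic pattern; the sketch only shows the fragment/`mma_sync` mechanics.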
Uses Tensor Cores to compute back-to-back (fused) HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
Topics: `gpu` · `cuda` · `cublas` · `nvidia` · `gemm` · `matrix-multiply` · `tensor-core` · `hgemm` · `back2back-hgemm` · `fused-hgemm` · `back2back-gemm` · `fused-gemm`
Updated Nov 3, 2023 · CUDA
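Unlike the WMMA API, the MMA PTX path issues the Tensor Core instruction directly via inline assembly, giving explicit control over register-level fragment layout (which is what makes fusing two GEMMs without a round trip to global memory practical). A hedged sketch of one such instruction, the half-precision `m16n8k16` shape available on sm_80 and later, with operands already packed into 32-bit registers per the PTX-defined per-thread layout:

```cuda
#include <cstdint>

// Illustrative wrapper around the mma.sync.aligned.m16n8k16 PTX instruction
// with f16 inputs and f16 accumulation (sm_80+). Per the PTX ISA, each
// thread in the warp holds: A in 4 b32 registers, B in 2, C/D in 2.
// Packing the fragments into a/b/c correctly is the caller's job.
__device__ void mma_m16n8k16_f16(const uint32_t *a,  // 4 regs of A fragment
                                 const uint32_t *b,  // 2 regs of B fragment
                                 uint32_t *c) {      // 2 regs, accumulated in place
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
        "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%0,%1};\n"
        : "+r"(c[0]), "+r"(c[1])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]));
}
```

In a back-to-back HGEMM, the accumulator registers `c` from the first GEMM can be repacked and fed as the A fragment of the second, keeping the intermediate product entirely in registers and shared memory.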