- Tsinghua University
- https://www.zhihu.com/people/mu-zi-zhi-6-28
- https://bruce-lee-ly.medium.com
Pinned
- decoding_attention (C++): Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference. (See the decode-step sketch after this list.)
- flash_attention_inference: Benchmarks the performance of the C++ interfaces of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
- cuda_hgemm: Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions. (See the WMMA sketch after this list.)
- cuda_hgemv: Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores. (See the HGEMV sketch after this list.)
- matrix_multiply: Several common matrix multiplication methods implemented on CPU and NVIDIA GPU using C++11 and CUDA. (See the naive-kernel sketch after this list.)
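
For context on decoding_attention: in the decode stage each new token attends with a single query vector against the cached keys and values, so the kernel is memory-bound and suits CUDA cores rather than Tensor Cores. Below is a minimal sketch, not the repository's actual kernel; the kernel name, the layouts (contiguous q[head_dim], row-major k/v of shape [seq_len, head_dim]), and the launch shape (one block per head, blockDim.x >= seq_len, seq_len * sizeof(float) dynamic shared memory, fp32 accumulation) are all my assumptions.

```cuda
#include <cuda_fp16.h>
#include <math.h>

// Hypothetical minimal decode-step attention for one head: one block per
// (batch, head), a single query vector, fp16 data, fp32 accumulation.
// Assumes seq_len <= blockDim.x <= 1024.
__global__ void decode_attention_kernel(const half* __restrict__ q,   // [head_dim]
                                        const half* __restrict__ k,   // [seq_len, head_dim]
                                        const half* __restrict__ v,   // [seq_len, head_dim]
                                        half* __restrict__ out,       // [head_dim]
                                        int seq_len, int head_dim) {
    extern __shared__ float score[];               // one logit per key position
    const int tid = threadIdx.x;
    const float scale = rsqrtf((float)head_dim);

    // Pass 1: each thread owns one key position and computes q . k[tid].
    if (tid < seq_len) {
        float dot = 0.f;
        for (int d = 0; d < head_dim; ++d)
            dot += __half2float(q[d]) * __half2float(k[tid * head_dim + d]);
        score[tid] = dot * scale;
    }
    __syncthreads();

    // Softmax over the logits; serial on thread 0 for clarity (a tuned
    // kernel would use warp/block reductions instead).
    __shared__ float inv_sum;
    if (tid == 0) {
        float m = -INFINITY;
        for (int i = 0; i < seq_len; ++i) m = fmaxf(m, score[i]);
        float s = 0.f;
        for (int i = 0; i < seq_len; ++i) {
            score[i] = __expf(score[i] - m);
            s += score[i];
        }
        inv_sum = 1.f / s;
    }
    __syncthreads();

    // Pass 2: each thread owns output dims d, d + blockDim.x, ... and
    // accumulates sum_i softmax_i * v[i][d].
    for (int d = tid; d < head_dim; d += blockDim.x) {
        float acc = 0.f;
        for (int i = 0; i < seq_len; ++i)
            acc += score[i] * __half2float(v[i * head_dim + d]);
        out[d] = __float2half(acc * inv_sum);
    }
}
```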
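
The WMMA API referenced in cuda_hgemm exposes Tensor Cores through fragment types and warp-wide load/mma/store intrinsics. A minimal sketch, not one of the repository's optimized kernels: it assumes row-major fp16 A (M x K) and B (K x N) with M, N, K multiples of 16, one warp per 16x16 output tile, and a grid sized to cover C exactly. Production variants add shared-memory staging, fp32 accumulation, and larger per-warp tiles.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes one 16x16 tile of C = A * B on Tensor Cores.
__global__ void wmma_hgemm_kernel(const half* A, const half* B, half* C,
                                  int M, int N, int K) {
    // Warp coordinates over the 16x16 tile grid of C.
    const int warp_n = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    const int warp_m = blockIdx.y * blockDim.y + threadIdx.y;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> acc_frag;
    wmma::fill_fragment(acc_frag, __float2half(0.f));

    // March along K in steps of 16; each mma_sync is one tensor-core op
    // issued cooperatively by the whole warp.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + warp_m * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + k * N + warp_n * 16, N);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
    }
    wmma::store_matrix_sync(C + warp_m * 16 * N + warp_n * 16, acc_frag, N,
                            wmma::mem_row_major);
}
```

For example, dim3 block(128, 4) yields a 4x4 grid of warps per block (a 64x64 patch of C), so dim3 grid(N / 64, M / 64) covers the output.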
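
GEMV has essentially no data reuse to feed Tensor Cores, which is why cuda_hgemv targets CUDA cores. A minimal warp-per-row sketch under an assumed row-major layout; the kernel name and fp32 accumulation are my choices, not necessarily the repo's baseline:

```cuda
#include <cuda_fp16.h>

// One warp computes one element of y = A * x, with A row-major M x N,
// fp16 inputs and fp32 accumulation.
__global__ void hgemv_kernel(const half* __restrict__ A,
                             const half* __restrict__ x,
                             half* __restrict__ y, int M, int N) {
    const int warps_per_block = blockDim.x / warpSize;
    const int row  = blockIdx.x * warps_per_block + threadIdx.x / warpSize;
    const int lane = threadIdx.x % warpSize;
    if (row >= M) return;

    // Each lane strides across the row, then the warp reduces the partials.
    float acc = 0.f;
    for (int col = lane; col < N; col += warpSize)
        acc += __half2float(A[row * N + col]) * __half2float(x[col]);
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffff, acc, offset);
    if (lane == 0) y[row] = __float2half(acc);
}
```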
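
Finally, the baseline that repositories like matrix_multiply usually measure against: the naive one-thread-per-element kernel. A sketch assuming row-major fp32 matrices; the optimized variants layer shared-memory tiling and register blocking on top of this.

```cuda
// Naive C = A * B: one thread per output element, row-major fp32.
// Launch with a 2D grid, e.g. dim3 block(16, 16) and
// dim3 grid((N + 15) / 16, (M + 15) / 16).
__global__ void naive_sgemm_kernel(const float* A, const float* B, float* C,
                                   int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M || col >= N) return;

    // Dot product of row `row` of A with column `col` of B.
    float acc = 0.f;
    for (int k = 0; k < K; ++k)
        acc += A[row * K + k] * B[k * N + col];
    C[row * N + col] = acc;
}
```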