Yujia Zhai yzhaiustc

Optimizing-SGEMM-on-NVIDIA-Turing-GPUs (Public)

Optimizing SGEMM kernels on NVIDIA GPUs to reach close-to-cuBLAS performance.

CUDA, 282 stars, 45 forks

Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F (Public)

Stepwise optimization of DGEMM on CPU, eventually outperforming Intel MKL, even with multithreading.

C, 116 stars, 22 forks

Optimizing-SGEMV-on-NVIDIA-GPUs (Public)

An implementation of SGEMV with performance comparable to cuBLAS.

CUDA, 7 stars, 6 forks

Optimizing-DGEMV-on-Intel-CPUs (Public)

Highly optimized DGEMV on CPU with both serial and parallel performance better than MKL and OpenBLAS.

C, 3 stars, 1 fork

NVIDIA/cutlass (Public)

CUDA Templates for Linear Algebra Subroutines

C++, 5.7k stars, 978 forks