Tingwei-Jen

Tingwei Jen

Software Engineer | Algorithm Engineer

Pinned

YOLOv8_TensorRT_CUDA_DeepSort (Public)

Object tracking implemented with YOLOv8, TensorRT, CUDA, DeepSort, and PyTorch.

C++, 2 stars, 1 fork
Reduction_Optimization (Public)

Cuda
SGEMM_Optimization (Public)

Optimized Single-Precision General Matrix Multiplication (SGEMM) using CUDA, achieving 89% of cuBLAS performance.

Cuda
Nsight_Compute_Tutorial (Public)
CUDA_Example (Public)

C++
FlashAttention (Public)

C++