CudaComputeDemo

A small demo of how to perform CUDA compute and why this is not always the optimal choice on small amounts of data.

TL, DR: If you can split your work in less than number of cores (threads) on your CPU - do it on your CPU. If your work can be split in more than 1.25*cores(threads) (it can be highly parallelized) on our CPU - it’s worth it do on CUDA.

Feel free to use this code as an example or a starting point in CUDA compute.

WARNING! This code is not optimized. This code only meant to show BASIC acceleration using CUDA compared to CPU. By doing further optimizations, both CUDA and CPU can be optimized. CPU code is purposefully NOT multithreaded. By implementing multithreading its compute time can be decreased by the factor of time/cores * 90% or if you CPU supports multithreading (most of them do) by about time/threads * 70% depending on code implementation.

Sample Data: calculation of 16k and 64k samples of Y[i] = A[i]B[i] + A[i] A[i]* A[i]*C[i]/B[i] on CPU and on CUDA.

CPU: Intel Core i5-8500, 3.90 GHz (6 cores, 6 threads) RAM: 16 Gb DDR4 2666 MHz GPU: nVidia GeForce GTX 970 • CUDA version: 5.2 • CUDA cores: 1664 • VRAM: 4095 MB • Max Block Size: 1024102464 • Max Warp Size: 32 • Max Grid Size: 21474836476553565535 OS Win10 Pro 1903

Vertical axis - time to compute in ms. Horizontal axis - block size on CUDA.

Acceleration on 16k samples: Acceleration on 64k samples:

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
kernel.cu		kernel.cu
rawdataGTX970.txt		rawdataGTX970.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CudaComputeDemo

About

Releases

Packages

Languages

anissimov58/CudaComputeDemo

Folders and files

Latest commit

History

Repository files navigation

CudaComputeDemo

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages