MatanHamilis/one_stencil: Multiple 1-stencil implementations using NVIDIA CUDA.

This repo is intended to illustrate the power of the register cache mechanism for CUDA applications. Using the register cache, one can maintain an intra-warp cache as part of the cache hierarchy in CUDA GPUs. The register cache is implemented with the warp shuffle instruction (__shfl).
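To make the idea concrete, here is a minimal sketch of a register-cache 1-stencil. It is not the code in final_code.cu: the names one_stencil_rc and run_one_stencil, the sum-of-three-neighbours definition of the stencil, and the zero padding at the array edges are all assumptions made for illustration. It also uses the __shfl_up_sync and __shfl_down_sync intrinsics from CUDA 9 and later rather than the legacy __shfl family. Each warp loads 32 consecutive inputs into registers (30 outputs plus one halo element on each side), and every non-halo lane reads its neighbours from the adjacent lanes' registers.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kWarpSize = 32;
constexpr int kOutputsPerWarp = kWarpSize - 2;  // lanes 0 and 31 only hold halo values
constexpr unsigned kFullMask = 0xffffffffu;     // all 32 lanes take part in each shuffle

// Hypothetical kernel: out[i] = in[i-1] + in[i] + in[i+1],
// with inputs outside [0, n) treated as 0.
__global__ void one_stencil_rc(const int* in, int* out, int n) {
    int lane = threadIdx.x % kWarpSize;
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / kWarpSize;

    // The warp's register cache: 32 consecutive inputs, one per lane.
    int idx = warp * kOutputsPerWarp - 1 + lane;
    int center = (idx >= 0 && idx < n) ? in[idx] : 0;

    // Neighbours come from other lanes' registers, not from shared or global memory.
    // No lane returns early, so the full mask is valid for both shuffles.
    int left  = __shfl_up_sync(kFullMask, center, 1);
    int right = __shfl_down_sync(kFullMask, center, 1);

    bool is_halo = (lane == 0 || lane == kWarpSize - 1);
    if (!is_halo && idx < n) {
        out[idx] = left + center + right;
    }
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
std::vector<int> run_one_stencil(const std::vector<int>& h_in) {
    int n = static_cast<int>(h_in.size());
    std::vector<int> h_out(n, 0);
    if (n == 0) return h_out;

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads_per_block = 128;  // must be a multiple of the warp size
    int warps_per_block = threads_per_block / kWarpSize;
    int warps = (n + kOutputsPerWarp - 1) / kOutputsPerWarp;
    int blocks = (warps + warps_per_block - 1) / warps_per_block;
    one_stencil_rc<<<blocks, threads_per_block>>>(d_in, d_out, n);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

The cost of this layout is that two of every 32 lanes produce no output. Computing several outputs per thread, as the rc_times_X variants do, is one way to amortize that overhead.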

Prerequisites:

A CUDA-capable GPU.
A Linux system (ensure you have git, make and the NVIDIA drivers installed).

How to use this code:

Clone this repository.
Check your GPU's compute capability. If it differs from 3.5, modify the Makefile accordingly. You may refer to this link: https://developer.nvidia.com/cuda-gpus
Run the get_times script. You may need to run "chmod u+x ./get_times" first.
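If you would rather query the compute capability than look it up, the CUDA runtime can report it. The sketch below is not part of this repository: arch_flag and print_device_arch_flags are hypothetical helper names, and the -arch=sm_XY form is nvcc's usual architecture flag, which may or may not match how the Makefile specifies the target.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a compute capability as an nvcc
// architecture flag, e.g. (3, 5) becomes "-arch=sm_35".
std::string arch_flag(int major, int minor) {
    return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}

// Prints the compute capability of every visible device and
// returns the number of devices found (0 if the query fails).
int print_device_arch_flags() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;
        }
        std::printf("%s: compute capability %d.%d, nvcc flag %s\n",
                    prop.name, prop.major, prop.minor,
                    arch_flag(prop.major, prop.minor).c_str());
    }
    return count;
}
```

Calling print_device_arch_flags() from a small main and compiling with nvcc should be enough to see which value to use in place of 3.5.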

The output: A subdirectory named "results" will be created, containing several files. Each file, except for "full_times", contains the results of a single implementation. For each implementation we record the time of the k-stencil for k from 1 to 20. Each result is averaged over 5 executions, with the minimum and maximum left out.
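The averaging rule can be read as a trimmed mean. The sketch below shows one such reading: take the samples, drop a single minimum and a single maximum, and average the rest. The function name trimmed_mean is hypothetical, this is host-only code, and the get_times script may compute its averages differently.

```cuda
#include <algorithm>
#include <numeric>
#include <stdexcept>
#include <vector>

// Mean of the samples after discarding one minimum and one maximum.
// Takes the vector by value so the caller's data is left unsorted.
double trimmed_mean(std::vector<double> samples) {
    if (samples.size() < 3) {
        throw std::invalid_argument("need at least 3 samples");
    }
    std::sort(samples.begin(), samples.end());
    // Skip the first (smallest) and last (largest) element.
    double sum = std::accumulate(samples.begin() + 1, samples.end() - 1, 0.0);
    return sum / static_cast<double>(samples.size() - 2);
}
```

Dropping the extremes makes the reported time less sensitive to a single slow run, for example one that includes driver warm-up.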

The following implementations are considered:

shmem_times - Contains the results of the shared memory based implementation.
rc_times_X - Contains the results of the register cache based implementation with X outputs per thread (for 1<=X<=8).

For convenience, we also create a full_times CSV file, whose first column contains the shmem_times results, followed by rc_times_1 through rc_times_8.

