TVycas/CUDA-Parallel-Prefix-Sum

Parallel Prefix Sum (Scan) with CUDA

This was one of the assignments for my Distributed & Parallel Computing module at the University of Birmingham.

For this assignment, we wrote a CUDA program that implements a work-efficient exclusive scan, as described in GPU Gems 3, Chapter 39, and demonstrated it on a large vector of integers.
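
To make the algorithm concrete, here is a minimal host-side sketch of the work-efficient (Blelloch) exclusive scan in plain C++. It is not the kernel from this repository: the function name `blelloch_exclusive_scan` is hypothetical, each inner-loop iteration stands in for one GPU thread, and the input length is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of the work-efficient (Blelloch) exclusive scan.
// Each inner-loop iteration plays the role of one GPU thread; each outer-loop
// iteration corresponds to one synchronised step of the up-sweep or down-sweep.
// Assumption: data.size() is a power of two (a kernel would pad the block).
std::vector<int> blelloch_exclusive_scan(std::vector<int> data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0 && "size must be a power of two");

    // Up-sweep (reduce): build partial sums in place, walking up a binary tree.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            data[i] += data[i - stride];
        }
    }

    // Clear the root, then down-sweep: push prefix sums back down the tree.
    data[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            const int left = data[i - stride];
            data[i - stride] = data[i];  // left child takes the parent's value
            data[i] += left;             // right child adds the old left value
        }
    }
    return data;
}
```

For the input `{3, 1, 7, 0, 4, 1, 6, 3}` this returns `{0, 3, 4, 11, 11, 15, 16, 22}`: each output element is the sum of all inputs before it.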


Achievements

  • Block scanning
  • Full scan for large vectors (support for second and third level scans)
  • Bank conflict avoidance optimization (BCAO)
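
The full scan for large vectors can be sketched on the host as follows. This is an illustration under stated assumptions, not the repository's code: `full_exclusive_scan` is a hypothetical name, the per-block scan is written as a simple sequential loop, and each of the three phases would be a separate kernel launch on the GPU. The recursion on the block totals is what produces the second- and third-level scans.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side sketch of a multi-level exclusive scan.
// block_size is the number of elements handled by one block (assumed >= 2).
std::vector<int> full_exclusive_scan(const std::vector<int>& in,
                                     std::size_t block_size) {
    assert(block_size >= 2 && "block_size must be at least 2");
    const std::size_t n = in.size();
    std::vector<int> out(n);
    if (n == 0) return out;

    // Phase 1: scan every block independently and record each block's total.
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::vector<int> block_sums(num_blocks);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        int running = 0;
        for (std::size_t i = begin; i < end; ++i) {
            out[i] = running;
            running += in[i];
        }
        block_sums[b] = running;
    }
    if (num_blocks == 1) return out;  // a single block needs no fix-up

    // Phase 2: scan the block totals; this recursion is the next scan level.
    const std::vector<int> offsets = full_exclusive_scan(block_sums, block_size);

    // Phase 3: add each block's offset to every element of that block.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        for (std::size_t i = begin; i < end; ++i) {
            out[i] += offsets[b];
        }
    }
    return out;
}
```

The number of levels grows with the logarithm of the vector length in base `block_size`, so even very large inputs need only a few passes.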

Tests

The tests used a block size of 128 and a vector of 10,000,000 elements.

  • Block scan without BCAO = 1.10294 ms
  • Block scan with BCAO = 0.47206 ms
  • Full scan without BCAO = 1.39594 ms
  • Full scan with BCAO = 0.76058 ms

Machine:

  • CPU - Intel® Core™ i7-8700 CPU @ 3.20GHz × 12
  • GPU - GeForce RTX 2060
