OpenACC Merge #4

rsearles35 · 2018-07-16T13:57:50Z

Merging the OpenACC version of Minisweep.

* Could not parallelize spatial dimensions due to the unpredictable direction of the sweep.
This change addresses the need by:
* Rewrote the sweep to only sweep in one direction. This allowed me to parallelize all 3 loops.
Related/future task(s):
* Potentially tweaking which loops are collapsed in the gang layer and which ones are collapsed in the vector layer. We are at the tuning/optimization stage now.
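One way to picture the single-direction rewrite is sketched below. It is not taken from the Minisweep source: the function name, the `dir_*` parameters, and the `order` array are hypothetical. The loops always count upward in canonical form, which is what the OpenACC `collapse` clause expects, and the sweep direction is applied by remapping the index inside the body instead of using a `!=` termination test. The body here only writes independent elements; the real sweep also has upstream dependencies that this sketch ignores.

```c
/*
 * Hypothetical illustration: record the position at which a sweep heading
 * in direction (dir_x, dir_y, dir_z) visits each cell. Each dir_* is +1 or
 * -1. The loop counters always ascend, so the nest is collapsible; the
 * direction only affects the index mapping inside the body.
 */
void fill_visit_order(int nx, int ny, int nz,
                      int dir_x, int dir_y, int dir_z, int *order)
{
    #pragma acc parallel loop collapse(3) copyout(order[0:nx*ny*nz])
    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                /* Map the ascending counter onto the swept cell. */
                const int ix = dir_x > 0 ? i : nx - 1 - i;
                const int iy = dir_y > 0 ? j : ny - 1 - j;
                const int iz = dir_z > 0 ? k : nz - 1 - k;
                /* Each iteration writes a distinct element. */
                order[ix + nx * (iy + ny * iz)] = i + nx * (j + ny * k);
            }
        }
    }
}
```

Without an OpenACC compiler the pragma is ignored and the function runs serially with the same result.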

rsearles35 and others added 30 commits June 7, 2017 09:22

openacc cmake script. Started an OpenACC version of sweeper. Need to inline a bunch of the function calls in order to begin OpenACC implementation

765d088

inlined everything except the main solver function

fb08472

Why:

ed08a7e

* creating a new version of minisweep that uses OpenACC
This change addresses the need by:
* inlined function calls that could not be turned into OpenACC routines
Related/future task(s):
* Parallelize with OpenACC!

started parallelizing the computations within the sweep (not just the face-initializations). It is slow right now due to too much data transfer. Will optimize this once all portions of the compute are running on the GPU and producing the correct results.

c3810dd

parallelized and scaling with respect to the number of angles being calculated. Next step is to figure out how to parallelize over energy groups.

17af9ef

permuted the energy loop and parallelized. energy and angles are now parallelized

b0daa50

permuted the energy loop back to the outside of the loop nest so that we only launch one kernel per octant iteration instead of 3

a6c2418

collapsed face initialization loops, yielding full parallelism across all 5 loops. Also collapsed some of the inner-computational loops. Still need to resolve the issue of the spatial loops. Can't collapse when using not-equals...

b1c61e1

some of my changes were undone, this is the most updated version including the spatial parallelization

52d498e

fixed bug that was preventing parallelization across x/y/z dimensions

00939cf

got rid of some erroneous data transfer

6e43a18

reverted to previous version. Collapsing spatial dimensions yields a massive data overhead because the local array must increase in size dramatically

0579ea4

used create clause to avoid some erroneous data transfer to the GPU

3c89010
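As a rough illustration of the `create` clause mentioned in this commit (not the actual Minisweep code; the function and array names are hypothetical), the sketch below keeps a scratch buffer on the device for the lifetime of a data region. `create` allocates `tmp` on the accelerator without copying it in or out, so only the real input and output cross the bus.

```c
/*
 * Hypothetical illustration: out[i] = in[i]^2 + 1, computed in two kernels
 * that share a scratch buffer. tmp is only ever read and written on the
 * device, so it is listed under create instead of copy.
 */
void square_plus_one(const double *in, double *tmp, double *out, int n)
{
    #pragma acc data copyin(in[0:n]) create(tmp[0:n]) copyout(out[0:n])
    {
        /* First kernel fills the device-resident scratch buffer. */
        #pragma acc parallel loop present(in[0:n], tmp[0:n])
        for (int i = 0; i < n; ++i) {
            tmp[i] = in[i] * in[i];
        }

        /* Second kernel consumes it; nothing is transferred in between. */
        #pragma acc parallel loop present(tmp[0:n], out[0:n])
        for (int i = 0; i < n; ++i) {
            out[i] = tmp[i] + 1.0;
        }
    }
}
```

When the code really runs on an accelerator, the host copy of `tmp` is not updated, so the caller should treat it as scratch space only.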

updated openacc cmake script

1cb4fab

added OpenACC implementation of Gauss Seidel

920587a

added outline for PPoPP paper

f37a940

Merge branch 'master' of github.com:rsearles35/minisweep

775d32a

migrated paper to another repo

5d3c87c

fixed small create clause bug.

3f3454d

added octant-level parallelism to the OpenACC implementation

c11a937

First KBA implementation of Minisweep-OpenACC

6dc1b06

Parallelizing over X and Y instead of Y and Z since Z is the outermost array access. This should give us better memory coalescing.

0d12492

BUG FIX: changed local array to use X/Y instead of Y/Z to follow the KBA threading pattern

7333be5

Updated OpenACC implementation of Minisweep. We are now sweeping in all 8 directions asynchronously. Each octant runs a gang-parallel KBA wavefront iteration with vector-parallel in-gridcell computations

2f997af
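The wavefront pattern described in this commit can be sketched in a much simplified form. The code below is an assumption-laden illustration, not the Minisweep sweeper: it uses a 2D grid instead of 3D, a single sweep direction, a made-up recurrence, and hypothetical names (`wavefront_sweep`, `na`, `inflow`). Cells with the same `ix + iy` have no dependencies on each other, so they are spread across gangs, while the per-cell loop over `na` independent slots (standing in for angles) is spread across vector lanes. Running several octants concurrently would presumably add an `async` clause with a distinct queue per octant, which is left out here.

```c
/*
 * Hypothetical 2D wavefront sketch. Cell (ix, iy) depends on its upstream
 * neighbours (ix-1, iy) and (iy-1, ix); boundary cells read `inflow`.
 * v holds na values per cell, laid out as v[a + na * (ix + nx * iy)].
 */
void wavefront_sweep(int nx, int ny, int na, double inflow, double *v)
{
    #pragma acc data copyout(v[0:na*nx*ny])
    {
        /* Wavefronts must run in order; cells within one are independent. */
        for (int w = 0; w <= nx + ny - 2; ++w) {
            const int ix_lo = w < ny ? 0 : w - (ny - 1);
            const int ix_hi = w < nx ? w : nx - 1;

            #pragma acc parallel loop gang present(v[0:na*nx*ny])
            for (int ix = ix_lo; ix <= ix_hi; ++ix) {
                const int iy = w - ix;

                #pragma acc loop vector
                for (int a = 0; a < na; ++a) {
                    const double from_x =
                        ix > 0 ? v[a + na * ((ix - 1) + nx * iy)] : inflow;
                    const double from_y =
                        iy > 0 ? v[a + na * (ix + nx * (iy - 1))] : inflow;
                    v[a + na * (ix + nx * iy)] = from_x + from_y;
                }
            }
        }
    }
}
```

On a 2 by 2 grid with `inflow` set to 1, the corner cell (0, 0) receives 2 and the far corner (1, 1) receives 6, since it sums its two upstream neighbours, each holding 3.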

some code cleanup and file renaming

2af0083

added sample openacc cmake file

1aac412

conditional compilation for OpenACC version. Using the -DUSE_ACC flag in your cmake file will enable the OpenACC version of the code

08a3405
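A minimal sketch of what guarding OpenACC code behind `USE_ACC` might look like follows. Only the macro name comes from the commit message; the `scale` function is hypothetical and the real sources may organize the guard differently (for example by selecting whole files).

```c
/*
 * Hypothetical illustration: the same loop serves both builds. The pragma
 * is only seen by the compiler when the build defines USE_ACC.
 */
void scale(double *x, int n, double factor)
{
#ifdef USE_ACC
    #pragma acc parallel loop copy(x[0:n])
#endif
    for (int i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

Built without `-DUSE_ACC`, this is a plain serial loop, so the non-OpenACC version of the code is unaffected by the directive.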

Added MPI support to OpenACC version. Ranks will split across nodes, as well as devices within nodes if enough ranks are used.

24a10cc

changed to using acc_device_default for MPI building. This was causing an issue when building for multicore CPU

f6edc33

rsearles35 and others added 10 commits May 10, 2018 16:39

removed erroneous atomics in face update operations. These will not collide. This is not good for performance

b43ef14

added OpenACC + MPI example build script

8de87bf

updated README

7dfd5b6

revised OpenACC implementation to include KBA sweep across gridcells.

d78161f

nblock_z wasn't being consumed by OpenACC version

7a77f8d

consuming nblock_z

bc30f76

changed rank assignment to round robin

f5e19e9
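The commit does not say what the round-robin assignment looks like, so the following is only a guess at the general idea: each rank picks a device by taking its rank modulo the number of devices visible on its node. The function name and the choice to return -1 when no device exists are assumptions. In a real MPI + OpenACC build the device count would presumably come from `acc_get_num_devices` and the result would be passed to `acc_set_device_num`; those calls are left out so the sketch compiles without MPI or an OpenACC runtime.

```c
/*
 * Hypothetical illustration of round-robin device selection.
 * rank:        this process's rank (ideally its node-local rank)
 * num_devices: number of accelerators visible to the process
 * Returns the device index to use, or -1 if there are no devices.
 */
int device_for_rank(int rank, int num_devices)
{
    if (num_devices <= 0) {
        return -1;
    }
    /* Consecutive ranks land on consecutive devices, wrapping around. */
    return rank % num_devices;
}
```

With 4 devices, ranks 0 through 3 map to devices 0 through 3 and rank 5 wraps around to device 1.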

Merge branch 'master' of github.com:rsearles35/minisweep

2b38901

include syntax

29c2e22

Merge branch 'master' of github.com:rsearles35/minisweep

b82d8ee
