OpenACC Merge #4

Open: wants to merge 40 commits into base: master

rsearles35

Merging the OpenACC version of Minisweep.

rsearles35 and others added 30 commits June 7, 2017 09:22
…inline a bunch of the function calls in order to begin OpenACC implementation
* creating a new version of minisweep that uses OpenACC

This change addresses the need by:

* inlined function calls that could not be turned into OpenACC routines

Related/future task(s):

* Parallelize with OpenACC!
… face-initializations). It is slow right now due to too much data transfer. Will optimize this once all portions of the compute are running on the GPU and producing the correct results.
…alculated. Next step is to figure out how to parallelize over energy groups.
… we only launch one kernel per octant iteration instead of 3
… all 5 loops. Also collapsed some of the inner computational loops. Still need to resolve the issue of the spatial loops. Can't collapse when the loop condition uses not-equals (`!=`)...
* Could not parallelize the spatial dimensions due to the unpredictable direction of the sweep.

This change addresses the need by:

* Rewrote the sweep to only sweep in one direction. This allowed me to parallelize all 3 loops.

Related/future task(s):

* Potentially tweaking which loops are collapsed in the gang layer and which ones are collapsed in the vector layer. We are at the tuning/optimization stage now.
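
The loop rewrite described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: the function `label_wavefronts`, its parameters, and the array layout are hypothetical. The idea shown is that every loop counts upward from zero with a fixed `<` bound, and the sweep direction is applied only when computing the cell index, so the three spatial loops have a form that `collapse(3)` can accept. Without an OpenACC compiler the pragma is ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: label each cell with its position in sweep order
 * (distance from the corner where the sweep starts) for one octant.
 * dir_x, dir_y, dir_z are +1 or -1 and are assumptions for illustration. */
void label_wavefronts(int *wave, int nx, int ny, int nz,
                      int dir_x, int dir_y, int dir_z)
{
    assert(wave != NULL && nx > 0 && ny > 0 && nz > 0);

    /* Direction-dependent form, shown for comparison only:
     *   for (ix = ix_start; ix != ix_end; ix += dir_x) ...
     * Here the loops always run upward and the direction is folded
     * into the index computation instead. */
    #pragma acc parallel loop collapse(3) copyout(wave[0:nx*ny*nz])
    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                /* Map the upward-counting loop variables to cell indices. */
                int ix = dir_x > 0 ? i : nx - 1 - i;
                int iy = dir_y > 0 ? j : ny - 1 - j;
                int iz = dir_z > 0 ? k : nz - 1 - k;
                wave[(iz * ny + iy) * nx + ix] = i + j + k;
            }
        }
    }
}
```

Each iteration here is independent, which is what makes the collapsed loop safe to parallelize. A real sweep also has upstream dependencies between cells, which need separate handling.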
…massive data overhead because the local array must increase in size dramatically
…t array access. This should give us better memory coalescing.
…ll 8 directions asynchronously. Each octant runs a gang-parallel KBA wavefront iteration with vector-parallel in-gridcell computations
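
A rough sketch of the wavefront structure named in this commit, for a single octant only. It is an assumption-laden illustration rather than the PR's implementation: `sweep_octant`, the toy recurrence (each cell is a source term plus its three upstream neighbours), and the layout with the in-cell index `a` varying fastest are all hypothetical. Cells with the same `i + j + k` do not depend on each other, so each wavefront is distributed across gangs while the in-cell loop runs across vector lanes. The asynchronous launch of all 8 octants is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a KBA-style wavefront sweep in the (+x,+y,+z)
 * octant. v holds na values per cell; a is the fastest-varying index. */
void sweep_octant(double *v, int nx, int ny, int nz, int na)
{
    assert(v != NULL && nx > 0 && ny > 0 && nz > 0 && na > 0);
    int n = nx * ny * nz * na;

    #pragma acc data copy(v[0:n])
    {
        /* Wavefronts must be processed in order: w = i + j + k. */
        for (int w = 0; w <= nx + ny + nz - 3; ++w) {
            /* Cells within one wavefront are independent: gang level. */
            #pragma acc parallel loop gang collapse(2) present(v[0:n])
            for (int k = 0; k < nz; ++k) {
                for (int j = 0; j < ny; ++j) {
                    int i = w - j - k;
                    if (i >= 0 && i < nx) {
                        int c = (k * ny + j) * nx + i;
                        /* In-gridcell work: vector level. */
                        #pragma acc loop vector
                        for (int a = 0; a < na; ++a) {
                            double xm = i > 0 ? v[(c - 1) * na + a] : 0.0;
                            double ym = j > 0 ? v[(c - nx) * na + a] : 0.0;
                            double zm = k > 0 ? v[(c - nx * ny) * na + a] : 0.0;
                            /* Toy source term (a + 1) plus upstream values. */
                            v[c * na + a] = (double)(a + 1) + xm + ym + zm;
                        }
                    }
                }
            }
        }
    }
}
```

Compiled without OpenACC support, the pragmas are ignored and the loops run serially with the same result, which is a convenient way to check the dependency ordering before tuning the gang and vector mapping.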
… in your cmake file will enable the OpenACC version of the code
…as well as devices within nodes if enough ranks are used.