amrex::Abort does not unconditionally abort on GPU #543
Comments
This is important to know. I always rely on |
Yes.
You have to rewrite the function so it returns an error code that you then check. For an example, see the cooling code: Line 278 in fcde1aa
Line 298 in fcde1aa
You can then check the error condition in host code, and call |
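A minimal sketch of this pattern in plain C++, not the actual Quokka cooling code: the per-cell kernel returns a status instead of aborting, one status is stored per cell, and the host inspects the results after the loop. The names `CellStatus`, `update_cell`, `update_all`, and `any_failed` are hypothetical, the physics is a placeholder, and the ordinary `for` loop stands in for a GPU kernel launch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical status codes returned by the per-cell kernel.
enum class CellStatus : int { Success = 0, Failed = 1 };

// Hypothetical per-cell update. Instead of aborting on a bad result,
// it reports the failure to the caller and leaves the cell unchanged.
inline CellStatus update_cell(double &energy, double dt)
{
    const double updated = energy - dt * std::sqrt(energy); // NaN if energy < 0
    if (!std::isfinite(updated) || updated < 0.0) {
        return CellStatus::Failed;
    }
    energy = updated;
    return CellStatus::Success;
}

// Runs the update over all cells and records one status per cell.
inline std::vector<CellStatus> update_all(std::vector<double> &energy, double dt)
{
    std::vector<CellStatus> status(energy.size(), CellStatus::Success);
    for (std::size_t i = 0; i < energy.size(); ++i) { // stands in for a GPU kernel
        status[i] = update_cell(energy[i], dt);
    }
    return status;
}

// Host-side check: true if any cell reported a failure.
inline bool any_failed(const std::vector<CellStatus> &status)
{
    return std::any_of(status.begin(), status.end(),
                       [](CellStatus s) { return s == CellStatus::Failed; });
}
```

On the host, the caller would run `update_all`, test `any_failed`, and only then decide whether to abort, so the abort itself never has to happen inside device code.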
Another example of how to do this is how Castro handles reactions: It updates an atomic variable for each failed cell, rather than updating cells in a separate MultiFab, so it is significantly more memory efficient. This implementation should be preferred. With this kind of implementation, you have to remember to update across all MPI ranks after the atomic update. In Castro, this step is done here: |
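The counter approach can be sketched in plain C++ as follows. This is not Castro's code: `std::atomic` stands in for a GPU atomic, the serial loop stands in for a parallel kernel, and `burn_cell`, `count_failures`, and `reduce_sum_over_ranks` are hypothetical names. The last function is only a placeholder for the reduction across MPI ranks that has to follow the atomic update.

```cpp
#include <atomic>
#include <numeric>
#include <vector>

// Hypothetical per-cell burn: returns false when the cell fails.
inline bool burn_cell(double temperature)
{
    return temperature > 0.0;
}

// Counts failed cells with a single atomic integer instead of storing
// one status value per cell.
inline int count_failures(const std::vector<double> &temperature)
{
    std::atomic<int> num_failed{0};
    for (double T : temperature) { // stands in for a parallel GPU kernel
        if (!burn_cell(T)) {
            num_failed.fetch_add(1, std::memory_order_relaxed);
        }
    }
    return num_failed.load();
}

// Hypothetical stand-in for the reduction across MPI ranks: every rank
// must see the global failure count, not only its local one.
inline int reduce_sum_over_ranks(const std::vector<int> &failures_per_rank)
{
    return std::accumulate(failures_per_rank.begin(), failures_per_rank.end(), 0);
}
```

The memory saving comes from replacing a per-cell status array with one integer per rank. The cost is that the counter says how many cells failed, not which ones.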
… chem and popiii problems) (#575)

### Description
As described in #543, `amrex::Abort()` does not unconditionally abort on GPUs. Following the workaround proposed in #543, I have implemented a way to circumvent this issue for problems using microphysics' `burn` (PrimordialChem and PopIII).

### Related issues
#543

### Checklist
_Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an `x` inside the square brackets `[ ]` in the Markdown source below:_
- [x] I have added a description (see above).
- [x] I have added a link to any related issues (see above).
- [x] I have read the [Contributing Guide](https://github.com/quokka-astro/quokka/blob/development/CONTRIBUTING.md).
- [ ] I have added tests for any new physics that this PR adds to the code.
- [ ] I have tested this PR on my local computer and all tests pass.
- [x] I have manually triggered the GPU tests with the magic comment `/azp run`.
- [x] I have requested a reviewer for this PR.

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Piyush Sharda <psharda@RSAA-043527.local>
Fixed for PopIII by #575 |
…n step (#716)

### Description
A fix to #543: make the Newton-Raphson iteration return a flag and check it on host.

More changes:
- Change `maxIter` from 400 to 50. 50 iterations should be more than enough. If it doesn't converge in fewer than 50 steps, it probably never will.

### Related issues
Fixes #543

### Checklist
_Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an `x` inside the square brackets `[ ]` in the Markdown source below:_
- [x] I have added a description (see above).
- [x] I have added a link to any related issues (see above).
- [x] I have read the [Contributing Guide](https://github.com/quokka-astro/quokka/blob/development/CONTRIBUTING.md).
- [ ] I have added tests for any new physics that this PR adds to the code.
- [x] I have tested this PR on my local computer and all tests pass.
- [x] I have manually triggered the GPU tests with the magic comment `/azp run`.
- [x] I have requested a reviewer for this PR.

---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
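For illustration, a Newton-Raphson iteration that returns a convergence flag might look like the sketch below. This is a generic scalar solver written in plain C++, not the radiation solver changed in #716. `NewtonResult` and `newton_solve` are hypothetical names and the tolerance is an assumed value. Only the iteration cap of 50 is taken from the description above.

```cpp
#include <cmath>

// Hypothetical result type: the root estimate plus a convergence flag.
struct NewtonResult {
    double x;
    int iterations;
    bool converged;
};

// Newton-Raphson solve of f(x) = 0 that reports failure through the
// returned flag rather than aborting inside the kernel.
template <typename F, typename DF>
NewtonResult newton_solve(F f, DF dfdx, double x0,
                          double tol = 1.0e-12, int maxIter = 50)
{
    double x = x0;
    for (int n = 0; n < maxIter; ++n) {
        const double fx = f(x);
        if (std::abs(fx) < tol) {
            return {x, n, true};
        }
        const double slope = dfdx(x);
        if (slope == 0.0 || !std::isfinite(slope)) {
            return {x, n, false}; // cannot take a Newton step
        }
        x -= fx / slope;
    }
    // Out of iterations: report whether the last iterate happens to satisfy tol.
    return {x, maxIter, std::abs(f(x)) < tol};
}
```

The caller stores `converged` (or a count of cells where it is false) and checks it in host code after the kernel finishes, where aborting behaves the same in every build type.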
Describe the bug
We have been relying on `amrex::Abort` to unconditionally abort the code when called from GPU code. However, this only works when compiled in debug mode (or with additional compiler flags).
We need to rewrite the error handling code for iterative solves so that it correctly handles errors when `-DCMAKE_BUILD_TYPE=Release` is used (the default).
The Quokka documentation incorrectly describes `amrex::Abort` based on the old AMReX documentation (which was incorrect): https://quokka-astro.github.io/quokka/error_checking.html
To Reproduce
Steps to reproduce the behavior:
Additional context
The AMReX documentation was recently changed (October 2023) to reflect this new behavior, but this change was not communicated to the user community: AMReX-Codes/amrex#3605.
This affects both the Newton-Raphson solve used by the radiation code as well as the Microphysics chemistry integrator.
cc @psharda @chongchonghe