-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge redgreen-optimized into develop #106
Conversation
…xibility, the interface should take care!
This was needed in an earlier version!
For now we don't do unit testing for GPU.
Improved runtime information on GPU binary.
* not CUDA-specific, and GPU-aware should now be the default
Rgo develop olivier2
We don't want this option to be enabled if the user doesn't specify any options (i.e. if CPU only is intended).
Looking great so far. Some conflicts need resolving. I suggest merging 'develop' into this branch once more to resolve conflicts. |
Is something substantial holding this back? (perhaps holidays :) ) |
Apologies, I should have given an update here. One big thing which makes me hesitant to merge is the fact that this branch is not yet usable within the IFS even on Nvidia platforms. When I use this branch within IFS-RAPS, I get numerical instabilities. This issue only appeared a couple weeks ago when I gave the branch a spin in RAPS on JSC/JURECA-DC (A100). This was a surprise to me as I successfully used an earlier version of rgo-develop within RAPS with (GPU enabled). I'm sure we can get to the bottom of it, but it's hard to say how long it will take. We could just get this PR over the line now with big warning signs that it is unstable. Or we could nail this bug beforehand. Any thoughts from others? |
I'd say as long as CPU and GPU works in the standalone benchmark and doesn't add regression over current behaviour (i.e., CPU version fully functional in IFS) there's nothing that should prohibit merging this and fixing incrementally going forward. |
Great, so all good for integration. |
By all means, do whatever you think will give us the cleanest history. |
I will attempt to merge the latest develop into rgo-develop now. |
Oh ffs. I have to open a new PR now? |
The work of many years by many people finally contributed. --------- Co-authored-by: Andreas Mueller <andreas.mueller@ecmwf.int> Co-authored-by: Nils Wedi <nils.wedi@ecmwf.int> Co-authored-by: Willem Deconinck <willem.deconinck@ecmwf.int> Co-authored-by: Olivier Marsden <olivier.marsden@ecmwf.int> Co-authored-by: Lukas Mosimann <lmosimann@nvidia.com>
* develop: Add GPU capability (ecmwf-ifs#106 , ecmwf-ifs#120)
This will bring GPU capability into the develop branch.
@wdeconinck and I discussed again the issue of how to do this merge yesterday. As @wdeconinck pointed out before, if we just merge develop into redgreen-optimized before submitting a PR of the latter, we should be able to do the merge in one go rather than first preparing develop with the cpu/gpu subdirectory structure. That's what I've tried here. It seems to have worked, so if we merge this PR we no longer need #103.
This is a draft as I want to carry out tests on both cpu and gpu versions first. According to Meld there are (literally) no differences between src/trans in develop and src/trans/cpu in rgo-develop. So I think this PR shouldn't screw anything up on the CPU side at least.
Note that this branch is still not stable on AMD platforms due to an outstanding issue with HIPFFT. Solving this is on the todo list. In the meantime, for AMD support you have to fall back on redgreengpu, which is the unoptimised but more portable branch.