Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge redgreen-optimized into develop #106

Closed
wants to merge 449 commits into from

Conversation

samhatfield
Copy link
Collaborator

This will bring GPU capability into the develop branch.

@wdeconinck and I discussed again the issue of how to do this merge yesterday. As @wdeconinck pointed out before, if we just merge develop into redgreen-optimized before submitting a PR of the latter, we should be able to do the merge in one go rather than first preparing develop with the cpu/gpu subdirectory structure. That's what I've tried here. It seems to have worked, so if we merge this PR we no longer need #103.

This is a draft as I want to carry out tests on both cpu and gpu versions first. According to Meld there are (literally) no differences between src/trans in develop and src/trans/cpu in rgo-develop. So I think this PR shouldn't screw anything up on the CPU side at least.

Note that this branch is still not stable on AMD platforms due to an outstanding issue with HIPFFT. Solving this is on the todo list. In the meantime, for AMD support you have to fall back on redgreengpu, which is the unoptimised but more portable branch.

lukasm91 and others added 30 commits October 20, 2023 23:11
This was needed in an earlier version!
marsdeno and others added 4 commits June 19, 2024 13:05
* not CUDA-specific, and GPU-aware should now be the default
We don't want this option to be enabled if the user doesn't specify any options (i.e. if CPU only is intended).
@wdeconinck
Copy link
Collaborator

Looking great so far. Some conflicts need resolving. I suggest merging 'develop' into this branch once more to resolve conflicts.

@wdeconinck
Copy link
Collaborator

Is something substantial holding this back? (perhaps holidays :) )

@samhatfield
Copy link
Collaborator Author

Apologies, I should have given an update here. One big thing which makes me hesitant to merge is the fact that this branch is not yet usable within the IFS even on Nvidia platforms. When I use this branch within IFS-RAPS, I get numerical instabilities. This issue only appeared a couple weeks ago when I gave the branch a spin in RAPS on JSC/JURECA-DC (A100). This was a surprise to me as I successfully used an earlier version of rgo-develop within RAPS with (GPU enabled). I'm sure we can get to the bottom of it, but it's hard to say how long it will take.

We could just get this PR over the line now with big warning signs that it is unstable. Or we could nail this bug beforehand. Any thoughts from others?

@reuterbal
Copy link
Contributor

I'd say as long as CPU and GPU works in the standalone benchmark and doesn't add regression over current behaviour (i.e., CPU version fully functional in IFS) there's nothing that should prohibit merging this and fixing incrementally going forward.

@wdeconinck
Copy link
Collaborator

Great, so all good for integration.
Could I please propose a squash-merge of this branch into develop to keep history manageable?

@samhatfield
Copy link
Collaborator Author

By all means, do whatever you think will give us the cleanest history.

@samhatfield samhatfield marked this pull request as ready for review July 15, 2024 10:16
@samhatfield
Copy link
Collaborator Author

I will attempt to merge the latest develop into rgo-develop now.

@samhatfield samhatfield deleted the rgo-develop branch July 15, 2024 10:32
@samhatfield
Copy link
Collaborator Author

Oh ffs. I have to open a new PR now?

wdeconinck added a commit that referenced this pull request Jul 19, 2024
The work of many years by many people finally contributed.

---------

Co-authored-by: Andreas Mueller <andreas.mueller@ecmwf.int>
Co-authored-by: Nils Wedi <nils.wedi@ecmwf.int>
Co-authored-by: Willem Deconinck <willem.deconinck@ecmwf.int>
Co-authored-by: Olivier Marsden <olivier.marsden@ecmwf.int>
Co-authored-by: Lukas Mosimann <lmosimann@nvidia.com>
wdeconinck added a commit to DJDavies2/ectrans that referenced this pull request Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants