[Woptim#24] update spack, uberenv, radiuss spack configs and use spack user_cache_path #1284
This reverts commit 8554451.
@adrienbernede I don't see how your changes could cause the CMake issue related to desul atomics. Do you have any ideas? |
@adrienbernede @davidbeckingsale After other PRs were merged, I merged develop into this. Now, it appears that all the target export stuff is broken. I don't know what's going on here. |
David thought it could be because of a hidden BLT update. |
@adrienbernede that's OK. We'll track it down. |
@adrienbernede, @rhornung67 and I poked at this some, and I just pushed what seems to be a minimal fix. The short version is that the RAJA export set didn't contain the exported BLT targets, and the targets were getting exported twice somehow. I'm not sure this will solve the whole problem, but hopefully one step closer. |
@trws asking @davidbeckingsale to chime in. He says your change will break other things. |
I'm second guessing myself now - let's see what the CI says. My concern was the fact that raja-config.cmake looks for camp. Both depend on the BLT targets, but if they are in the RAJA export set, they won't be there when |
They do, camp for some reason generates a |
I should say, the thing I'm not sure about is the downstream impact. You reworked the export setup for a reason, as I recall, @davidbeckingsale, and I certainly don't want to break app builds or anything else that needs the BLT targets as well. |
The export split fixed finding a camp that was built as part of RAJA - since it could depend on the BLT targets which were not imported until the RAJA targets were pulled in. But then adding target export to camp perhaps made the fix redundant because camp now puts the targets in its own export set too. |
When we think we have this stuff straightened out and we have tioga CI working, we should have a couple of apps try it out and then do a new coordinated release of RAJA, Umpire, camp, etc. |
@davidbeckingsale something's amiss with the new cmake test you added. That appears to be what is breaking gitlab CI. |
Yup, that's what I was thinking would happen! But it's not the test that's wrong :p |
Ok, the change I made means that file doesn't exist anymore. Based on the errors we were getting, I'm not sure we can both fix the RAJA and camp targets and still generate that file. Do we need it? |
Ok, I think this is ready to go finally. It still has workarounds in it for the AMD issues, and refers explicitly to After this goes in, based on discussions we've had, I think we need to look at the following cleanup to avoid having this be quite so painful next time:
|
@trws do we need the logic you added to camp to look for |
Good question, the final version of the camp and raja updates I cooked up here don't actually do that, but if as part of the blt update we also unconditionally add the aliases, and protect the creation of them in blt from errors caused by them already existing, then we probably should. It would make it easier to make sure composed projects always get the appropriate targets, but right now we didn't need it so I didn't pop it in. |
Ah gotcha, that makes sense. |
And of course, now that lassen is back up and everything else works, corona's TCE update is breaking us. 😖 More importantly, the icpc 18 build is broken, though 19 is fine. Are we still intentionally supporting 18 at this point @rhornung67? |
No. Let's trash Intel 18. A few apps are still on Intel 19. I'm trying to persuade them to move to Intel 2022 if they want an intel compiler. We can remove Intel 19 when everyone is off that. |
Thanks for working through all this mess.
Thanks @rhornung67. The new corona TCE is currently breaking us because rocm 5.1.0 no longer exists there. The corresponding PR on radiuss-spack-configs is already up. |
Awesome work, thank you so much! |
Why does LC insist on removing earlier versions of compilers so quickly? We need them around so we can do comparisons when issues arise. I thought we mentioned this to them before. |
Tioga still has them all, and corona still has some other older versions. I'm guessing something unfortunate happened on corona that broke it or corrupted it or something. |
Nothing like debugging CI when things happen that you can't control, huh? |
I think that when this PR is merged, we can turn off corona (if we don't need it) and move to tioga. Correct? |
Yes, and we'll get more relevant test results that way too honestly. Though be aware that it will be flipping to flux in a few weeks, shortly after corona does. |
hmm, that’s good to know. I’ll be attending the Flux AWS tutorial, hopefully I’ll still have enough time to move scripts from slurm to flux... |
Also, we are planning to use radiuss-shared-ci on raja, so I’ll have to implement a tioga pipeline there. |
On this I'm hoping it will be minimal work to get going, Ryan Day is maintaining a set of wrappers for flux commands so If you have access to rzalastor, there's already a 4-node flux partition on there that's ready for testing, not sure if the wrappers have made it on there or not though. |
SPEC: " tests=none %intel@18.0.2" | ||
DEFAULT_TIME: 40 | ||
extends: .build_and_test_on_ruby | ||
# icpc_18_0_2: |
👍
DEFAULT_TIME: 40 | ||
extends: .build_and_test_on_ruby | ||
# icpc_18_0_2: | ||
# variables: |
👍
@adrienbernede RAJA has not switched to use a service account for Gitlab CI yet. It runs as @davidbeckingsale |
It's green, it's finally ready to go. |
Merge this or wait for other PRs to be merged (camp, blt, radiuss-spack-configs, etc.) first? |
The others are all merged except spack, but on spack I'm seeing umpire builds failing, @davidbeckingsale? |
Summary
Design review
At their core, the changes in this branch are the same as those in task/add-tioga-ci-pipeline (#1272), without the tioga specifics.
Thanks @kab163 for doing a good part of the update jobs.
Then I added changes to make sure Spack is used in an isolated environment, especially regarding the clingo install. Those changes are based on @trws's suggestions found here: spack/spack#31030 (comment).
I tested those changes using my CI account. With this PR, the same tests should run as @davidbeckingsale and give the same results, because the clingo setup is now independent of the user config.
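For readers unfamiliar with this kind of isolation, here is a minimal sketch of the idea. It assumes Spack's `SPACK_DISABLE_LOCAL_CONFIG` and `SPACK_USER_CACHE_PATH` environment variables are the mechanism in play. The cache directory name is hypothetical, and the actual scripts in this PR may set things up differently (for example through uberenv).

```shell
# Sketch only: isolate Spack from per-user state in a CI job.
# The directory name below is a hypothetical choice, not taken from this PR.
CI_SPACK_CACHE="${PWD}/.spack-user-cache"

# Ask Spack to ignore the user (~/.spack) and system configuration scopes.
export SPACK_DISABLE_LOCAL_CONFIG=true

# Keep Spack's user cache inside the job workspace instead of ~/.spack.
export SPACK_USER_CACHE_PATH="${CI_SPACK_CACHE}"
mkdir -p "${SPACK_USER_CACHE_PATH}"

echo "spack user cache: ${SPACK_USER_CACHE_PATH}"
```

With these set before any `spack` command runs, whichever account executes the pipeline should start from the same empty user state, so bootstrapped tools such as clingo would no longer depend on what is already in that user's home directory.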
CI results:
Misc.