Switch to building system libraries with Spack #353
Conversation
Hello @xylar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2022-04-14 07:55:54 UTC
Documentation needs to be updated.
@matthewhoffman and @trhille, could you please confirm that you can run the test suite with this branch? The process for getting a conda environment and load script on each machine is:
The `--with_albany` flag is needed for your testing.
See https://github.com/xylar/compass/blob/switch_to_spack/conda/albany_supported.txt#L3-L7 for supported compilers.
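(The exact commands were trimmed from the comment above. Based on the command Mark posts later in this conversation, the Albany-enabled invocation presumably looks something like the following; the environment name, conda path, Spack path and the gnu/mvapich choice are placeholders, not values confirmed in this PR.)

./conda/configure_compass_env.py --env_name compass_spack --conda ~/miniconda3/ \
    --compiler gnu --mpi mvapich --with_albany \
    --spack /path/to/test_spack
# then source the load script the setup generates before building MALI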
@mark-petersen, could you please confirm that you can run the test suite with this branch? The process for getting a conda environment and load script on each machine is:
This is the same as the instructions for Matt and Trevor but without the `--with_albany` flag.
Grizzly and Cori-KNL will no longer be supported after this update but all compiler and MPI configurations for other machines in the following table should work:
Testing
I have run the ocean test suite.
I have run the landice test suite.
On Badger, I was able to run the test suite.
On Cori, I had to cherry-pick commit ef2a1d9319d8efbfc71e2d96c79b89f49fbda1f4 in order to build because MALI has not yet merged this commit.
This is from the end of the log output:
I also saw the failures there.
The updated documentation built successfully but it's undoubtedly full of typos... 'cause I wrote it...
I used gnu on badger successfully, start to finish, with:
and then compiled the model.
You have badger, impi, intel listed above. Is that correct? Or am I using it incorrectly?
@mark-petersen, you had separate instructions not to include the `--with_albany` flag.
Yes, this works with intel on badger when I remove `--with_albany`. Thanks!
./conda/configure_compass_env.py --env_name compass_spack --conda ~/miniconda3/ --compiler intel --spack /usr/projects/climate/SHARED_CLIMATE/compass/badger/test_spack --mpi impi
Thanks @mark-petersen! I appreciate the testing and review.
@xylar, I'm approving this based on testing on Badger and Cori that I reported on above. @matthewhoffman will test on Anvil; I'm not sure we have access to Chrysalis. I've looked through the 37 updated files, but most of the particulars there are beyond me.
If you are on a login node, the script should automatically recognize what machine you are on. You can supply the machine name with ``-m <machine>`` if you run into trouble with the automatic recognition (e.g. if you're setting up the environment on a compute node, which is not recommended).
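(As a concrete illustration of the documented ``-m`` option; the machine name and the other options are placeholders borrowed from commands elsewhere in this thread, not prescribed values.)

# explicitly name the machine instead of relying on auto-detection
./conda/configure_compass_env.py -m badger --env_name compass_spack --conda ~/miniconda3/ --compiler gnu --mpi mvapich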
If you are working with MALI, you should specify ``--with_albany``. This will ensure that the Albany and Trilinos libraries are included along among those
Extra word here? "along among"
@xylar , this is maybe a minor detail, but I noticed the
@xylar, I tried this on Anvil (gnu, mvapich), and I was able to create the conda env and build MALI without any problems, which is amazing! When I try to run the test suite, though, I run into a problem.
I'm guessing this is an Anvil env issue and not related to the Spack functionality, but I'm not familiar enough with COMPASS configuration to know what to do. Is the answer obvious to you? (Note: I've never tried to run COMPASS or MALI on Anvil before.)
@xylar, in addition to my partial success on Anvil, I tried this on Badger, and things worked great end-to-end! I get the same results as Trevor: everything passes except a validation failure on one test.
Regarding additional testing, I'm happy with Trevor having successfully tried Badger and Cori and with me successfully testing Badger and Anvil (once we resolve the Anvil issue I noted above). I've never run anything on Chrysalis. How similar is that to running on Anvil? (You can follow up directly with me about that.)
Note: A side effect of this Spack config in COMPASS is that the total runtime for our full integration suite on Badger went down from 12:45 to 6:00. I think this has to do with us having had a wonky I/O layer in our old library stack (we used to get a bunch of I/O warnings in our output logs), and the clean, consistent library stack being generated by Spack fixes that. That is really nice!
@xylar, one other question: what was the reason for abandoning Cori-KNL? We've still been using it occasionally with MALI. I don't think it would be a major limitation to not have that ability here, but I'm wondering if there was a reason beyond it being slow and not preferred.
I agree, I've changed the wording.
Get rid of ESMF details. Use the path to `nc-config`, `nf-config` and `pnetcdf-config` to find NetCDF and pNetCDF paths on Cori (just like on other machines).
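(For context, these helper scripts report their install prefixes directly. A minimal sketch of the kind of query involved, assuming the tools are on the PATH of the loaded environment; the variable names here are illustrative only.)

# query the install prefixes of NetCDF-C, NetCDF-Fortran and PnetCDF
NETCDF_C_PATH=$(nc-config --prefix)
NETCDF_F_PATH=$(nf-config --prefix)
PNETCDF_PATH=$(pnetcdf-config --prefix)
echo "$NETCDF_C_PATH $NETCDF_F_PATH $PNETCDF_PATH"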
I also suspect that this is due to changes in Albany. You'll have to decide how you want to deal with that but I'm happy to help provide "before" and "after" Spack environments for assessing changes in Albany if that's needed. It won't be trivial so just keep that in mind.
The plan is that, before I merge this PR, I will build all 13 Spack environments in the standard location. There are already standard locations defined for each machine, so testers will no longer need to supply the `--spack` path by hand.
E3SM has a preference for running on Anvil rather than Chrysalis but Chrysalis is much faster. I think it's rather important to do MALI testing on Chrysalis because it is one of the primary machines for E3SM simulations. But that really only matters once we get Intel working for Albany, since the production simulations are always done with Intel.
I struggled a lot trying to get Spack to work for KNL. It is very complicated because the packages need to be compiled on KNL compute nodes (i.e. interactive jobs) rather than on login nodes. Python performance on KNL is also abysmal. I gave up after more than a month of trying to make it happen, and I would discourage anyone else from wasting time on this for a machine (and an architecture) that has limited remaining lifespan.
@matthewhoffman, could you re-review based on my responses as soon as you are able? That way, I can start building the 13 Spack environments and then merge. |
@xylar, thanks for your explanations to the questions I left yesterday. I was able to confirm I can run successfully on Anvil after correcting my salloc command.
I also tried running the MALI full_integration suite on Chrysalis. I was able to set up the compass conda env using Spack with Albany without a problem (gnu/openmpi). I was able to use `compass suite` to set up the test suite. However, when I ran the tests, I got this MALI runtime error:
/lcrc/group/e3sm/ac.mhoffman/COMPASS_TESTS/BASELINE_chrys/landice/dome/2000m/sia_restart_test/full_run/./landice_model: error while loading shared libraries: libhdf5.so.103: cannot open shared object file: No such file or directory
It's possible there is some cross-contamination between my Anvil and Chrysalis setups, but I tried to be careful to avoid that. In any case, I'm approving this PR in case you want to proceed with what is known to be working. I don't anticipate us using Chrysalis for MALI any time soon, so I'm happy to follow up on sorting that out in a later PR if you prefer. Or we can try to figure it out now - whatever is best for your workload.
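(Not part of the original discussion, just a generic diagnostic sketch for this kind of shared-library failure; the executable name is taken from the error above.)

# from the test's full_run directory: see which shared libraries resolve, and which are missing
ldd ./landice_model | grep -i -e hdf5 -e 'not found'
# check whether the loaded environment put an HDF5 library directory on the search path
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i hdf5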
This allows us to use modules other than the E3SM defaults where needed. This merge also adds such a custom template for building with gnu and mvapich on Badger.
If the Spack environment includes Albany, the libraries needed to link in Albany will be added to the environment variable `MPAS_EXTERNAL_LIBS`.
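(A quick sanity check after sourcing the generated load script, a sketch only; the exact contents depend on the Spack environment, and the build command mentioned in the comment is a hedged example, not prescribed by this PR.)

# confirm the Albany link flags were exported by the load script
echo "$MPAS_EXTERNAL_LIBS"
# the MALI build picks this variable up at link time; a typical (hedged) invocation
# might look like: make gfortran ALBANY=true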
Spack is running out of space in `/tmp` on some machines (e.g. Compy) and needs a different temporary directory.
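(One way to point builds at a roomier scratch location, a sketch only; the path is a placeholder, and this relies on Spack's default build_stage honoring the system temporary directory.)

# Spack's default build_stage uses $tempdir, which follows TMPDIR
export TMPDIR=/path/with/more/space
# alternatively, set config:build_stage in Spack's config.yaml to a custom location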
We want to exclude it by default because it adds a lot of build overhead.
This gives us a lot more freedom to explore compiler and MPI library combinations.
It's still not working...
A file lists the machines, compilers and MPI libraries that work with Albany. When a developer creates a compass environment with a given combination and uses the `--with_albany` flag, an error will be raised if that configuration is not supported.
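(The exact file format and check are not shown in this thread; conceptually the guard amounts to something like the following hedged sketch, where the assumed "machine, compiler, mpi" line format and the shell variables are illustrative only.)

# fail early if the requested combination is not listed in the support file
if ! grep -qi "^${machine}, ${compiler}, ${mpi}$" conda/albany_supported.txt; then
    echo "Error: ${machine} with ${compiler} and ${mpi} is not supported with --with_albany" >&2
    exit 1
fi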
This is currently only needed on Anvil and Chrysalis with Albany and OpenMPI.
@matthewhoffman, regarding your difficulty on Chrysalis, we should try to track this down. I was able to run without a problem but it isn't great if I'm the only one who can. But we can leave that for the future.
Deployment
I have deployed Spack environments for the following machines and configurations:
This merge switches to using Spack to build system libraries. This change includes Albany support on selected configurations (see conda/albany_supported.txt for supported machines, compilers and MPI libraries).
To support Albany, we have moved away from E3SM supported compilers and modules on Badger. In the future, it may make sense to update the E3SM configuration for Badger to match.
To do:
Remove `grizzly` and `cori-knl` from the list of machines.