Add scotch #550

Merged
merged 28 commits into from
May 19, 2023

ulmononian
Collaborator

@ulmononian ulmononian commented Apr 20, 2023

WW3 requires the scotch package for mesh/graph partitioning (see spack-stack #465, #336).

This simply adds scotch to the common/packages.yaml and to the ufs-weather-model-static template (though I am not certain this needs to be done here). My understanding is that WW3 needs scotch, but I am not sure if this will be added to the UFS common module set (perhaps @JessicaMeixner-NOAA or @MatthewMasarik-NOAA can comment on that?).

Installs successfully as a standalone package and as part of ufs-weather-model-env (intel & gcc) on Orion.

For testing purposes, scotch/7.0.3 has been installed in the Hera spack-stack/1.3.0 Unified Environment here: /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env.

Depends on NOAA-EMC/spack/260.

Would close #465 and #336.

Note: .gitmodules pointer and spack submodule will revert back to current spack-stack develop (only modified for testing).

@JessicaMeixner-NOAA

@MatthewMasarik-NOAA and I actually don't have access to a test to confirm if we'd need this in the ufs-common modules but from earlier comments from @DeniseWorthen I believe this is indeed the case.

@JessicaMeixner-NOAA

@MatthewMasarik-NOAA is working to test the new modules with the WW3 standalone regression tests. @DeniseWorthen can test within the ufs-weather-model coupled framework.

@ulmononian ulmononian mentioned this pull request Apr 20, 2023
@ulmononian
Collaborator Author

> @MatthewMasarik-NOAA is working to test the new modules with the WW3 standalone regression tests. @DeniseWorthen can test within the ufs-weather-model coupled framework.

thanks for letting me know. it's simple enough for us to add/remove it from the ufs-wm spack/spack-stack configs, so we can play it by ear regarding ufs_common.

@climbfuji
Collaborator

Ok, I got the ubuntu CI build going again, and it passed. Will work on the mac CI test tonight and then this is hopefully ready to go.

@ulmononian
Collaborator Author

> Hi @ulmononian @AlexanderRichert-NOAA, I wanted to check in on this to see if there is an updated install I should test?

Hi @MatthewMasarik-NOAA. Apologies, but I have not had time to work on this recently. I will try to get back to it soon.

@ulmononian
Collaborator Author

> Hi @ulmononian @AlexanderRichert-NOAA, I wanted to check in on this to see if there is an updated install I should test?

@MatthewMasarik-NOAA i've added two variants to the build recipe (to build w/ THREADS and MPI_THREAD_MULTIPLE set to OFF). i know you mentioned this was probably not required (so probably not the issue) on hera, but maybe we will get lucky. do you mind testing with this again (scotch has been updated but the module is just scotch/7.0.3): /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env?

@MatthewMasarik-NOAA

Hi @ulmononian, thank you for providing this new scotch build. Yes, I'll test it and let you know the outcome.

@ulmononian
Collaborator Author

ulmononian commented May 16, 2023

@AlexanderRichert-NOAA perhaps you can weigh in on this comment from a scotch developer regarding the gnu issue:

"To my understanding, this is not an issue of gnu or intel compiler versions per se, but of which C standard version they are referring to when processing the source code files. Indeed, PRIu64 and its likes were inserted in C99: https://en.cppreference.com/w/c/types/integer. Hence, in order to compile Scotch without errors, one has to make sure the compiler accepts the C99 standard, e.g., by using the "-std=c99" flag in gcc. Include files should already have the proper information, but what matters is to provide the adequate #define's to make this information accessible to the compilers."

i loaded up the stack-intel on hera and did gcc -std=c99 -dM -E - < /dev/null | grep __STDC_VERSION__ which shows:
#define __STDC_VERSION__ 199901L, so it seems like the C99 is accepted by the gnu/9.2 that is included in the stack-intel compiler modulefile (/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env/install/modulefiles/Core/stack-intel/2021.5.0.lua).

based on this, i'm not sure if it is indeed a gnu issue causing the ww3 init failures on hera right now (cc @MatthewMasarik-NOAA )
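
For anyone repeating the check above on another machine, here is a minimal sketch of it as a reusable shell helper. The function names `stdc_version` and `supports_c99` are hypothetical (they are not part of spack-stack or Scotch); they only parse a macro dump like the one `gcc -dM -E` prints, and they assume the value has the usual `199901L` form.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for illustration.

# Print the value of __STDC_VERSION__ found in a preprocessor macro dump
# read from stdin; print nothing if the macro is not defined.
stdc_version() {
  awk '$1 == "#define" && $2 == "__STDC_VERSION__" { print $3 }'
}

# Succeed (exit status 0) when the macro dump on stdin reports C99 or newer.
supports_c99() {
  local v
  v=$(stdc_version)
  v=${v%L}                      # drop the trailing long-literal suffix
  [ -n "$v" ] && [ "$v" -ge 199901 ]
}
```

On a node where gcc is on the PATH, the real dump could be piped in, for example `gcc -std=c99 -dM -E - < /dev/null | supports_c99 && echo "C99 accepted"`. This only confirms what the compiler accepts; it says nothing about the runtime behaviour discussed in this thread.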

@AlexanderRichert-NOAA
Collaborator

My understanding is that the WW3 issues have to do with a runtime issue related to the graph infrastructure, as opposed to SCOTCH buildtime issues. @MatthewMasarik-NOAA does that sound right? @ulmononian have you tested building SCOTCH with Intel? As far as I know all the intel versions we care about have c99 support so I don't think that should be an issue.

@MatthewMasarik-NOAA

@ulmononian @AlexanderRichert-NOAA, Alex is correct that the issues are seen during runtime, specifically during the model initialization where SCOTCH is called and the graph partitioning occurs. Since we had a work-around that did the trick (loading gnu at the end of the intel loads), compiling the C portions with -std=c99 has not been investigated.

@ulmononian
Collaborator Author

> My understanding is that the WW3 issues have to do with a runtime issue related to the graph infrastructure, as opposed to SCOTCH buildtime issues. @MatthewMasarik-NOAA does that sound right? @ulmononian have you tested building SCOTCH with Intel? As far as I know all the intel versions we care about have c99 support so I don't think that should be an issue.

so the version i am building for @MatthewMasarik-NOAA to test is w/ intel. i was asking about the gnu c99 comment only in regard to the gnu 9.2 that is added to the intel modulefile paths. i wanted to make sure it had the appropriate settings. everything compiles "successfully", so that is not an issue. but the runtime ww3 init failure suggests to me that something is poorly configured for the compilation, or perhaps that the mpi init step in the model needs to be changed.

@MatthewMasarik-NOAA

> @MatthewMasarik-NOAA i've added two variants to the build recipe (to build w/ THREADS and MPI_THREAD_MULTIPLE set to OFF). i know you mentioned this was probably not required (so probably not the issue) on hera, but maybe we will get lucky. do you mind testing with this again (scotch has been updated but the module is just scotch/7.0.3): /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env?


@ulmononian, I haven't been successful getting all the modules loaded when I'm using this latest path.

Here's what I'm doing on hera:

module purge
module load cmake/3.20.1
module load intel/2022.1.2
module load impi/2022.1.2

then

module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env

at this point when I do a module avail I get a pretty long list, with some truncated output shown here

$ module avail
   .
   .
   intel/2021.5.0/json/3.10.5
   intel/2021.5.0/krb5/1.15.1
   intel/2021.5.0/landsfcutil/2.4.1
   intel/2021.5.0/libbsd/0.11.5
   intel/2021.5.0/libidn2/2.3.0
   intel/2021.5.0/libjpeg/2.1.0
   intel/2021.5.0/libmd/1.0.4
   intel/2021.5.0/libpng/1.6.37
   .
   .

The next package we want to load for WW3 is libpng/1.6.37, which appears on the last line of the listing above. I tried loading it the two sensible ways, shown here

[Matthew.Masarik@hfe04 regtests(develop)]$ module load intel/2021.5.0/libpng/1.6.37
Lmod has detected the following error:  These module(s) or extension(s) exist but cannot be loaded as requested: "zlib/1.2.13"
   Try: "module spider zlib/1.2.13" to see how to load the module(s).

and

[Matthew.Masarik@hfe04 regtests(develop)]$ module load libpng/1.6.37
Lmod has detected the following error:  These module(s) or extension(s) exist but cannot be loaded as requested: "libpng/1.6.37"
   Try: "module spider libpng/1.6.37" to see how to load the module(s).

I also tried being more specific with the path specified in the module use statement. Including

module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env/install/modulefiles/Core/stack-intel

though for any of these attempts I also got the same output when I ran a 'module avail'.

Can you see what might be going wrong here?

@climbfuji
Collaborator

Please follow the instructions in https://spack-stack.readthedocs.io/en/latest/PreConfiguredSites.html#noaa-rdhpcs-hera for loading the modules - let me know if this works.

@MatthewMasarik-NOAA

> Please follow the instructions in https://spack-stack.readthedocs.io/en/latest/PreConfiguredSites.html#noaa-rdhpcs-hera for loading the modules - let me know if this works.

ah! right. Thank you @climbfuji

@ulmononian
Collaborator Author

ulmononian commented May 17, 2023

@MatthewMasarik-NOAA

please do this first (for intel):

module purge
module use /scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles
module load miniconda/3.9.12
module load ecflow/5.5.3
module load mysql/8.0.31

module use /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env/install/modulefiles/Core
module load stack-intel/2021.5.0
module load stack-intel-oneapi-mpi/2021.5.1
module load stack-python/3.9.12

then, based on the modules you were loading in your comment here, please do:

  module load libpng/1.6.37
  module load zlib/1.2.13
  module load jasper/2.0.32
  module load hdf5/1.14.0
  module load netcdf-c/4.9.2
  module load netcdf-fortran/4.6.0
  module load bacio/2.4.1
  module load g2/3.4.5
  module load w3emc/2.9.2
  module load esmf/8.3.0b09
  module load scotch/7.0.3

some of these versions differ from what you were loading, but they are the versions provided in spack-stack/1.3.0. obviously you do not need to load all of these if your module env has changed. please let me know if this works for you to at least get to the point where you can compile/start to run ww3.
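
The sequence above can also be wrapped in a small shell function so that the order (prerequisite modules, then the Core path and the stack-* meta-modules, then the individual packages) is kept in one place. This is only a sketch under stated assumptions: the function name `load_ww3_env`, the array names, and the `MODULE_CMD` override are hypothetical and not part of spack-stack; the paths and versions are copied from this thread and apply to Hera with spack-stack/1.3.0 only.

```shell
#!/usr/bin/env bash
# Sketch only: names below are made up; paths/versions are from this thread.

PREREQ_PATH=/scratch1/NCEPDEV/jcsda/jedipara/spack-stack/modulefiles
CORE_PATH=/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.3.0/envs/unified-env/install/modulefiles/Core

PREREQ_MODULES=(miniconda/3.9.12 ecflow/5.5.3 mysql/8.0.31)
STACK_MODULES=(stack-intel/2021.5.0 stack-intel-oneapi-mpi/2021.5.1 stack-python/3.9.12)
WW3_MODULES=(libpng/1.6.37 zlib/1.2.13 jasper/2.0.32 hdf5/1.14.0 netcdf-c/4.9.2
             netcdf-fortran/4.6.0 bacio/2.4.1 g2/3.4.5 w3emc/2.9.2 esmf/8.3.0b09
             scotch/7.0.3)

# Load everything in order. Set MODULE_CMD="echo module" for a dry run that
# only prints the commands instead of calling the real `module` function.
load_ww3_env() {
  local run=${MODULE_CMD:-module} m
  $run purge
  $run use "$PREREQ_PATH"
  for m in "${PREREQ_MODULES[@]}"; do
    $run load "$m" || { echo "failed to load $m" >&2; return 1; }
  done
  # The stack-* modules must come before the packages that sit under them.
  $run use "$CORE_PATH"
  for m in "${STACK_MODULES[@]}" "${WW3_MODULES[@]}"; do
    $run load "$m" || { echo "failed to load $m" >&2; return 1; }
  done
}
```

Running `MODULE_CMD="echo module" load_ww3_env` prints the commands without touching the environment, which may be handy for comparing against the list in this comment before loading anything for real.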

@MatthewMasarik-NOAA

Perfect. Thank you, @ulmononian!!

@ulmononian
Collaborator Author

@MatthewMasarik-NOAA any luck with this new installation by chance?

@MatthewMasarik-NOAA

> @MatthewMasarik-NOAA any luck with this new installation by chance?

@ulmononian, I got pulled away from my attempts to test this yesterday. I just got it set up, and the tests have been going for 20 mins now. I'll know the outcome in ~2 hrs and will report back shortly after.

@MatthewMasarik-NOAA

@ulmononian, good news, all the tests went great! I loaded the modules you listed and WW3 built and ran fine (without even needing to set the SCOTCH_PATH env parameter). Thank you for re-doing the build!

@climbfuji
Collaborator

Great! Thanks so much for this update. I'll work with @ulmononian to get this and other PRs merged this afternoon

@MatthewMasarik-NOAA

> Great! Thanks so much for this update. I'll work with @ulmononian to get this and other PRs merged this afternoon

Sure thing. Thank you!

@ulmononian
Collaborator Author

> @ulmononian, good news, all the tests went great! I loaded the modules you listed and WW3 built and ran fine (without even needing to set the SCOTCH_PATH env parameter). Thank you for re-doing the build!

awesome! very glad this finally worked. i assume we should just go with the threading variants i added, so i will update that over at JCSDA/spack#260. thanks for testing @MatthewMasarik-NOAA.

@MatthewMasarik-NOAA

MatthewMasarik-NOAA commented May 18, 2023

> awesome! very glad this finally worked. i assume we should just go with the threading variants i added, so i will update that over at JCSDA/spack#260. thanks for testing @MatthewMasarik-NOAA.

Yes, I agree we should go with the threading variants you added. thanks @ulmononian

Collaborator

@climbfuji climbfuji left a comment

scotch was tested successfully on Hera

Labels
INFRA JEDI Infrastructure
Development

Successfully merging this pull request may close these issues.

Add SCOTCH: package for mesh/graph partitioning
5 participants