Update lmod, miniconda, task modulefiles for Gaea #353

Merged
merged 12 commits into from
Sep 28, 2022

EdwardSnyder-NOAA
Collaborator

@EdwardSnyder-NOAA EdwardSnyder-NOAA commented Sep 16, 2022

DESCRIPTION OF CHANGES:

Updated lmod, hpc-stack, miniconda, and task modulefiles so that we can run the WE2E tests on Gaea. Two PRs were created that address these issues in more detail for the release/public-v2 branch of the SRW App (SRW: #352 and Regional Workflow: #830). The quick tests and MET_verification tests will be conducted on Gaea via the Jenkins pipeline.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16, grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta, grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta, grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_HRRR
  • comprehensive tests (specify which if a subset was used)
  • MET_ensemble_verification and MET_verification

DEPENDENCIES:

DOCUMENTATION:

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • Noticed a pmi/alps error in the make_orog and make_grid log files, however the tasks still complete.
Tue Sep 27 17:15:43 2022: [unset]:_pmi_alps_init:alps_get_placement_info returned with error -1
Tue Sep 27 17:15:43 2022: [unset]:_pmi_init:_pmi_alps_init returned -1
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Cheyenne test
  • Needs Jet test
  • Needs Hera test
  • Needs Orion test
  • help wanted

CONTRIBUTORS (optional):

@natalie-perlin for her help with the configuration and installations of: hpc-stack, lmod, and miniconda3 on Gaea, as well as testing and troubleshooting the MET_verification WE2E test.

@EdwardSnyder-NOAA EdwardSnyder-NOAA changed the title from "Feature/update gaea lmod modulefile" to "Update lmod, miniconda, task modulefiles for Gaea" on Sep 16, 2022
@danielabdi-noaa
Collaborator

@EdwardSnyder-NOAA I got a "miniconda3/4.12.0 not found" error when trying to build the SRW App on Gaea with this PR, so I suppose there are still issues to resolve?

Collaborator

@natalie-perlin natalie-perlin left a comment

May or may not need changes in load_modules_run_task.sh

ush/load_modules_run_task.sh
@EdwardSnyder-NOAA
Collaborator Author

@danielabdi-noaa that was a typo I fixed with yesterday's commit. Currently, this PR will crash in the middle of building the SRW App. It is an error building the UPP. I haven't had time to look into this error yet.
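A "module not found" error caused by a typo in a module name or version can be caught before a full build by checking whether a matching modulefile exists on `MODULEPATH`. The sketch below is a simplified stand-in for what Lmod does, and the function name `modulefile_exists` is hypothetical. It only looks for `<name>/<version>.lua` (Lua) or `<name>/<version>` (Tcl) files and ignores defaults, aliases, and hierarchical module trees.

```shell
#!/usr/bin/env bash
# Return 0 and print the containing directory if a modulefile for
# "name/version" exists in any directory listed in MODULEPATH.
# NOTE: modulefile_exists is an illustrative helper, not an Lmod command.
modulefile_exists() {
  local spec=$1 dir
  local IFS=:   # split MODULEPATH on colons for the loop below
  for dir in ${MODULEPATH:-}; do
    [ -n "$dir" ] || continue
    # Lmod reads Lua modulefiles (<version>.lua) and Tcl ones (<version>).
    if [ -f "$dir/$spec.lua" ] || [ -f "$dir/$spec" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

For example, `modulefile_exists miniconda3/4.12.0 || echo "check the module name and MODULEPATH"` would flag a misspelled version before the build starts.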

danielabdi-noaa added a commit to danielabdi-noaa/ufs-srweather-app that referenced this pull request Sep 22, 2022
danielabdi-noaa added a commit to danielabdi-noaa/ufs-srweather-app that referenced this pull request Sep 22, 2022
danielabdi-noaa added a commit to danielabdi-noaa/ufs-srweather-app that referenced this pull request Sep 23, 2022
danielabdi-noaa added a commit to danielabdi-noaa/ufs-srweather-app that referenced this pull request Sep 25, 2022
danielabdi-noaa added a commit that referenced this pull request Sep 25, 2022
* Add preamble script from global workflow.

* Call preamble script in j-jobs and ex-scripts

* Call preamble in other scripts.

* Make names of j-jobs and ex-scripts consistent.

* Working towards nco vars in table 1.

* Change default bin directory to exec

* Append FATAL ERROR to print_err_msg_exit.

* Replace some cp, cd, mkdir calls with their corresponding _vrfy versions

* Add job and jobid to the job-card.

* Add cyc and subcyc to rocoto xml

* Add a j-job preamble script for setpdy.

* Add a j-job postamble as well.

* Define some Table 1 vars in setup.

* Remove unused SRC_DIR, and rename others

* Rename CYCLE_BASEDIR to COMIN_BASEDIR

* Create the NCO root directories in setup.

* Remove source machine file wrapper.

* Bug fix in job_preamble.

* Make make_ics/lbcs use DATA directory properly.

* Make run_fcst use DATA directory properly.

* Made run_post use DATA directory properly.

* Make make_grid use DATA properly (untested).

* Make make_sfc_climo use DATA properly (untested).

* Make make_orog use DATA properly (untested).

* Bug fix for non-NCO mode.

* Don't pass arguments from j-jobs to ex-scripts.

* Make forecast and post-output go to COMOUT.

* Remove CYCLE_DIR and use COMIN instead.

* Bug fix for community mode.

* Append cyc to COMIN in NCO mode.

* Fix rocoto run_post dependency with run_fcst issue.

* Use OPSROOT instead of PTMP and STMP.

* Move nco vars in config_defaults.

* Move logdir location to COMROOT.

* Set all root directories to EXPTDIR in community mode.

* Use pgmout and pgmerr.

* Fix inline post.

* Make pgmout/err redirection work with community mode.

* Use print_err in get_obs_mrms.

* Add prep_step.

* Add post_step.

* Add dbn_alert to post-processed grib2 output.

* Download extrn files directly to COMIN.

* Make make_ics/lbcs directly output to COMIN.

* Change names of extrn_mdl_var_defns files.

* Name fixes for DO_ENSEMBLE=false, dyn/phy

* Don't create symlinks to grib2 files in NCO mode.

* Append rrfs to make_ics/lbcs output.

* Modify extrn_mdl_var_defns names.

* Move forecast output to DATA/RUN.PDY. This location
can be used to store output of other tasks as well.

* Move templates to parm.

* Fix for new parm location.

* Move metplus one level up.

* Fixes for community mode.

* Rename SCRIPTSDIR and JOBSDIR.

* Move all FIX** directories in to a fix/ directory.

* Make FIXrrfs be EXPTDIR for community mode.

* Symlink upp and ufs_utils parm files to top level parm directory.

* Remove UPP_DIR and UFS_UTILS_DIR.

* Define cycle with subcyc when it is non-zero.

* Don't delete COMIN_BASEDIR if it already exists.

* Disassociate NCO mode from pre-generated grid.

* Don't choose fix location based on RUN_ENVIR.

* Bug fix in make_lbcs.

* Add flag to symlink or copy fix files.

* Change slurm log file locations

* Minor fix for inline post in nco mode.

* Add unique workflow ID to avoid clashes between different runs, while
keeping the relation between different tasks, which PID can not do.

* Make verification tasks NCO compliant.

* Pass RUN_ENVIR to we2e script.

* Fixes for merge conflicts.

* Add versions for wcoss2.

* Fix symlinks.

* Minor changes.

* Move grid/orog/sfcc completion files to EXPTDIR/grid/orog etc.

* Output modified namelist file with seeds in current directory.

* Fixes for unittests.

* Bugfix wrf_io version

* Fix CI issue with bin locations.

* Allow NCO root directories to be set individually.

* Don't append workflow id in community mode.

* Add helper script to rename model e.g. rrfs->aqm

* Bug fixes and naming changes for consistency.

* Replace instances of USHrrfs etc with a generic USHdir etc.

* Add unittest for whole workflow now that the merge made it possible.

* Remove unused process_args utility.

* Remove hard coded paths from configs.

* Don't replace existing var value with None.

* Add config.nco to unittest.

* Fix for Orion issue.

* Fix default OPSROOT location in run_we2e.

* Modify setup_we2e script to run fundamental tests on all machines.

* Fix conflicting ics/lbcs temp location by moving to DATA.

* Bug fix in load_modules taken from PR #353.

* Specify default shell instead of symlinking.

* Turn off grid/orog/sfc_climo tasks for NCO test cases.

* Use PDY and cyc in ex-scripts.

* Remove CDATE from xml and define in job_preamble.

* Use machine specific list of tests if available.

* Run all tests in community mode so that the last NCO test case
gets reported as finished.

* Minor changes

* Avoid using preamble in functions.

* Use preamble in function too.

* Turn on debugging for utility functions.

* Turn on debug & verbose in CI.

* Turn off set -e for init_env
The list of modules to be loaded needs updates.
@natalie-perlin
Collaborator

@danielabdi-noaa - the module build_gaea_intel needed updates, should be working now!

@danielabdi-noaa
Collaborator

Thanks for fixing the problem. I am able to build the PR successfully on Gaea now, so approving.

@natalie-perlin
Collaborator

Waiting for PR-830 to be approved in the regional_workflow repo before merging these changes into the develop branch.

natalie-perlin added a commit to natalie-perlin/regional_workflow that referenced this pull request Sep 27, 2022
Updated following recent tests of Met verification, as in ufs-srweather-app repository PR-353: ufs-community/ufs-srweather-app#353
natalie-perlin added a commit to ufs-community/regional_workflow that referenced this pull request Sep 28, 2022
* Lmod/8.7.12 init, updated miniconda3 - for Gaea

Added a new Lmod/8.7.12 initialization wrapper script to the ENV_INIT_SCRIPTS_FPS variable;
added PROJ_LIB and PATH settings that point to the PROJ package in the regional_workflow environment of the updated miniconda3/4.12.0.

* Update the location of miniconda3/4.12.0, rocoto

Specify the new location of miniconda3/4.12.0 (with the regional_workflow environment containing all the necessary packages) and of the rocoto/1.3.3 module, both installed on Gaea under the EPIC role account: /lustre/f2/dev/wpo/role.epic/contrib/

* update run_vx.local 

Updates to the MetPlus verification script; not yet officially supported in the release of public-v2. These changes and test of the MetPlus script were done by @EdwardSnyder-NOAA

* added missing argument for the ./etc/lmod-setup.sh script

A bug found by @EdwardSnyder-NOAA; a separate PR to be created into the develop branch

* Gaea: Lmod/8.7.12 initialization using a wrapper script, under role.epic account

* Gaea: initialize Lmod/8.7.12 using a wrapper script

* Delete get_extrn_lbcs.local

A redundant module; it is placed under ./regional_workflow/modulefiles/tasks/gaea/ instead.

* Update load_modules_run_task.sh

* Update run_vx.local

Updated following recent tests of Met verification, as in ufs-srweather-app repository PR-353: ufs-community/ufs-srweather-app#353

* Update gaea.sh

Updated Met Installation locations on Gaea
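
The first commit in the list above adds PROJ_LIB and PATH settings for the PROJ package in the regional_workflow conda environment. A minimal sketch of such a setup follows. It assumes the PROJ data files live under `share/proj` inside the environment prefix, as is typical for conda-installed PROJ; the function name `set_proj_env` and the prefix argument are hypothetical, and the actual values used on Gaea are in the merged modulefiles rather than shown here.

```shell
#!/usr/bin/env bash
# Point PROJ at the data directory of a conda environment and put the
# environment's executables first on PATH.
# $1: prefix of the conda environment (hypothetical argument; the real
#     Gaea location is defined in the merged modulefiles).
# NOTE: set_proj_env is an illustrative helper, not part of the workflow.
set_proj_env() {
  local env_prefix=$1
  # Assumption: PROJ data files are installed under <prefix>/share/proj.
  if [ ! -d "$env_prefix/share/proj" ]; then
    echo "no PROJ data directory under $env_prefix" >&2
    return 1
  fi
  export PROJ_LIB="$env_prefix/share/proj"
  export PATH="$env_prefix/bin:$PATH"
}
```

The directory check makes a wrong prefix fail immediately with a clear message, rather than surfacing later as a PROJ lookup error inside a workflow task.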
@EdwardSnyder-NOAA EdwardSnyder-NOAA merged commit ebe966c into ufs-community:develop Sep 28, 2022
@EdwardSnyder-NOAA EdwardSnyder-NOAA deleted the feature/update-gaea-lmod-modulefile branch September 28, 2022 14:53