Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Fix issue 555 (restore default behavior of EXPT_BASEDIR option) #562

Merged
merged 3 commits into from
Jan 31, 2023

Conversation

mkavulich
Copy link
Collaborator

DESCRIPTION OF CHANGES:

This PR restores the previous behavior of the variable EXPT_BASEDIR (Item 2 is the behavior that was previously broken), which has the following effect on the experiment directory EXPTDIR:

  1. If EXPT_BASEDIR is not set or set to a null value, the default value (${HOMEdir}/../expt_dirs) will be used
  2. If EXPT_BASEDIR is set to a relative path (i.e. the first character is not /), the user-specified path will be appended to the default value ${HOMEdir}/../expt_dirs (for example if the user specifies EXPT_BASEDIR=some/relative/path in their config.yaml, it will be updated to EXPT_BASEDIR=${HOMEdir}/../expt_dirs/some/relative/path in the workflow
  3. If EXPT_BASEDIR is set to an absolute path, that path will be used as entered

After the above logic is applied, EXPTDIR will be created by joining the paths EXPT_BASEDIR and EXPT_SUBDIR as usual.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update

TESTS CONDUCTED:

Ran fundamental test suite on Hera, confirmed that the old, expected behavior for the expt_basedir argument has been restored. Ran python unit tests, only UFS_plot_domains.py failed which is a known pre-existing error.

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

DEPENDENCIES:

None

DOCUMENTATION:

While this behavior had been documented in comments and usage doc string for run_WE2E_tests.sh, it had not been included in configure_defaults.yaml or the Users Guide; I updated the documentation in those locations.

ISSUE:

Fixes issue mentioned in #555

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing tests pass with my changes

@venitahagerty
Copy link
Collaborator

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/1217442740/20230126052012/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE

@MichaelLueken MichaelLueken changed the title Feature/issue 555 [develop] Feature/issue 555 Jan 26, 2023
@mkavulich mkavulich changed the title [develop] Feature/issue 555 [develop] Fix issue 555 (restore default behavior of EXPT_BASEDIR option) Jan 26, 2023
Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich These changes look good to me! Approving now.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Jan 26, 2023
Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich All of the WE2E tests are failing in Jenkins. Attempting to manually run the WE2E tests on Jet was encountering issues as well. I'm seeing:

GLOBAL_VAR_DEFNS_FP = 'var_defns.sh'

rather than:

GLOBAL_VAR_DEFNS_FP = '/mnt/lfs4/HFIP/hfv3gfs/Michael.Lueken/exp_dirs/grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2/var_defns.sh'

while generating the experiment.

The following files are present in ush/, but not in the expt_dirs - ush/data_table, ush/fd_nems.yaml, ush/field_table, ush/fix_am, ush/input.nml, ush/nems.configure, ush/suite_FV3_GFS_v15p2.xml, and ush/var_defns.sh. Thus, I'm only seeing FV3LAM_wflow.xml, config.yaml, crontab.bak.2023-01-27_18:46:39, launch_FV3LAM_wflow.sh, log.generate_FV3LAM_wflow, and log.launch_FV3LAM_wflow in the experiment directory.

Can you think of what might be happening to keep var_defns.sh from being brought in?

@mkavulich mkavulich added the DO_NOT_MERGE Ensure that a PR isn't merged label Jan 27, 2023
@mkavulich
Copy link
Collaborator Author

@MichaelLueken I'm not sure what happened here, I must have run my tests on the incorrect code, because I am also seeing these same failures now. Clearly the code doesn't work as-is, I'll need to investigate what's going on.

@mkavulich mkavulich added ci-hera-intel-WE Kicks off automated workflow test on hera with intel ci-jet-intel-WE Kicks off automated workflow test on jet with intel and removed DO_NOT_MERGE Ensure that a PR isn't merged labels Jan 30, 2023
@venitahagerty venitahagerty removed ci-jet-intel-WE Kicks off automated workflow test on jet with intel ci-hera-intel-WE Kicks off automated workflow test on hera with intel labels Jan 30, 2023
@venitahagerty
Copy link
Collaborator

Machine: hera
Compiler: intel
Job: WE
Repo location: /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app

  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/chgres_cube does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/emcsfc_ice_blend does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/emcsfc_snow2mdl does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/filter_topo does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/fregrid does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/fvcom_to_FV3 does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/global_cycle does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/global_equiv_resol does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/inland does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/lakefrac does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/make_hgrid does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/make_solo_mosaic does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/upp.x does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/orog does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/orog_gsl does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/regional_esg_grid does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/sfc_climo_gen does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/shave does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/ufs_model does NOT exist'
  • echo 'FAIL: intel executable file /scratch1/BMC/zrtrr/rrfs_ci/autoci/pr/1217442740/20230130175017/ufs-srweather-app/tests/../install_intel/exec/vcoord_gen does NOT exist'
  • echo 'BUILD(S) FAILED'
  • msg=FAIL
  • echo FAIL
    Build failed
    If test failed, please make changes and add the following label back:
    ci-hera-intel-WE

@venitahagerty
Copy link
Collaborator

venitahagerty commented Jan 30, 2023

Machine: jet
Compiler: intel
Job: WE
Repo location: /lfs1/BMC/nrtrr/rrfs_ci/autoci/pr/1217442740/20230130175016/ufs-srweather-app
Build was Successful
Rocoto jobs started
Long term tracking will be done on 10 experiments
If test failed, please make changes and add the following label back:
ci-jet-intel-WE
Experiment Succeeded on jet: custom_ESGgrid
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2
Experiment Succeeded on jet: custom_GFDLgrid
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
Experiment Succeeded on jet: nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
Experiment Succeeded on jet: specify_DOT_OR_USCORE
Experiment Succeeded on jet: grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Experiment Succeeded on jet: specify_RESTART_INTERVAL
All experiments completed

@mkavulich
Copy link
Collaborator Author

@MichaelLueken I have applied the correct fix for this issue. I discussed with @venitahagerty offline and it looks like the Hera test is failing due to running out of disk space in the test location; I ran the Hera fundamental tests successfully on my own.

@MichaelLueken
Copy link
Collaborator

@mkavulich resubmission of the Jenkins tests showed that they all passed, with the exception of Orion. The community_ensemble_008mems test failed due to one of the post jobs exceeding the walltime. I've resubmitted the Orion tests one more time. Once it passes, I will approve and merge these changes.

Copy link
Collaborator

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkavulich Following the modification you pushed on Friday (and a resubmission of the Orion tests this morning), all Jenkins tests are now passing successfully. Reapproving these changes.

@MichaelLueken MichaelLueken merged commit 25418e8 into ufs-community:develop Jan 31, 2023
gspetro-NOAA pushed a commit to gspetro-NOAA/ufs-srweather-app that referenced this pull request Feb 9, 2023
@mkavulich mkavulich deleted the feature/issue_555 branch July 19, 2023 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

generate_FV3LAM_workflow.py no longer deals with relative paths correctly for "EXPT_BASEDIR"
4 participants