[develop] First implementation of run_WE2E_tests.py #558

Conversation

@mkavulich commented Jan 24, 2023

DESCRIPTION OF CHANGES:

This PR introduces two new scripts to the repository: run_WE2E_tests.py and monitor_jobs.py. The purpose of these scripts is to eventually provide a pythonic replacement for the current workflow end-to-end test submission script. Additionally, the monitor_jobs function gives the capability to monitor and submit jobs automatically via the command line or a batch job, rather than relying on crontab entries.

run_WE2E_tests.py

This new script is roughly analogous to the legacy bash version, run_WE2E_tests.sh: it sets up and runs a set of workflow end-to-end tests as specified by the user.

This script is similar in behavior to the original script, but introduces several improvements and/or simplifications:

  • The argument list is made simpler and more pythonic (this will solve the problems listed in issue #369, "Workflow end-to-end script should follow standard syntax for arguments, documentation needs updating")
  • The three separate test specification arguments (tests_file, test_type, and test_name) are collapsed into a single argument, --tests (a sketch of the resolution logic follows this list).
  • The large block of logic assessing available tests and linking test names, IDs, and subdirectories is omitted, replaced with simple checks based on file names.
  • "Verbose" and "Debug" options as applied to individual tests are replaced with "verbose_tests" and "debug_tests" respectively, and a "--debug" flag for script verbosity is added.
  • A new "--quiet" flag is introduced, which prevents output from generate_FV3LAM_wflow (except for warnings and errors) from printing to screen (that output is still written to the log file, log.run_WE2E_tests). This can prevent important messages from being drowned out by the flood of information printed to screen.
  • Because we are not intermingling all variables from the test config, defaults, and machine defaults together, we can better control the resulting test files that are submitted to the generate function.
  • Omits checks that are redundant with checks in generate_FV3LAM_wflow.py (for example, ensuring the forecast length is evenly divisible by the LBC interval).
  • Unless use_cron_to_relaunch is set to true, the experiment data will be automatically fed into the monitor_jobs() function, which will launch and track experiments until all are complete. If the script is interrupted for some reason, it can be re-run and the experiments continued (see below).
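
As a rough illustration of the collapsed --tests handling mentioned above, here is a minimal sketch of how the three input forms might be resolved into config files (the resolve_tests name, the known_tests mapping, and the machine_suites/ suite-file layout are assumptions for illustration, not the actual code):

import os

def resolve_tests(tests, known_tests):
    """Hypothetical sketch: expand --tests into a list of config file paths.
    Priority order: 1. test name(s), 2. a suite name, 3. a file of test names."""
    suites = ("fundamental", "comprehensive", "all")
    if all(t in known_tests for t in tests):
        names = list(tests)                                       # 1. test name(s)
    elif len(tests) == 1 and tests[0] in suites:
        with open(os.path.join("machine_suites", tests[0]), encoding="utf-8") as f:
            names = [line.strip() for line in f if line.strip()]  # 2. suite name
    elif len(tests) == 1 and os.path.isfile(tests[0]):
        with open(tests[0], encoding="utf-8") as f:
            names = [line.strip() for line in f if line.strip()]  # 3. file of names
    else:
        raise Exception(f"Could not interpret --tests argument: {tests}")
    # known_tests maps each TESTNAME to its config.TESTNAME.yaml path
    missing = [n for n in names if n not in known_tests]
    if missing:
        raise Exception(f"Could not find tests: {missing}")
    return [known_tests[n] for n in names]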

Example usage:

  • Running a list of tests in a file named "testlist" on Hera
    ./run_WE2E_tests.py -m=hera -a=fv3lam --tests=testlist
  • Running the single test "nco_inline_post" on Orion, while not printing output from generate_FV3LAM_wflow to screen
    ./run_WE2E_tests.py -t=nco_inline_post -m=orion -a=gsd-fv3-test -q
  • Running the "comprehensive" set of tests on Jet, in debug mode (much more output printed to screen)
    ./run_WE2E_tests.py -m=jet -a=gsd-fv3-dev --tests=comprehensive -d
  • Running the "fundamental" set of tests on Cheyenne, with the gnu compiler, with all experiment directories under the directory "/glade/scratch/kavulich"
    ./run_WE2E_tests.py -t=fundamental -c=gnu -m=cheyenne -a=P48500053 --expt_basedir=/glade/scratch/kavulich

Example output:

The --help flag gives some usage information:

$ ./run_WE2E_tests.py -h
usage: run_WE2E_tests.py [-h] -m MACHINE -a ACCOUNT -t [TESTS ...] [-c COMPILER] [-d] [-q] [--modulefile MODULEFILE]
                         [--run_envir RUN_ENVIR] [--expt_basedir EXPT_BASEDIR] [--exec_subdir EXEC_SUBDIR]
                         [--use_cron_to_relaunch] [--cron_relaunch_intvl_mnts CRON_RELAUNCH_INTVL_MNTS] [--debug_tests]
                         [--verbose_tests]

required arguments:
  -m MACHINE, --machine MACHINE
                        Machine name; see ush/machine/ for valid values
  -a ACCOUNT, --account ACCOUNT
                        Account name for running submitted jobs
  -t [TESTS ...], --tests [TESTS ...]
                        Can be one of three options (in order of priority): 1. A test name or list of test names. 2. A
                        test suite name ("fundamental", "comprehensive", or "all") 3. The name of a file (full or relative
                        path) containing a list of test names.

optional arguments:
  -h, --help            show this help message and exit
  -c COMPILER, --compiler COMPILER
                        Compiler used for building the app
  -d, --debug           Script will be run in debug mode with more verbose output
  -q, --quiet           Suppress console output from workflow generation; this will helpkeep the screen uncluttered
  --modulefile MODULEFILE
                        Modulefile used for building the app
  --run_envir RUN_ENVIR
                        Overrides RUN_ENVIR variable to a new value ( "nco" or "community" ) for all experiments
  --expt_basedir EXPT_BASEDIR
                        Explicitly set EXPT_BASEDIR for all experiments
  --exec_subdir EXEC_SUBDIR
                        Explicitly set EXEC_SUBDIR for all experiments
  --use_cron_to_relaunch
                        Explicitly set USE_CRON_TO_RELAUNCH for all experiments; this option disables the "monitor" script
                        functionality
  --cron_relaunch_intvl_mnts CRON_RELAUNCH_INTVL_MNTS
                        Overrides CRON_RELAUNCH_INTVL_MNTS for all experiments
  --debug_tests         Explicitly set DEBUG=TRUE for all experiments
  --verbose_tests       Explicitly set VERBOSE=TRUE for all experiments

For more information about config arguments (denoted in CAPS), see ush/config_defaults.yaml

For a real example of its usage, here I use the -q flag to suppress output from generate_FV3LAM_wflow(), showing just the output produced by this script:

./run_WE2E_tests.py -t=fundamental -m=orion -a=gsd-fv3-test -q 
Checking that all tests are valid
Will run 5 tests:
/work/noaa/gsd-fv3-test/kavulich/UFS/issue_462/ufs-srweather-app/tests/WE2E/test_configs/grids_extrn_mdls_suites_community/config.grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta.yaml
/work/noaa/gsd-fv3-test/kavulich/UFS/issue_462/ufs-srweather-app/tests/WE2E/test_configs/grids_extrn_mdls_suites_community/config.grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta.yaml
/work/noaa/gsd-fv3-test/kavulich/UFS/issue_462/ufs-srweather-app/tests/WE2E/test_configs/wflow_features/config.community_ensemble_008mems.yaml
/work/noaa/gsd-fv3-test/kavulich/UFS/issue_462/ufs-srweather-app/tests/WE2E/test_configs/wflow_features/config.deactivate_tasks.yaml
/work/noaa/gsd-fv3-test/kavulich/UFS/issue_462/ufs-srweather-app/tests/WE2E/test_configs/wflow_features/config.inline_post.yaml

Inline post is turned on, deactivating post-processing tasks:
RUN_TASK_RUN_POST = False

calling function that monitors jobs, prints summary
Writing information for all experiments to monitor_jobs_20230124154853.yaml
Checking tests available for monitoring...
Starting experiment grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta running
Starting experiment grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta running
Starting experiment community_ensemble_008mems running
Starting experiment deactivate_tasks running
Starting experiment inline_post running
Setup complete; monitoring 5 experiments
Experiment deactivate_tasks is COMPLETE; will no longer monitor.
Experiment grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta is COMPLETE; will no longer monitor.
Experiment grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta is COMPLETE; will no longer monitor.
Experiment inline_post is COMPLETE; will no longer monitor.
Experiment community_ensemble_008mems is COMPLETE; will no longer monitor.
All 5 experiments finished in 0:46:43.495644
All experiments are complete
Summary of results available in monitor_jobs_20230124154853.yaml

Note that the message about inline post is a "warning"-level message from generate_FV3LAM_wflow, and so is still printed to screen.
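
The quiet/verbose behavior can be pictured with a small sketch using the standard logging module (this illustrates the approach only and is not the script's exact setup_logging code): the log file always receives the full record, while the console handler is raised to WARNING when -q is given, which is why warning-level messages such as the one above still reach the screen.

import logging

def setup_quiet_logging(logfile="log.run_WE2E_tests", quiet=False, debug=False):
    """Illustrative sketch only: log everything to file, limit console output."""
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    # The log file always gets the full record
    fh = logging.FileHandler(logfile, mode="w")
    fh.setLevel(logging.DEBUG)
    logger.addHandler(fh)
    # The console shows only warnings/errors with -q, more otherwise
    console = logging.StreamHandler()
    console.setLevel(logging.WARNING if quiet else
                     logging.DEBUG if debug else logging.INFO)
    logger.addHandler(console)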

monitor_jobs.py

This new script, designed to be called automatically by run_WE2E_tests.py or run stand-alone, will read a dictionary (either provided directly to the function or read from a YAML file) that specifies the location of a number of experiments that need to be monitored. The main function, monitor_jobs(), will keep track of these experiments, advance the workflow with calls to rocotorun in each experiment directory, and monitor successes and errors as they occur, reporting a summary at the end.

In addition, while these jobs are being monitored and run, a YAML file tracking the status of all jobs will be written to disk. This file can be read directly to see the details of how each job is progressing, but most importantly, it can be fed back into monitor_jobs.py as a command-line argument. Therefore, if ./run_WE2E_tests.py fails or is killed at any point after the experiments have been generated, the script can be re-started and will continue to monitor jobs where it left off.
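
A rough sketch of that monitoring loop, based on the description above and the commit notes further down (the get_status callback, the status strings, and the rocoto file names are assumptions, not the actual implementation): each pass advances every experiment with rocotorun, re-reads its status, drops finished experiments, and re-writes the tracking YAML so a later invocation can pick up where this one left off.

import subprocess
import time
import yaml

def monitor_experiments(expts, yamlfile, get_status, interval=10):
    """Illustrative sketch only. expts maps experiment names to dicts with at
    least an 'expt_dir' key; get_status(expt_dir) returns the current status."""
    running = dict(expts)
    while running:
        # Loop over a copy so finished experiments can be dropped inside the loop
        for name, info in list(running.items()):
            # Advance the workflow; file names assume the SRW rocoto setup
            subprocess.run(
                ["rocotorun", "-w", "FV3LAM_wflow.xml", "-d", "FV3LAM_wflow.db"],
                cwd=info["expt_dir"], check=False)
            info["status"] = get_status(info["expt_dir"])
            if info["status"] == "COMPLETE":
                print(f"Experiment {name} is COMPLETE; will no longer monitor.")
                running.pop(name)
        # Persist progress so "monitor_jobs.py -y <file>" can resume later
        with open(yamlfile, "w", encoding="utf-8") as f:
            yaml.safe_dump(expts, f)
        time.sleep(interval)  # give slowly updating rocoto databases time to settle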

Example usage:

In this case, I ran ./run_WE2E_tests.py but killed it before all experiments completed. Here I just look in my test directory for the latest "monitor_jobs" yaml file, and feed that back to the script:

$ ls -l
total 488
-rw-r--r-- 1 Michael.Kavulich fv3lam   5827 Jan 17 15:59 create_WE2E_resource_summary.py
-rwxr-xr-x 1 Michael.Kavulich fv3lam  15347 Jan 17 15:59 get_expts_status.sh
-rwxr-xr-x 1 Michael.Kavulich fv3lam  64794 Jan 17 15:59 get_WE2Etest_names_subdirs_descs.sh
-rw-r--r-- 1 Michael.Kavulich fv3lam  12423 Jan 23 21:33 log.monitor_jobs
-rw-r--r-- 1 Michael.Kavulich fv3lam 182013 Jan 24 21:55 log.run_WE2E_tests
drwxr-sr-x 2 Michael.Kavulich fv3lam   4096 Jan 23 22:19 machine_suites
-rw-r--r-- 1 Michael.Kavulich fv3lam    915 Jan 24 21:55 monitor_jobs_20230124215513.yaml
-rwxr-xr-x 1 Michael.Kavulich fv3lam  12508 Jan 24 00:26 monitor_jobs.py
drwxr-sr-x 2 Michael.Kavulich fv3lam   4096 Jan 24 18:47 __pycache__
drwxr-sr-x 9 Michael.Kavulich fv3lam   4096 Jan 20 04:51 rocoto
-rwxr-xr-x 1 Michael.Kavulich fv3lam  22882 Jan 24 21:50 run_WE2E_tests.py
-rwxr-xr-x 1 Michael.Kavulich fv3lam  50269 Jan 17 16:00 run_WE2E_tests.sh
-rwxr-xr-x 1 Michael.Kavulich fv3lam   2844 Jan 17 15:59 setup_WE2E_tests.sh
drwxr-sr-x 6 Michael.Kavulich fv3lam   4096 Jan 23 16:58 test_configs
$ cat monitor_jobs_20230124222802.yaml 
### WARNING ###
### THIS FILE IS AUTO_GENERATED AND REGULARLY OVER-WRITTEN BY monitor_jobs.py
### EDITS MAY RESULT IN MISBEHAVIOR OF EXPERIMENTS RUNNING
deactivate_tasks:
  expt_dir: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/issue_462_new_WE2E_script/expt_dirs/deactivate_tasks
  status: CREATED
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR:
  expt_dir: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/issue_462_new_WE2E_script/expt_dirs/nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  status: CREATED
custom_ESGgrid:
  expt_dir: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/issue_462_new_WE2E_script/expt_dirs/custom_ESGgrid
  status: CREATED
$ ./monitor_jobs.py -y=monitor_jobs_20230124215513.yaml 
Checking tests available for monitoring...
Starting experiment deactivate_tasks running
Starting experiment nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR running
Starting experiment custom_ESGgrid running
Setup complete; monitoring 3 experiments
Experiment deactivate_tasks is COMPLETE; will no longer monitor.
Experiment nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR is COMPLETE; will no longer monitor.
Experiment custom_ESGgrid is COMPLETE; will no longer monitor.
All 3 experiments finished in 0:15:59.928853

And querying the monitor file afterwards shows that all experiments are indeed complete:

$ cat monitor_jobs_20230124215513.yaml 
### WARNING ###
### THIS FILE IS AUTO_GENERATED AND REGULARLY OVER-WRITTEN BY monitor_jobs.py
### EDITS MAY RESULT IN MISBEHAVIOR OF EXPERIMENTS RUNNING
deactivate_tasks:
  expt_dir: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/issue_462_new_WE2E_script/expt_dirs/deactivate_tasks
  status: COMPLETE
  make_grid_201907010000: SUCCEEDED
  make_orog_201907010000: SUCCEEDED
  make_sfc_climo_201907010000: SUCCEEDED
nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR:
  expt_dir: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/issue_462_new_WE2E_script/expt_dirs/nco_grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
  status: COMPLETE
  get_extrn_ics_202008100000: SUCCEEDED
  get_extrn_lbcs_202008100000: SUCCEEDED
  make_ics_202008100000: SUCCEEDED
  make_lbcs_202008100000: SUCCEEDED
  run_fcst_202008100000: SUCCEEDED
custom_ESGgrid:
  expt_dir: /scratch2/BMC/fv3lam/kavulich/UFS/workdir/issue_462_new_WE2E_script/expt_dirs/custom_ESGgrid
  status: COMPLETE
  make_grid_201907010000: SUCCEEDED
  get_extrn_ics_201907010000: SUCCEEDED
  get_extrn_lbcs_201907010000: SUCCEEDED
  make_orog_201907010000: SUCCEEDED
  make_sfc_climo_201907010000: SUCCEEDED
  make_ics_201907010000: SUCCEEDED
  make_lbcs_201907010000: SUCCEEDED
  run_fcst_201907010000: SUCCEEDED
  run_post_f000_201907010000: SUCCEEDED
  run_post_f001_201907010000: SUCCEEDED
  run_post_f002_201907010000: SUCCEEDED
  run_post_f003_201907010000: SUCCEEDED
  run_post_f004_201907010000: SUCCEEDED
  run_post_f005_201907010000: SUCCEEDED
  run_post_f006_201907010000: SUCCEEDED
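
The per-task SUCCEEDED entries above are read from each experiment's rocoto database. The sketch below shows one way such a query could look; the jobs table, its taskname/cycle/state columns, and the epoch-seconds cycle encoding are assumptions about the rocoto schema and should be verified against the rocoto version in use.

import sqlite3
from datetime import datetime, timezone

def get_task_states(db_path):
    """Hedged sketch: map 'taskname_YYYYMMDDHHMM' to each task's rocoto state."""
    states = {}
    with sqlite3.connect(db_path) as conn:
        # Assumed schema: a 'jobs' table with taskname, cycle, and state columns
        for taskname, cycle, state in conn.execute(
                "SELECT taskname, cycle, state FROM jobs"):
            # Assumes the cycle is stored as Unix epoch seconds
            stamp = datetime.fromtimestamp(cycle, tz=timezone.utc).strftime("%Y%m%d%H%M")
            states[f"{taskname}_{stamp}"] = state
    return states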

Additional changes

In addition to the new scripts, a few changes have been made to the rest of the workflow that should have no significant impact on existing tests:

  • generate_FV3LAM_wflow.py

    • function generate_FV3LAM_wflow() now takes an optional "debug" argument that, when true, will provide more verbose output from this and other functions that receive the argument.
    • function generate_FV3LAM_wflow() now returns EXPTDIR, the string indicating the full path of the created experiment. This is useful for scripts/functions that call this function directly rather than via the command line.
    • Move the "Experiment generation completed" message to the very end of the script, so that any failure will pre-empt it
    • Generalize logging setup so that it will work properly even if called from another function.
  • setup.py

    • function setup() now takes an optional "debug" argument that, when true, will provide more verbose output
    • Add several debug logging messages, remove an old "print" statement
    • Avoid using "cd" and other changes of the working directory so that results will be the same regardless of the directory this function is called from (see the path-handling sketch after this list)
  • tests/WE2E/test_configs/wflow_features/config.deactivate_tasks.yaml

    • Remove sections for tasks that are not used (LBCS and ICS are not staged since those tasks are deactivated)
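
As an illustration of the working-directory point above, the usual pattern is to anchor paths to the module's own location rather than calling os.chdir(); the snippet below is an assumed example of that pattern, not the actual setup.py code.

import os
import yaml

# Resolve paths relative to this file so results do not depend on the caller's
# working directory (no os.chdir() needed)
USHDIR = os.path.dirname(os.path.abspath(__file__))

def load_config_defaults():
    # config_defaults.yaml lives alongside this module in ush/ (assumed layout)
    with open(os.path.join(USHDIR, "config_defaults.yaml"), encoding="utf-8") as f:
        return yaml.safe_load(f)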

Missing features vs. run_WE2E_tests.sh

This initial version is incomplete, and a few more capabilities must be added before it can serve as a wholesale replacement for run_WE2E_tests.sh:

  • The command-line options cron_relaunch_intvl_mnts, generate_csv_file, and opsroot are not yet implemented
  • Currently, the default behavior is to not submit jobs to cron, but rather to use monitor_jobs.py to monitor and run the experiments
  • The ability to include calls to UNIX date in DATE_FIRST_CYCL and DATE_LAST_CYCL is not yet implemented
  • No checks that all MET executables/directories are set up properly for the given machine...should these be implemented in setup.py instead?
  • No hard-coded "maxtries" based on platform yet
  • Potentially other edge use cases I have missed?

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)
    • Jet

DEPENDENCIES:

None

DOCUMENTATION:

Will be contributed in a later PR deprecating the old system. This is a preliminary implementation for wider exposure and feedback.

ISSUE:

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

 - Write new config file and print as test
…ge what gets printed to screen, and have all messages printed to log file
implement rest of the command line options, then finally implement
non-cron submission and tracking of jobs. Changes in this commit:

 - Ensure all dictionary entries within each section are uppercase
 - Implement functions for updating settings in task_get_extrn_ics and
   task_get_extrn_lbcs sections
 - Import generate_FV3LAM_wflow function directly and call it
 - Implement debugging arguments in generate_FV3LAM_wflow.py and
   setup.py
 - Tweak setup_logging to avoid double-printing of log messages when
   calling generate_FV3LAM_wflow from another function with logging
 - Remove workdir dependencies from setup function
 - Return "EXPT_DIR" from workflow generation function, move "completion"
message to outside of generation function to avoid confusing/incorrect
"success" messages
monitor_jobs.py (currently just starts the first rocotorun of each
experiment)
 - Add function for writing monitor file (this will be overwritten each
time)
 - Add function to query rocoto database file for each experiment (using
sqlite3), and extract the relevant information for each job
 - Add a skeleton function for updating the status of each experiment
(based on each job's individual status within that experiment)
…o not fail if the respective input files don't exist (since that task will not be run anyway)
monitor_jobs.py:
 - Remove verbose flags from rocotorun calls, since we don't use the
output anyway
 - Remove some debug prints, add some others
 - We need to loop over a copy of running_expts rather than the original
so we can remove entries within the loop
 - Move rocotorun to the end of the status check loop to give time for
the rocoto database to be fully updated before it is checked again
 - Add a short delay between loops over running_expts for further safety
against slowly updating databases

run_WE2E_tests.py:
 - If RUN_TASK_GET_EXTRN_ICS or RUN_TASK_GET_EXTRN_LBCS are False, skip
the setup function for that respective task
 - Fix staged paths for ICS and LBCS
…false, do not fail if the respective input files don't exist (since that task will not be run anyway)"

I forgot that the default for USE_USER_STAGED_EXTRN_FILES is false, so if True then we should expect failure.

This reverts commit ef072f9.
…onfig.deactivate_tasks.yaml; since we are not running these tasks, we should not set any of those settings
…operator simply references the original). Also add some more debugging/timing info.
…figure out what to do with specify_template_filenames...
particularly handy for debugging the functionality, and seems to work
flawlessly as implemented :)

Additionally, adding multiple calls to rocotorun in a row to get around
a potential bug with rocotorun leaving hung background processes. In
correspondence with Chris to try to solve this in a cleaner way.
…; seems to be specific to Hera head nodes (but could appear elsewhere)
 - Remove tests that are symlinks to tests already included
 - Fix bug in capability to include list of tests as a file
 - Remove some unnecessary prints
 - Handle blank/empty lines in a test file without failure
 - Omit duplicate tests similarly to symlink duplicates
if not match:
    raise Exception(f"Could not find test {test}")
# Because some test files are symlinks to other tests, check that we don't
# include the same test twice

@mkavulich I thought you were getting rid of the symlinks feature. Seems like you're still checking for that here.

@mkavulich

Sorry if I was confusing in our previous conversation; I had no plans to change the existence of symlinks; I only changed it so that this does not result in a failure, it simply omits duplicate tests. Whether or not that's the best approach, I'm all ears on opinions there.

@gsketefian

@mkavulich I tried testing what happens with duplicate file names: I copied wflow_features/config.MET_verification.yaml to a new directory under test_configs named verification, i.e. I created the file verification/config.MET_verification.yaml. The files are identical.

When I launched run_WE2E_tests.py, it tried running both instances, so it would be good to have it check for duplicate tests under the test_configs directory. In this case, one of the MET_verification experiment directories got renamed to MET_verification_old_... (probably when the 2nd one launched), and the script seemed to monitor one experiment thereafter (I guess the first test failed because of the directory rename, but I'm not sure).

@mkavulich

@gsketefian The case of duplicate file names is not one I had accounted for; I had only accounted for the user specifying duplicate tests. I just pushed a new commit with a check that there are no config.TESTNAME.yaml files with the same TESTNAME.
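
A minimal sketch of such a duplicate-name check (the function name and the use of os.path.samefile to tolerate symlinked duplicates are assumptions; the committed code may differ):

import glob
import os

def check_for_duplicate_test_names(test_configs_dir):
    """Fail if two distinct config.TESTNAME.yaml files under test_configs/
    share the same TESTNAME; symlinks to the same file are tolerated."""
    tests = {}
    pattern = os.path.join(test_configs_dir, "**", "config.*.yaml")
    for path in glob.glob(pattern, recursive=True):
        name = os.path.basename(path)[len("config."):-len(".yaml")]
        if name in tests and not os.path.samefile(path, tests[name]):
            raise Exception(f"Duplicate test name {name}:\n  {tests[name]}\n  {path}")
        tests[name] = path
    return tests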

@mkavulich

@gsketefian @christinaholtNOAA Thank you for your reviews, I believe I have addressed all your concerns. Let me know if I missed something or you have more questions/comments.

@gsketefian

@mkavulich I was testing redundancy with symlinks (I had a symlink to a test, and I listed both in my list of tests), and the script caught the redundancy with an appropriate warning. However, the output to screen is not well-formatted (there seem to be a lot of extra spaces where there should be just one), which makes it a bit hard to read. Do you mind fixing if possible? Here's what the output looked like:

$ ./run_WE2E_tests.py --machine hera --account rtrr --tests my_tests.txt --expt_basedir /scratch2/BMC/det/Gerard.Ketefian/UFS_CAM/TEST_mkavulich_new_run_WE2E_tests/expt_dirs/set01 -q
Checking that all tests are valid
WARNING: test file /scratch2/BMC/det/Gerard.Ketefian/UFS_CAM/TEST_mkavulich_new_run_WE2E_tests/ufs-srweather-app/tests/WE2E/test_configs/verification/config.MET_verification_link.yaml is a symbolic link to a
                                test file (/scratch2/BMC/det/Gerard.Ketefian/UFS_CAM/TEST_mkavulich_new_run_WE2E_tests/ufs-srweather-app/tests/WE2E/test_configs/wflow_features/config.MET_verification.yaml) that is also included in the
                                test list. Only the latter test will be run.
Will run 1 tests:
/scratch2/BMC/det/Gerard.Ketefian/UFS_CAM/TEST_mkavulich_new_run_WE2E_tests/ufs-srweather-app/tests/WE2E/test_configs/wflow_features/config.MET_verification.yaml
Workflow for test MET_verification successfully generated in
/scratch2/BMC/det/Gerard.Ketefian/UFS_CAM/TEST_mkavulich_new_run_WE2E_tests/expt_dirs/set01/MET_verification

calling function that monitors jobs, prints summary
Writing information for all experiments to monitor_jobs_20230202091032.yaml
Checking tests available for monitoring...
Starting experiment MET_verification running
Setup complete; monitoring 1 experiments

@gsketefian

@mkavulich Two questions came to mind as I was doing further tests (these are just for discussion; I'm approving the PR):

  1. The current (shell) version searches only a specific set of subdirectories under test_configs for tests. I had done it that way because often during testing I may make a backup copy of a subdirectory before starting to modify the files in it. The new script searches all subdirectories. I'm ok with that (I can always copy backup directories somewhere outside of test_configs); I'm just wondering how you decided on that. Just code simplicity?
  2. Once the cron features are enabled in a future PR, will the monitoring script be able to monitor tasks running with cron as well? Currently, there's a script (get_expts_status.sh) that does this. It would be nice to have a proper replacement for it (or just leave it as is if not).

@christinaholtNOAA left a comment

As I mentioned last week, I'm totally fine with the PR as-is, but I left a couple of follow-up comments just to circle back to the review.

@@ -0,0 +1,54 @@
# This is an example yaml file showing the various entries that can be created for tracking jobs by monitor_jobs.py

It could be nice just to drop in a comment on how this appeared here so that future you is not completely baffled. ;)

# If RUN_TASK_GET_EXTRN_ICS is false, do nothing and return
if 'workflow_switches' in cfg:
    if 'RUN_TASK_GET_EXTRN_ICS' in cfg['workflow_switches']:
        if cfg['workflow_switches']['RUN_TASK_GET_EXTRN_ICS'] is False:

Oh. Sure. What about:

if cfg.get('workflow_switches', {}).get('RUN_TASK_GET_EXTRN_ICS', True) is False:
    return cfg_ics

The logic should return if workflow_switches is set and RUN_TASK_GET_EXTRN_ICS is set to something that evaluates to False.

If workflow_switches or RUN_TASK_GET_EXTRN_ICS isn't set, or RUN_TASK_GET_EXTRN_ICS evaluates to something not False, it does not return.

@mkavulich

Thanks @christinaholtNOAA; for expediency and to avoid unnecessary testing, I will roll these suggestions into my working branch for the next update to these scripts.

@MichaelLueken MichaelLueken merged commit 7299fef into ufs-community:develop Feb 7, 2023
MichaelLueken pushed a commit that referenced this pull request Feb 7, 2023
PR #566 changed the variable "MODEL" to a more descriptive name, but failed to make this change in config.community.yaml. The unit tests for generate_FV3LAM_wflow.py make use of this file as an input config.yaml, so they are now failing due to this incorrect variable name. This wasn't caught because prior to #558 the unit tests were broken for a different reason.

This change simply makes the appropriate rename, which should fix the failing unit test. Also fixed an f-string that was missed in a setup.py error message.
MichaelLueken pushed a commit that referenced this pull request Mar 20, 2023
This PR improves on the new ./run_WE2E_tests.py script (introduced in #558), implementing all the features present in the previous shell-based workflow. Some new files are also introduced for better organization and additional functionality:

* tests/WE2E/utils.py This is a collection of functions used by other scripts, contained here to avoid circular dependencies.
* tests/WE2E/WE2E_summary.py Given an experiment directory or .yaml file, outputs a summary to screen of each experiment, its status, and the number of core hours used. It also prints a summary file with detailed information about each task for each experiment.
* tests/WE2E/print_test_info.py Will print a file WE2E_test_info.txt, very similar to the legacy WE2E_test_info.csv with just a few minor format differences.

Any of these scripts can be run with the -h argument to print information about all available options (not including utils.py, which is not designed to be run stand-alone).

With this PR, the old superseded shell-based tools are removed.
Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Development

Successfully merging this pull request may close these issues.

Python unittest fails in github actions