Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace shell-based WE2E test scripts and run_srw_tests.py script with run_WE2E_tests.py and monitor_jobs.py #586

Closed
7 of 15 tasks
mkavulich opened this issue Feb 8, 2023 · 4 comments · Fixed by #637
Closed
7 of 15 tasks
Assignees
Labels
enhancement New feature or request

Comments

@mkavulich
Copy link
Collaborator

mkavulich commented Feb 8, 2023

Description

#558 introduced the first implementation of run_WE2E_tests.py and monitor_jobs.py. The current version accomplishes most of the existing functionality of run_WE2E_tests.sh, but stopped short of a full replacement to reduce the scope of the effort. Now that the first PR has been merged, we should identify the work needed to completely deprecate run_WE2E_tests.sh and shift all testing efforts to the python script.

The scripts that will be deprecated are:

  • run_WE2E_tests.sh: This is the original, bash-based script for generating and submitting the workflow end-to-end tests
  • run_srw_tests.py: Introduced in [develop] Python-based crontab replacement #466, this was a temporary solution to platforms that had limitations running full experiment suites in crontabs. monitor_jobs.py offers a more robust solution.
  • get_WE2Etest_names_subdirs_descs.sh: This is a script that creates a pipe-delimited .csv file describing the various WE2E tests and settings within. There is a lot of other functionality used by the run_WE2E_tests.sh script, but this is unnecessary in the python version.
  • get_expts_status.sh This script
  • create_WE2E_resource_summary.py This script prints out a summary of the number of core hours used by a given experiment or set of experiments based on information in the rocoto database files for each experiment.

Solution

This issue will track the steps needed to achieve this replacement without disruption to the normal development process.

Requirements

  • run_WE2E_tests.py and its related functions should include all functionality present in scripts to be replaced
    • run_WE2E_tests.sh
    • run_srw_tests.py
    • get_WE2Etest_names_subdirs_descs.sh
      • The test details file functionality has been added to run_WE2E_tests.py with new --print_test_details flag
      • Most other functionality is unnecessary with new scripts
    • get_expts_status.sh
      • This functionality will now be available from both monitor_jobs.py and WE2E_summary.py
    • create_WE2E_resource_summary.py
      • This information will now be available from WE2E_summary.py, with several improvements:
        • Core hours are calculated more correctly (by charging full use of node)
        • Can pass either an experiment directory or a yaml experiment file
  • OR, give proper justification for why a feature will be removed/deprecated, with approval from the code management team
  • Jenkins testing must be updated to use the new script
  • Automated label-based WE2E tests must be updated to use the new script
  • Any additional use cases not mentioned above should be identified and accounted for

Acceptance Criteria

  • Requirements must be met.

Dependencies

Will track relevant PRs here.

@mkavulich mkavulich added the enhancement New feature or request label Feb 8, 2023
@mkavulich mkavulich self-assigned this Feb 8, 2023
@MichaelLueken
Copy link
Collaborator

@mkavulich -
While running tests/WE2E/run_WE2E_tests.py, I'm encountering the following error on Jet:

calling function that monitors jobs, prints summary
Writing information for all experiments to monitor_jobs_20230208170852.yaml
Checking tests available for monitoring...
Starting experiment get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2mems running
Setup complete; monitoring 1 experiments
Experiment get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2mems is COMPLETE; will no longer monitor.

*********************************************************************
FATAL ERROR:
Experiment generation failed. See the error message(s) printed below.
For more detailed information, check the log file from the workflow
generation script: log.run_WE2E_tests
*********************************************************************

Traceback (most recent call last):
  File "/mnt/lfs4/HFIP/hfv3gfs/Michael.Lueken/new_tests/tests/WE2E/./run_WE2E_tests.py", line 461, in <module>
    run_we2e_tests(homedir,args)
  File "/mnt/lfs4/HFIP/hfv3gfs/Michael.Lueken/new_tests/tests/WE2E/./run_WE2E_tests.py", line 195, in run_we2e_tests
    monitor_file = monitor_jobs(monitor_yaml, debug=args.debug)
  File "/mnt/lfs4/HFIP/hfv3gfs/Michael.Lueken/new_tests/tests/WE2E/monitor_jobs.py", line 84, in monitor_jobs
    logging.info(f'All {num_expts} experiments finished in {str(total_walltime)}')
NameError: name 'num_expts' is not defined

Visual inspection, using rocotostat, for the experiment showed that all of the experiment tasks completed successfully. This behavior is also being seen while running several tests (i.e., fundamental WE2E tests).

@mkavulich
Copy link
Collaborator Author

Thanks, I noticed this problem earlier today also. This is an error in the final print message before completing due to some last-minute changes I made that were not committed back correctly. This error only appears after all runs complete, so it shouldn't affect the results, but I can commit a quick fix later today.

@christinaholtNOAA
Copy link
Collaborator

All the needed features have been implemented. Needs comments/docs updates. Then will open the PR.

This PR will remove the bash version, and all the other bash scripts in that directory.

@christinaholtNOAA christinaholtNOAA moved this from In Progress to In Review in RRFS Merge to SRW Mar 6, 2023
@mkavulich mkavulich moved this from In Review to In Progress in RRFS Merge to SRW Mar 13, 2023
@mkavulich mkavulich moved this from In Progress to In Review in RRFS Merge to SRW Mar 13, 2023
@christinaholtNOAA
Copy link
Collaborator

PR needs reviewers, and has a few more comments to address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants