Internal: Add support for creating multiple input datasets for use case categories #1694

Closed
7 of 26 tasks
georgemccabe opened this issue Jul 11, 2022 · 7 comments · Fixed by #1697
Assignees
Labels
component: CI/CD Continuous integration and deployment issues component: docker component: testing Software testing issue priority: blocker Blocker reporting: DTC NCAR Base NCAR Base DTC Project reporting: DTC NOAA BASE NOAA Office of Atmospheric Research DTC Project requestor: METplus Team METplus Development Team required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project type: enhancement Improve something that it is currently doing
Milestone

Comments

@georgemccabe
Collaborator

georgemccabe commented Jul 11, 2022

Currently the automated tests are set up so that each model_applications category corresponds to an input data set that contains all of the data required to run all of the use cases in that category. The s2s input data set has become so large that, although it does not exceed the maximum allowable size for the Docker data volume that stores it for the tests, the use case test groups that use this data run out of disk space when they write their output data.

The use case groups that fail also use the Conda environments required for METplotpy and METcalcpy, which are very large because of their many Python package dependencies. These environments also count against the total disk space available in the test environment. The newly created metplotpy environment for #1566 is much larger than the existing environment, so it may cause disk space issues when that work is completed.

Size of current conda environments:

du -sh /usr/local/envs/*
897M    /usr/local/envs/metplotpy
185M    /usr/local/envs/metplus_base

Size of conda environments using Python 3.8.6 and updated package requirements:

du -sh /usr/local/envs/*
2.2G    /usr/local/envs/metplotpy.v5
168M    /usr/local/envs/metplus_base.v5
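
The du comparison above can also be scripted. The following is a minimal sketch, not part of the METplus test scripts; the function names are hypothetical. It sums apparent file sizes, so the totals will differ somewhat from du, which reports allocated disk blocks and counts hard-linked files once.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def human_readable(num_bytes):
    """Format a byte count with binary units, roughly like du -h."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


def report_env_sizes(envs_dir):
    """Return {env_name: size_in_bytes} for each sub-directory of envs_dir."""
    sizes = {}
    for entry in sorted(os.listdir(envs_dir)):
        full_path = os.path.join(envs_dir, entry)
        if os.path.isdir(full_path):
            sizes[entry] = dir_size_bytes(full_path)
    return sizes
```

Calling report_env_sizes("/usr/local/envs") inside the test container and passing each value to human_readable would give a listing comparable to the ones above.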

We may need to reconsider the requirements of use cases and how to group them in the tests, taking into account:

  • Size of input data
  • Size of output data generated
  • Size of conda environment required to run
  • Others?
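
One way to reason about the factors listed above is to add them up per test group and compare the total with the space available on the runner. This is a rough sketch under assumptions: the class and function names are hypothetical, and any sizes passed in would have to be measured, since the issue only reports the Conda environment sizes.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per gibibyte


@dataclass
class GroupFootprint:
    """Disk usage of one use case test group (hypothetical structure)."""
    name: str
    input_bytes: int   # size of the input data set
    output_bytes: int  # size of the output the use cases write
    env_bytes: int     # size of the Conda environment the group needs

    @property
    def total_bytes(self):
        return self.input_bytes + self.output_bytes + self.env_bytes


def groups_over_budget(groups, available_bytes):
    """Return the names of groups whose total footprint exceeds the budget."""
    return [g.name for g in groups if g.total_bytes > available_bytes]
```

The budget could be taken from shutil.disk_usage(path).free on the test machine before any data is pulled, so that groups likely to run out of space are flagged before they fail.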

Describe the Enhancement

  • Come up with a good naming convention for the additional input data sets. Currently they are named after the category, i.e. s2s. We will need another data set such as s2s_2.
  • Update the automated test logic to support multiple input data sets for a given category
  • Update the Contributor's Guide Add Use Cases chapter with the updated process for adding new data
  • Update User's Guide to describe how to find input data for use cases since they will not just correspond to the model_applications sub-directory name anymore

Time Estimate

1-3 days

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

ASAP

Funding Source

2702691 2792541

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Add any new Python packages to the METplus Components Python Requirements table.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@georgemccabe georgemccabe added type: enhancement Improve something that it is currently doing component: testing Software testing issue priority: blocker Blocker alert: NEED MORE DEFINITION Not yet actionable, additional definition required alert: NEED ACCOUNT KEY Need to assign an account key to this issue component: CI/CD Continuous integration and deployment issues requestor: METplus Team METplus Development Team required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project component: docker labels Jul 11, 2022
@georgemccabe georgemccabe added this to the METplus-5.0.0 milestone Jul 11, 2022
@georgemccabe georgemccabe self-assigned this Jul 11, 2022
@hankenstein2
Contributor

Look into paying for more disk space to host bigger data sets
Look into hosting a TDS (THREDDS Data Server) to serve data dynamically, i.e. just grab what you want.
Probably need to split up datasets more starting with S2S

@hankenstein2
Contributor

Try to split the s2s data into two parts named after descriptive s2s categories, e.g. s2s_ocean or s2s_fubar

@hankenstein2
Contributor

Phase, OMI, RMM can be grouped (7-10) MJO
Blocking, Weather Regime (1-3, 11) Blocking
s2s_mjo
s2s_mid_lat
s2s - for all the rest (0,4-6,12-14)

@georgemccabe
Collaborator Author

Proposed groupings for splitting up s2s use cases

s2s_mjo

  • UserScript_obsERA_obsOnly_PhaseDiagram
  • UserScript_fcstGFS_obsERA_OMI
  • UserScript_obsERA_obsOnly_OMI
  • UserScript_obsERA_obsOnly_RMM

s2s_mid_lat

  • UserScript_fcstGFS_obsERA_Blocking
  • UserScript_obsERA_obsOnly_Blocking
  • UserScript_obsERA_obsOnly_WeatherRegime
  • UserScript_fcstGFS_obsERA_WeatherRegime

s2s

  • GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast
  • TCGen_fcstGFSO_obsBDECKS_GDF_TDF
  • UserScript_obsPrecip_obsOnly_Hovmoeller
  • UserScript_obsPrecip_obsOnly_CrossSpectraPlot
  • UserScript_obsERA_obsOnly_Stratosphere
  • SeriesAnalysis_fcstCFSv2_obsGHCNCAMS_climoStandardized_MultiStatisticTool
  • GridStat_fcstCFSv2_obsGHCNCAMS_MultiTercile

Should the s2s group have an additional identifier? Should/can this group be divided into smaller groups?
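
The proposed groupings could be captured in a lookup so the test logic can find the input data set for a use case even when a category has several. This is only an illustrative sketch: the dictionary layout and function names are hypothetical and do not reflect how the METplus test scripts store this information. The data set and use case names are the ones proposed above.

```python
# Proposed input data sets and the use cases assigned to each
S2S_DATASETS = {
    "s2s_mjo": [
        "UserScript_obsERA_obsOnly_PhaseDiagram",
        "UserScript_fcstGFS_obsERA_OMI",
        "UserScript_obsERA_obsOnly_OMI",
        "UserScript_obsERA_obsOnly_RMM",
    ],
    "s2s_mid_lat": [
        "UserScript_fcstGFS_obsERA_Blocking",
        "UserScript_obsERA_obsOnly_Blocking",
        "UserScript_obsERA_obsOnly_WeatherRegime",
        "UserScript_fcstGFS_obsERA_WeatherRegime",
    ],
    "s2s": [
        "GridStat_SeriesAnalysis_fcstNMME_obsCPC_seasonal_forecast",
        "TCGen_fcstGFSO_obsBDECKS_GDF_TDF",
        "UserScript_obsPrecip_obsOnly_Hovmoeller",
        "UserScript_obsPrecip_obsOnly_CrossSpectraPlot",
        "UserScript_obsERA_obsOnly_Stratosphere",
        "SeriesAnalysis_fcstCFSv2_obsGHCNCAMS_climoStandardized_MultiStatisticTool",
        "GridStat_fcstCFSv2_obsGHCNCAMS_MultiTercile",
    ],
}


def dataset_for_use_case(use_case, datasets=S2S_DATASETS):
    """Return the name of the input data set that holds a use case's data."""
    for dataset, use_cases in datasets.items():
        if use_case in use_cases:
            return dataset
    raise KeyError(f"no input data set lists use case {use_case!r}")


def datasets_for_category(category, datasets=S2S_DATASETS):
    """Return every data set that belongs to a category, e.g. s2s and s2s_*."""
    return sorted(
        name for name in datasets
        if name == category or name.startswith(category + "_")
    )
```

Matching on the category name plus an underscore suffix keeps the existing s2s data set valid while allowing any number of additional sets, which is one possible answer to the naming convention question.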

@TaraJensen
Contributor

@CPKalb @georgemccabe @j-opatz - please run these stratifications past CPC, PSL, etc... to see if they make sense to them. After all - they are the S2S community we are serving. Thanks!

@CPKalb
Contributor

CPKalb commented Jul 12, 2022

I just heard back from Maria, and she thinks s2s_mjo is great! I'll let you know when I hear back about the blocking weather regime

georgemccabe added a commit that referenced this issue Jul 12, 2022
…dingly, and turned on all s2s use cases to test that they all run successfully after the changes
@CPKalb
Contributor

CPKalb commented Jul 12, 2022

I just heard back from Doug, and he thinks s2s_mid_lat is a good name as well.

@georgemccabe georgemccabe linked a pull request Jul 14, 2022 that will close this issue
14 tasks
@TaraJensen TaraJensen added reporting: DTC NCAR Base NCAR Base DTC Project reporting: DTC NOAA BASE NOAA Office of Atmospheric Research DTC Project and removed alert: NEED ACCOUNT KEY Need to assign an account key to this issue labels Jul 14, 2022
@georgemccabe georgemccabe removed the alert: NEED MORE DEFINITION Not yet actionable, additional definition required label Jul 14, 2022
georgemccabe added a commit that referenced this issue Jul 18, 2022
…ort range in the Verification Datasets section of the documentation
georgemccabe added a commit that referenced this issue Jul 18, 2022
* per #1694, moved 4 use cases from s2s to s2s_mjo, updated paths accordingly, and turned on all s2s use cases to test that they all run successfully after the changes

* per #1694, fixed paths to s2s_mjo conf files

* updated documentation for use cases that were moved from s2s to s2s_mjo

* attempt to free up unused disk space in GHA runner environment

* moved 4 s2s use cases into s2s_mid_lat

* added new model application categories to contrib guide for adding new use cases

* per #947, changed convection_allowing_models use cases to short_range

* changed which use case tests run to the ones that are failing and added other METdbLoad use case to see if that fails as well

* test to determine which files are preventing MySQL database from being created properly

* test 2 to determine which files are preventing MySQL database from being created properly

* test 3 to see if removing these files is not the cause of the METdbLoad failure

* updated references to METdatadb to METdataio since the repository was renamed

* fixed typo

* changed path to sql file needed to create database because it was moved from METviewer to METdataio

* fixed path to sql file that was moved from METviewer to METdataio

* removed temporary fix because metdataio conda env was created in the dtcenter/metplus-envs:metdataio Docker image

* added note to update path when METviewer Dockerfile changes to reflect METdatadb rename to METdataio, ci-skip-unit-tests

* fixed path to METdataio repo

* add back commands to free up disk space because issue with METdbLoad use case was likely not related, ci-skip-unit-tests

* run all tests with ci-run-all-diff

* remove use case group added for testing, ci-skip-all

* changed exit code for diff tests to 2 so it is easier to see if a use case test job failed due to an actual failure or due to differences in the output

* changed grouping of s2s mid lat use cases to original grouping to prevent warning that artifact contains more than 10,000 files. The 2 WeatherRegime use cases produce a lot of output files, so splitting them up should resolve this warning

* per #1694, changed all references to convection allowing models to short range in the Verification Datasets section of the documentation

* changed URLs to develop version of documentation to a URL relative to the current version of the documentation to match the quick search links from the METplus User's Guide

* per #947, changed references to convection_allowing_model (without the s) to short_range that were missed

* updated use case test scripts to rename convection_allowing_models to short_range and added note to alert developers that the list of use cases in the script is not maintained and therefore not complete
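
The commit note above about exit codes for diff tests can be illustrated with a small sketch. Only the value 2 for output differences comes from the commit message; the other values, the constant names, and the function are assumptions made for illustration and are not taken from the METplus test scripts.

```python
EXIT_SUCCESS = 0      # assumption: conventional success code
EXIT_FAILURE = 1      # assumption: a use case failed to run
EXIT_DIFFERENCES = 2  # from the commit message: output differs from truth data


def diff_test_exit_code(use_case_failed, found_differences):
    """Choose an exit code that separates real failures from output diffs."""
    if use_case_failed:
        return EXIT_FAILURE
    if found_differences:
        return EXIT_DIFFERENCES
    return EXIT_SUCCESS
```

A test job would pass the returned value to sys.exit, so a reviewer can tell from the job status alone whether a use case broke or merely produced different output.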