Profiling of the LETKF with 30 and 80 members and a complete obs space #1234

Open
guillaumevernieres opened this issue Jul 24, 2024 · 4 comments
guillaumevernieres commented Jul 24, 2024

LETKF Profiling

Test Configurations

| Test # | Ensemble Size | Nodes | MPI Tasks Per Node | Total MPI Tasks | Total Time (s) | Total Memory (GB) | Description |
|---|---|---|---|---|---|---|---|
| 1 | 30 | 6 | 5 | 30 | TBD | TBD | Small ensemble, fewer nodes |
| 2 | 30 | 6 | 10 | 60 | TBD | TBD | Small ensemble, moderate scaling |
| 3 | 30 | 12 | 5 | 60 | TBD | TBD | Small ensemble, more nodes |
| 4 | 30 | 12 | 10 | 120 | TBD | TBD | Small ensemble, max scaling |
| 5 | 80 | 6 | 13 | 78 | TBD | TBD | Large ensemble, fewer nodes |
| 6 | 80 | 6 | 15 | 90 | TBD | TBD | Large ensemble, moderate scaling |
| 7 | 80 | 12 | 13 | 156 | TBD | TBD | Large ensemble, more nodes |
| 8 | 80 | 12 | 15 | 180 | TBD | TBD | Large ensemble, max scaling |

Notes

  • Nodes: Fixed as 6 or 12 for all tests.
  • MPI Tasks Per Node: Adjusted for optimal utilization of each node while keeping within practical limits.
  • Ensemble Size: Tests performed with 30 and 80 members to evaluate scalability and performance.
  • Total Time (s): Measure of total runtime for each configuration (to be recorded post-tests).
  • Total Memory (GB): Measure of total memory usage across all nodes (to be recorded post-tests).
  • Description: Provides a quick overview of the setup and scaling intention for each test case.
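
The test matrix above can be reproduced programmatically, which may help when generating job cards. The following is only a sketch: the class and function names (`ProfilingCase`, `build_cases`) are hypothetical and not part of any workflow code; the node counts and tasks-per-node values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfilingCase:
    """One row of the profiling table (hypothetical helper, not workflow code)."""
    test_id: int
    ensemble_size: int
    nodes: int
    tasks_per_node: int

    @property
    def total_tasks(self) -> int:
        # Total MPI tasks = nodes x MPI tasks per node
        return self.nodes * self.tasks_per_node


def build_cases():
    """Enumerate the 8 configurations in the same order as the table."""
    # Tasks-per-node options for each ensemble size, as listed in the table
    tasks_options = {30: (5, 10), 80: (13, 15)}
    cases = []
    test_id = 1
    for ensemble_size in (30, 80):
        for nodes in (6, 12):
            for tasks_per_node in tasks_options[ensemble_size]:
                cases.append(
                    ProfilingCase(test_id, ensemble_size, nodes, tasks_per_node)
                )
                test_id += 1
    return cases


if __name__ == "__main__":
    for case in build_cases():
        print(case.test_id, case.ensemble_size, case.nodes,
              case.tasks_per_node, case.total_tasks)
```

Total time and total memory are left out on purpose, since they are still TBD in the table.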
TerrenceMcGuinness-NOAA added a commit to NOAA-EMC/global-workflow that referenced this issue Oct 16, 2024
# Description

This PR adds the GitHub pipeline script, in the `.github/workflows`
directory, for running CI tests
on an AWS virtual cluster. It is set up to be launched from
the dispatch action in the Actions tab.

For now it will only run C48_ATM 

Resolves #3006 

Once the yaml pipeline is in `.github/workflows` directory of the
default branch we can test it against [PR
2977](#2977) which may
be needed to build on Parallel Works Centos AWS.

Code managers can check to see if the self-hosted runner
[globalworkflow_parallelworks](https://github.com/NOAA-EMC/global-workflow/settings/actions/runners/22)
is up and ready by checking the
[Running](https://github.com/NOAA-EMC/global-workflow/settings/actions/runners)
Settings.

In pending work we should also be able to spin up the cluster on demand
from GitHub.

<!-- For more on writing good commit messages, see
https://cbea.ms/git-commit/ -->

# Type of change
- [ ] Bug fix (fixes something broken)
- [ ] New feature (adds functionality)
- [x] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
<!-- Choose YES or NO from each of the following and delete the other
-->
- Is this a breaking change (a change in existing functionality)? YES/NO
- Does this change require a documentation update? YES/NO
- Does this change require an update to any of the following submodules?
YES/NO (If YES, please add a link to any PRs that are pending.)
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?
<!-- Please list any test you conducted, including the machine.

CI tests run end-to-end on an AWS CentOS-based virtual cluster on
Parallel Works.

-->

# Checklist
- [ ] Any dependent changes have been merged and published
- [x] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [ ] My changes generate no new warnings
- [ ] New and existing tests pass with my changes
- [x] This change is covered by an existing CI test or a new one has
been added
- [ ] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: tmcguinness <terry.mcguinness@noaa.gov>
EricSinsky-NOAA added a commit to NOAA-EMC/global-workflow that referenced this issue Oct 28, 2024
# Description

This PR brings recent changes from the develop branch to the GEFS
reforecast branch. This PR updates the GEFS reforecast branch to develop
hash ac3cde5 (10/11/2024). This version
of global-workflow uses the ufs-weather-model hash
[6a4e09e](https://github.com/ufs-community/ufs-weather-model/tree/6a4e09e94773ffa39ce7ab6a54a885efada91f21)
(9/9/2024).

Furthermore, this PR ensures the following adjustments for the
reforecast:

- [x] Speed up rocoto by grouping post job
- [x] Optimize PE configuration
- [x]  Remove duplicate OCNSPPT and EPBL settings
- [x] Set restart_interval to fhmax
- [x] Turn off SHUM in config.efcs
- [x] Set FHMIN_WAV to 3 in config.base 
- [x] Turn off ATM history file output
- [x] Change HMS=${cyc}0000 to HMS=030000 in Wavepostpnt script (#2788)
- [x] Include YYYYMMDDHH (PDY) in job name
- [x] Change CA seed based on case and cyc for control member and
perturbed members
- [x] Fix post ensemble info
- [x] Add tob to ocean products (#2995)
- [x] Move PEVPR from b group to a group for atmos products (#2995)
- [x] Add option to download initial condition from HPSS
- [x] Add ability to download and stage replay analysis from AWS, which
is needed for the repair_replay task
- [x] Add capability to run forecasts in 7-day intervals  (#2928)
- [x] Update defaults.yaml so that many of the reforecast-specific
settings can be used by default

<!-- For more on writing good commit messages, see
https://cbea.ms/git-commit/ -->

# Type of change
- [ ] Bug fix (fixes something broken)
- [ ] New feature (adds functionality)
- [x] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
<!-- Choose YES or NO from each of the following and delete the other
-->
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? NO
- Does this change require an update to any of the following submodules?
NO
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?

This branch is being tested on WCOSS2. When testing has succeeded, this
PR will be marked as ready for review.

# Checklist
- [ ] Any dependent changes have been merged and published
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [ ] My changes generate no new warnings
- [ ] New and existing tests pass with my changes
- [ ] This change is covered by an existing CI test or a new one has
been added
- [ ] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: Wei Huang <wei.huang@noaa.gov>
Co-authored-by: Kate Friedman <kate.friedman@noaa.gov>
Co-authored-by: Cory Martin <cory.r.martin@noaa.gov>
Co-authored-by: Andrew.Tangborn <Andrew.Tangborn@noaa.gov>
Co-authored-by: Walter Kolczynski - NOAA <Walter.Kolczynski@noaa.gov>
Co-authored-by: AndrewEichmann-NOAA <58948505+AndrewEichmann-NOAA@users.noreply.github.com>
Co-authored-by: DavidBurrows-NCO <82525974+DavidBurrows-NCO@users.noreply.github.com>
Co-authored-by: AnningCheng-NOAA <48297505+AnningCheng-NOAA@users.noreply.github.com>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
Co-authored-by: AntonMFernando-NOAA <167725623+AntonMFernando-NOAA@users.noreply.github.com>
Co-authored-by: BoCui-NOAA <53531984+BoCui-NOAA@users.noreply.github.com>
Co-authored-by: DavidNew-NOAA <134300700+DavidNew-NOAA@users.noreply.github.com>
Co-authored-by: Jeffrey Whitaker <jeffrey.s.whitaker@noaa.gov>
Co-authored-by: mingshichen-noaa <48537176+mingshichen-noaa@users.noreply.github.com>
Co-authored-by: Jiarui Dong <Jiarui.Dong@noaa.gov>
Co-authored-by: David Huber <david.huber@noaa.gov>
Co-authored-by: Guillaume Vernieres <guillaume.vernieres@gmail.com>
Co-authored-by: RussTreadon-NOAA <26926959+RussTreadon-NOAA@users.noreply.github.com>
Co-authored-by: Innocent Souopgui <162634017+InnocentSouopgui-NOAA@users.noreply.github.com>
Co-authored-by: Neil Barton <103681022+NeilBarton-NOAA@users.noreply.github.com>
DavidHuber-NOAA pushed a commit to NOAA-EMC/global-workflow that referenced this issue Oct 31, 2024
# Description

This update to the GitHub-dispatched CI pipeline, which executes the
self-hosted GitHub Runner on Parallel Works, adds a feature that
starts up the virtual compute cluster automatically. We now have a
complete end-to-end automated process for running CI tests in Parallel
Works.

Next steps would be tear-down and adding more tests to see if it scales.

It also includes the update for getting a PR to load when it
originates from a forked repo.

# Type of change
- [ ] Bug fix (fixes something broken)
- [x] New feature (adds functionality)
- [ ] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
<!-- Choose YES or NO from each of the following and delete the other
-->
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? YES
- Does this change require an update to any of the following submodules?
NO (If YES, please add a link to any PRs that are pending.)
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?

The start-up aspect has been tested from my forked repo, but I could not
test PRs originating from forked repos.
That case has to be tested once the workflow pipeline
is in the **develop** branch.

# Checklist
- [x] Any dependent changes have been merged and published
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [x] My changes generate no new warnings
- [x] New and existing tests pass with my changes
- [x] This change is covered by an existing CI test or a new one has
been added
- [ ] Any new scripts have been added to the .github/CODEOWNERS file
with owners
- [ ] I have made corresponding changes to the system documentation if
necessary

---------

Co-authored-by: tmcguinness <terry.mcguinness@noaa.gov>
Co-authored-by: tmcguinness <tmcguinness@129.qarestr.sub-172-16-82.myvzw.com>
@guillaumevernieres
Contributor Author

We need to turn off most of the I/O, @AndrewEichmann-NOAA. This is what you currently output:

output:
  datadir: letkf_output/
  date: '2021-03-24T15:00:00Z'
  exp: letkf
  type: ens
output mean prior:
  datadir: letkf_output/
  date: '2021-03-24T15:00:00Z'
  exp: letkf.mean_prior
  type: fc
output variance prior:
  datadir: letkf_output/
  date: '2021-03-24T15:00:00Z'
  exp: letkf.var_prior
  type: fc
output variance posterior:
  datadir: letkf_output/
  date: '2021-03-24T15:00:00Z'
  exp: letkf.var_post
  type: an
output increment:
  datadir: letkf_output/
  date: '2021-03-24T15:00:00Z'
  exp: letkf.inc
  type: ens

Reduce it to the increments only:

output increment:
  datadir: letkf_output/
  date: '2021-03-24T15:00:00Z'
  exp: letkf.inc
  type: ens
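
As a rough illustration of the trimming suggested above, the sketch below drops every top-level `output*` section except `output increment` from a configuration held as a plain Python dict. The function name `keep_only_increment_output` is hypothetical, and the sketch assumes (as the comment implies, but does not verify) that the LETKF application treats the other output sections as optional.

```python
def keep_only_increment_output(config: dict) -> dict:
    """Return a copy of `config` without output sections other than the increment.

    Hypothetical helper: any top-level key starting with "output" is dropped
    unless it is exactly "output increment". All other keys are kept as-is.
    """
    keep = "output increment"
    return {
        key: value
        for key, value in config.items()
        if not key.startswith("output") or key == keep
    }


def _section(exp: str, kind: str) -> dict:
    # Mirrors the fields shown in the YAML above
    return {
        "datadir": "letkf_output/",
        "date": "2021-03-24T15:00:00Z",
        "exp": exp,
        "type": kind,
    }


# The five output sections currently written, as listed in the comment
current_outputs = {
    "output": _section("letkf", "ens"),
    "output mean prior": _section("letkf.mean_prior", "fc"),
    "output variance prior": _section("letkf.var_prior", "fc"),
    "output variance posterior": _section("letkf.var_post", "an"),
    "output increment": _section("letkf.inc", "ens"),
}

if __name__ == "__main__":
    trimmed = keep_only_increment_output(current_outputs)
    print(list(trimmed))
```

In practice the dict would come from loading the actual YAML file with whichever YAML library the workflow already uses.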

@guillaumevernieres guillaumevernieres changed the title Profiling of the LETKF with 80 members and a complete obs space Profiling of the LETKF with 30 and 80 members and a complete obs space Dec 11, 2024
@AndrewEichmann-NOAA
Collaborator

Thin in the localization?

@shlyaeva
Collaborator

@AndrewEichmann-NOAA: I think to do that, one could add max nobs in the obs localization section (https://github.com/JCSDA-internal/ufo/blob/49ddba60c484bc5b8bac81b660e3b1b0905cd314/src/ufo/obslocalization/ObsHorLocParameters.h#L85), so it'd look something like:

obs localizations:
- localization method: Rossby
  ...
  max nobs: 1000 # or whatever makes sense
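
To make the effect of such a cap concrete, here is a toy sketch of thinning by nearest observations. It assumes that `max nobs` keeps only the closest observations within the localization radius of each grid point; that behaviour should be checked against the linked UFO source. The function `select_local_obs` is hypothetical, and it uses plain Euclidean distance rather than the great-circle distance and Rossby-radius scaling a real localization would use.

```python
import math


def select_local_obs(grid_point, obs_locations, radius, max_nobs=None):
    """Return indices of obs within `radius` of `grid_point`, nearest first.

    If `max_nobs` is given, only the `max_nobs` closest observations are kept.
    Toy illustration only: Euclidean distance, brute-force search.
    """
    distances = [
        (math.dist(grid_point, location), index)
        for index, location in enumerate(obs_locations)
    ]
    # Keep obs inside the localization radius, sorted by increasing distance
    in_range = sorted(pair for pair in distances if pair[0] <= radius)
    if max_nobs is not None:
        in_range = in_range[:max_nobs]
    return [index for _, index in in_range]


if __name__ == "__main__":
    obs = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    print(select_local_obs((0.0, 0.0), obs, radius=5.0))
    print(select_local_obs((0.0, 0.0), obs, radius=5.0, max_nobs=2))
```

Capping the count bounds the size of the local problem solved at each grid point, which is why it is a candidate for reducing LETKF cost with a complete obs space.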

@AndrewEichmann-NOAA
Collaborator

Message from @shlyaeva:
Hi! I think here's what you'd need to do to rebuild with the changes. For oops:
  • Note the hash that you are using (git branch should output that).
  • Check out my branch feature/packEigenAutoSize (so you can pick up the commit from there).
  • Go back to the hash you were using before.
  • Run git cherry-pick 6de68a6 (that's the commit I added on top of develop).
For ioda, it would be similar, except git cherry-pick a2a6d59. If you did git diff with your previous hashes after that, the diff should look like https://github.com/JCSDA-internal/oops/commit/6de68a61fadd19dbc01af3b1ace440866d633597 for oops and https://github.com/JCSDA-internal/ioda/commit/a2a6d594365d9b6132a960e62d3eae7c8611d099 for ioda. This would pull in just the changes I made and keep everything else the same.

DavidHuber-NOAA added a commit to NOAA-EMC/global-workflow that referenced this issue Jan 7, 2025
# Description
As referenced in #3019, the variable 5WAVH is being removed from each
of the files `parm/wmo/grib2_awpgfs[000-240].003` and
`parm/wmo/grib2_awpgfs_20km_[ak,conus,pac,prico]f000` for the purpose of
remedying "error code 30" that was generated through the execution of
`exgfs_atmos_awips_20km_1p0deg.sh` during the GFSv17 HR4 test run.
Obsolete code is also being removed from the script
`exgfs_atmos_awips_20km_1p0deg.sh`.

No other errors mentioned in #3019 are addressed in this PR.

# Type of change
- [x] Bug fix (fixes something broken)
- [ ] New feature (adds functionality)
- [ ] Maintenance (code refactor, clean-up, new CI test, etc.)

# Change characteristics
- Is this a breaking change (a change in existing functionality)? NO
- Does this change require a documentation update? NO
- Does this change require an update to any of the following submodules?
NO
  (If YES, please add a link to any PRs that are pending.)
  - [ ] EMC verif-global <!-- NOAA-EMC/EMC_verif-global#1234 -->
  - [ ] GDAS <!-- NOAA-EMC/GDASApp#1234 -->
  - [ ] GFS-utils <!-- NOAA-EMC/gfs-utils#1234 -->
  - [ ] GSI <!-- NOAA-EMC/GSI#1234 -->
  - [ ] GSI-monitor <!-- NOAA-EMC/GSI-Monitor#1234 -->
  - [ ] GSI-utils <!-- NOAA-EMC/GSI-Utils#1234 -->
  - [ ] UFS-utils <!-- ufs-community/UFS_UTILS#1234 -->
  - [ ] UFS-weather-model <!-- ufs-community/ufs-weather-model#1234 -->
  - [ ] wxflow <!-- NOAA-EMC/wxflow#1234 -->

# How has this been tested?
Removal of variable 5WAVH from the GRIB2 files should allow completion
of TOCGRIB2 processing (within `exgfs_atmos_awips_20km_1p0deg.sh`) of
the GRIB2 files. @RuiyuSun, or the GW team, may wish to include the
requested modifications for future GFSv17 tests that include
post-processing jobs.

# Checklist
- [ ] Any dependent changes have been merged and published
- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have documented my code, including function, input, and output
descriptions
- [ ] My changes generate no new warnings
- [ ] New and existing tests pass with my changes
- [ ] This change is covered by an existing CI test or a new one has
been added
- [ ] Any new scripts have been added to the .github/CODEOWNERS file
with owners
- [ ] I have made corresponding changes to the system documentation if
necessary

Co-authored-by: christopher hill <christopher.m.hill@dlogin05.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>