Feature request: Testing CESM+DART infrastructure #463

Open
kdraeder opened this issue Mar 10, 2023 · 8 comments
Labels
CAM (Community Atmosphere Model), CLM (Community Land Model), Discussion (Requires team discussion), Enhancement (New feature or request)

Comments

@kdraeder
Contributor

Use case

It would be very helpful to people who use CESM+DART
if released versions of CESM were tested to ensure
that they provide the functionality required by DART,
such as the ability to build and run large, multi-instance jobs
and to interact with DART as an External System Processing (ESP) component.

Is your feature request related to a problem?

Recently developed CESM components can be used and tested with DART
only if the CESM infrastructure continues to support it.
In the past, CESM or CIME development has often neglected to test for this use,
and the resulting release has been incompatible with DART (CESM issue #1807).
People who are not experts in CESM code then had to find the problems, inefficiently, and fix them, often suboptimally.
That can take as long as it takes the CESM developers to release a new version,
resulting in a seemingly endless cycle during which DART cannot be used with recent CESM versions.

Describe your preferred solution

(Further) Integrate testing of the functionality required by DART into the CESM testing suite.
This is most important for major releases, which are most likely to be used with DART.
But it would be useful whenever a new version of a component model is made available
(if there's a DART interface to that component), so that DART can be used to evaluate it.
We anticipate that the CESM testing would not run an assimilation,
but would exercise just the features of CESM that enable one.
Once a version passes the CESM tests, people who want to do assimilation
would do the full assimilation testing and model evaluation.

Describe any alternatives you have considered

See "Is your feature request related to a problem?" above.


The rest of the text is a discussion of strategies for implementing solutions.

Short list of issues

  1. Current status of testing (@alperaltuntas)
  2. Multi-instance + ensemble size
  3. SourceMods and other modified code (@braczka , sea ice person?)
  4. Compsets

Depth of testing

It's possible that a different level of testing could be done,
based on what parts of CESM had been upgraded (e.g. multi-instance scripting
vs a wave-model upgrade, which is currently irrelevant to DART)
or what level of release is being made (e.g. CESM2_3_1 vs CESM3).
? Does the testing currently vary, depending on these things?

Size of Ensemble

Several times in the past, tests using 2-3 members have passed,
but tests with larger ensembles have failed or shown unacceptable performance degradation.
My (Kevin's) intuition is that at least 10 members are required to see the latter.
For example, at one point the number of calls to the serial Python build-namelist task
scaled as the square of the number of instances. That was not noticeable
for 2 members, but for 80 it was 6400 calls.
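
The scaling described above can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and it assumes nothing beyond the stated quadratic relationship between instance count and call count.

```python
def build_namelist_calls(n_instances):
    """Number of serial build-namelist calls if they scale as instances squared."""
    return n_instances ** 2


# Compare small test ensembles with a production-sized one.
for n in (2, 3, 10, 80):
    print(f"{n:3d} instances -> {build_namelist_calls(n):5d} calls")
```

With 2 or 3 members the call count stays in single digits, which is why small tests would not expose the problem.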

Components

DART currently has interfaces to atm (CAM-FV, CAM-SE, WACCM(-X)), CLM5, POP(2?),
and CICE. The potential next interfaces are to a river or land ice model.
Work is underway to do assimilation with multiple components,
but that will (probably) be hidden within the ESP component and not require testing by CESM.
As of 2018-6, any compset that included a land ice component other than "stub"
couldn't be used, because none of the other land ice models could use the Gregorian calendar.
A compset defined (2018) specifically for CAM6 assimilations is
FHIST_DARTC6 = HIST_CAM60_CLM50%SP_CICE%PRES_DOCN%DOM_SROF_SGLC_SWAV.
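
To make the compset long name easier to read, here is a minimal sketch that splits it into its fields. It assumes the usual convention that fields are separated by underscores and that `%` introduces a modifier; the helper name `parse_compset` is hypothetical and is not part of CESM or CIME.

```python
def parse_compset(longname):
    """Split a compset long name into (field, [modifiers]) pairs."""
    fields = []
    for field in longname.split("_"):
        name, *modifiers = field.split("%")
        fields.append((name, modifiers))
    return fields


longname = "HIST_CAM60_CLM50%SP_CICE%PRES_DOCN%DOM_SROF_SGLC_SWAV"
for name, modifiers in parse_compset(longname):
    print(name, modifiers)
```

The field beginning with `S` followed by `GLC` is the stub land ice component mentioned above.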

SourceMods

Each DART interface may currently require SourceMods in order to build
a model that works with DART effectively. These have been necessary partly because
of the lack of testing within CESM. It may be easy, or at least appropriate,
to include some of them into CESM, while others may not be. See attached files for examples.
There may also be changes to CIME that we implemented for DART
but that are not in the (src.cam) SourceMods. For example,
/glade/work/raeder/Models/cesm2_1_relsd_m5.6/cime/src/drivers/mct/cime_config/buildnml
has a time-saving upgrade that just changes the log file name in the modelio namelist files,
instead of regenerating them in every assimilation cycle.
As far as I know, there's no SourceMods mechanism for CIME code, so it needs to be substituted manually.
? Do other components have non-SourceMods changes?
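
The idea behind the buildnml change mentioned above (edit the log file name in place rather than regenerate the whole file) can be sketched as follows. This is not the actual CIME code. It assumes the modelio namelist contains a single entry of the form `logfile = "..."`, and the function name `update_logfile` is hypothetical.

```python
import re

# Assumption: the namelist holds one entry like  logfile = "atm.log.<stamp>"
LOGFILE_RE = re.compile(r"""(logfile\s*=\s*)(["'])[^"']*\2""")


def update_logfile(nml_text, new_logfile):
    """Return the namelist text with only the logfile value replaced."""
    def swap(match):
        key, quote = match.group(1), match.group(2)
        return f"{key}{quote}{new_logfile}{quote}"
    return LOGFILE_RE.sub(swap, nml_text)


sample = '&modelio\n  diri = "./"\n  logfile = "atm.log.old"\n/\n'
print(update_logfile(sample, "atm.log.new"))
```

All other entries are left untouched, so each assimilation cycle costs one text substitution per file instead of a full namelist rebuild.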

Size of the model(s)

We have not run into cases in which the resolution of the model was a factor
in the testing or functionality of the code. Of course, it's always possible
to exceed resources using a high resolution model, but that's not in the testing scope.
So testing a "large" ensemble may not require a large number of nodes, which can delay testing.

@kdraeder added the Enhancement, Discussion, CAM, and CLM labels Mar 10, 2023
@kdraeder
Contributor Author

Here are the SourceMods I developed for the CAM6 Reanalysis.
cesm2_1_relsd_m5.6z_DART+CAM_SourceMods.tgz

Here are the changes to CIME I made for the CAM6 Reanalysis, which are candidates for merging into main.
See the included file merge_list_2023-3-11 for details.
DART_CIME_maint-5.6_mods.tgz

There were more changes needed (or useful), but they are too specific to DART and the Reanalysis
to include. So we will still have mods that we need to install manually (unless there's a SourceMods
mechanism for CIME).

@kdraeder
Contributor Author

It seems that we'll want to gather SourceMods (and other software variants)
from the other components that have DART interfaces: CLM, POP, and CICE (MPAS?, ...?).
I don't have reliable access to those.

@kdraeder
Contributor Author

CIME github issue #2455 shows that a multi-instance test for CAM ("dartcambigens")
has been developed and is being used in pre-beta tests.
This may work for other components, or it could be used as a template.

Jim Edwards would like DART to be able to run with no modifications to CIME,
so I'll open an issue in the CIME github to handle importing our changes.

@kdraeder
Contributor Author

kdraeder commented May 15, 2023

So far several CAM Reanalysis modifications to CESM2.1 have been resolved in CESM2.3 (CMEPS mode).

  1. The slow creation of modelio_nml files has been solved by handling them (in parallel) in Fortran code, instead of serially in Python.
  2. The inability of the driver to write "daily" auxiliary coupler history files ("forcing") at the end of a forecast that's < 24 hours has been fixed in CMEPS.
  3. The issue of incompatibilities between some (aux) history file names and their contents (related to averaging and the times in the files) appears to have been fixed by the uniformity and generality built into components/cmeps/cime_config/namelist_definition_drv.xml.

The next issue I'll try to resolve is that DART creates several more file "types" that CESM's st_archive doesn't handle: means, spreads, obs_seq files, stages, etc.
The atm variation of this was controlled by

  • components/cam/cime_config/config_archive.xml
  • cime/config/cesm/config_archive.xml (->? in CESM2.3)
  • cime/scripts/lib/CIME/case/case_st_archive.py (->? cime/CIME/case/case_st_archive.py)
  • cime/src/drivers/mct/cime_config/config_archive.xml (->? ./cmeps/cime_config/config_archive.xml)

but there may be similar changes (hopefully the same) needed in the other components.
My version is in /glade/u/home/raeder/cesm2_1_relsd_m5.6/SourceMods/src.cam/config_archive.xml.
@braczka @amrhein @johnsonbk if you have any modified config_archive.xml for the ocn, lnd, etc.,
or strategies that you prefer for handling the new file types, please send them along.
We may need to do this for CICE too, but we don't have an expert on hand.
I'd like to organize this before opening an issue in CESM.

@hkershaw-brown
Member

  • running with calendar=GREGORIAN

@kdraeder
Contributor Author

I'm working to include all DART output files in the st_archive process.
I'd like to hear any thoughts about the following strategy.

The top-level decision is that the assimilate.csh script for each component should rename
the DART output files that we want to archive, using the CESM file naming convention.
This minimizes the changes to CESM code and will make the DART+CESM interfaces
more uniform. It should also handle coupled assimilation, which may create
DART output files for multiple components, e.g. obs_seq.final files for both CAM and POP.

Then there were questions about files that are associated with both a component and DART,
such as the ensemble of files for each stage. To me those seem like a kind of history
file of the component (as opposed to the other 2 archive categories, restart and log),
so I chose to archive those in the archive/$component/hist directories.
This is accomplished by adding a history file extension .e. to each component's
config_archive.xml file and naming the files
${casename}.${comp}_${instance}.e.${stage}_${domain}.${date}.nc, e.g.
St_arch_beta17_3inst.clm2_0001.e.forecast_d01.1850-01-01-21600.nc
This prevents the archive/esp/hist directory from becoming cluttered, and also results in shorter names.
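
As an illustration of the naming pattern above, here is a minimal sketch that assembles a component stage file name and checks it against the example. The function name is hypothetical, and the zero-padding widths (4 digits for the instance, `dNN` for the domain) are inferred from that single example rather than from any specification.

```python
def component_stage_name(casename, comp, instance, stage, domain, date):
    """Build ${casename}.${comp}_${instance}.e.${stage}_${domain}.${date}.nc"""
    # Assumption: instance is 4-digit zero padded, domain is written as dNN.
    return f"{casename}.{comp}_{instance:04d}.e.{stage}_d{domain:02d}.{date}.nc"


name = component_stage_name(
    "St_arch_beta17_3inst", "clm2", 1, "forecast", 1, "1850-01-01-21600"
)
print(name)
```

In practice the renaming would be done in assimilate.csh, as described above. The Python form is only meant to make the field order explicit.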

I chose to archive the ensemble and inflation mean and sd files in the archive/esp/hist directory,
since in my view they are more closely tied to DART than to the components. E.g.
St_arch_beta17_3inst.dart.e.clm2_analysis_mean_d01.1850-01-01-21600.nc
If there's a good reason to put the _d01 somewhere else in the name, let me know.

The obs_seq.final files are also there, with a component in the name:
St_arch_beta17_3inst.dart.e.cam_obs_seq_final.1850-01-01-21600
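
The date stamp at the end of these names appears to follow the CESM YYYY-MM-DD-SSSSS form, where the last field is seconds into the day. The sketch below, with a hypothetical helper name, extracts it from a file name with or without a `.nc` suffix. The date is kept as a string on purpose, since model calendars (e.g. no-leap) do not always map onto Python's Gregorian `date` type.

```python
from datetime import timedelta


def split_stamp(filename):
    """Return (date string, time of day) from a trailing YYYY-MM-DD-SSSSS stamp."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    stamp = stem.rsplit(".", 1)[1]
    date, seconds = stamp.rsplit("-", 1)
    return date, timedelta(seconds=int(seconds))


date, time_of_day = split_stamp(
    "St_arch_beta17_3inst.dart.e.cam_obs_seq_final.1850-01-01-21600"
)
print(date, time_of_day)
```

For the example names in this thread, 21600 seconds corresponds to 06:00 on 1850-01-01.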

I also chose to rename the input.nml files as log files:
da.cam.input.nml.log.5023008.desched1.240702-055618.gz
The log file from assimilate.csh is still
da.log.5023008.desched1.240702-055618.gz
St_archive uses the 'log' in the name to archive them in the archive/logs directory.
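
Putting the choices above together, the following sketch shows where each renamed file would be expected to land. It is not st_archive code. The model-to-directory mapping and the function name are assumptions made for illustration, and the log test simply looks for `.log.` in the name.

```python
# Hypothetical mapping from the model name in a file name to its archive directory.
MODEL_TO_DIR = {"cam": "atm", "clm2": "lnd", "pop": "ocn", "cice": "ice", "dart": "esp"}


def archive_dir(filename, casename):
    """Return the archive subdirectory a renamed DART file would be sent to."""
    if ".log." in filename:
        return "logs"
    prefix = casename + "."
    if not filename.startswith(prefix):
        raise ValueError(f"{filename} does not follow the case naming convention")
    # The field after the case name is either "<model>_<instance>" or "dart".
    model = filename[len(prefix):].split(".", 1)[0].split("_", 1)[0]
    return f"{MODEL_TO_DIR[model]}/hist"


case = "St_arch_beta17_3inst"
for f in (
    "St_arch_beta17_3inst.clm2_0001.e.forecast_d01.1850-01-01-21600.nc",
    "St_arch_beta17_3inst.dart.e.cam_obs_seq_final.1850-01-01-21600",
    "da.cam.input.nml.log.5023008.desched1.240702-055618.gz",
):
    print(f, "->", archive_dir(f, case))
```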

@braczka
Contributor

braczka commented Jul 23, 2024

@kdraeder, thank you, and sorry for the delayed response. All of your choices seem reasonable to me. One alternative might be to create a separate 'archive/$component/DART' directory for all DART-related files. However, we may be trying to stick to existing archive directories only?

@kdraeder
Contributor Author

kdraeder commented Jul 23, 2024 via email
