Benchmarking recipes (Lauer et al.) #3598

Open · wants to merge 71 commits into main
Conversation

@axel-lauer (Contributor) commented May 16, 2024

Description

This PR implements a set of benchmarking recipes that compare different metrics (RMSE, bias, correlation, EMD) calculated for a given model simulation with the results from an ensemble of (model) datasets:

For this, the existing monitoring diagnostics monitor/monitor.py and monitor/multi_datasets.py have been extended.
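
As a rough illustration of the metrics involved (a sketch only, not the ESMValCore implementation; the recipes rely on the distance_metric preprocessor rather than this code), the four quantities could be computed for two flattened 1-D arrays like this:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Dummy data standing in for a model field and a reference field,
# both flattened to 1-D arrays of equal length.
rng = np.random.default_rng(0)
model = rng.normal(loc=0.2, scale=1.0, size=1000)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)

bias = np.mean(model - reference)                  # mean error
rmse = np.sqrt(np.mean((model - reference) ** 2))  # root-mean-square error
correlation = np.corrcoef(model, reference)[0, 1]  # Pearson correlation
emd = wasserstein_distance(model, reference)       # earth mover's distance
```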

The new diurnal cycle plot has also been added to the following existing recipes:

Documentation for the benchmarking recipes is available in recipes/recipe_benchmarking.rst; the documentation for monitoring and model evaluation has been updated to include the diurnal cycle plots.

Note for testing

The benchmarking recipes require the new preprocessor functions local_solar_time and distance_metric and the extended version of the resample_hours preprocessor.

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

New or updated recipe/diagnostic

@alistairsellar (Contributor) left a comment:

Great work @schlunma and @axel-lauer!

Some minor suggestions and questions from me.

I note also that the readthedocs build is still failing (so I haven't fully reviewed the documentation), and there are a couple of style complaints in run_tests.

@@ -27,6 +27,11 @@
produce multi panel plots for data with `shape_id` or `region`
coordinates of length > 1. Supported coordinates: `time`, `shape_id`
(optional) and `region` (optional).
- Diurnal cycle (plot type ``diurnal_cycle``): Generate a diurnal cycle
plot (timeseries like climatological from 0 to 24 hours). It will
Contributor:

Suggested change
plot (timeseries like climatological from 0 to 24 hours). It will
plot (timeseries like climatological from 0 to 23 hours). It will

I presume it doesn't show the 0 (=24) hour twice, though that would be a valid choice if it did and was documented as such.

Contributor:

In fact the example diurnal plot in this PR only appears to plot hours 1 to 22 inclusive. Is that a data or code limitation?

Contributor Author:

That is a data limitation. We wrote 0-24 hours because, if data are available, one could plot 0:00:00 to 23:59:59.

Contributor:

> That is a data limitation.

Does MIROC not provide a full hourly timeseries? Or are some days missing some hours, and the diurnal cycle mean requires them all?

> We wrote 0-24 hours because, if data are available, one could plot 0:00:00 to 23:59:59.

Thanks, that makes sense.

Contributor Author:

The 3-hourly CMIP6 output used in the example provides data at 1:30 (0:00-3:00), 4:30 (3:00-6:00), ..., 22:30 (21:00-24:00). CMIP6 data are converted to full hours by the preprocessor resample_hours so that they can be compared with ERA5 data (provided at full hours). The preprocessor distance_metric, applied afterwards, requires all datasets to have the same time coordinate.
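
A purely illustrative sketch of that conversion (not the resample_hours implementation), assuming linear interpolation from the 3-hourly interval mid-points onto full hours:

```python
import numpy as np

# 3-hourly CMIP6 output is time-stamped at the interval mid-points
# (1:30, 4:30, ..., 22:30), expressed here as fractional hours of the day.
midpoints = np.arange(1.5, 24.0, 3.0)
values = np.sin(2.0 * np.pi * (midpoints - 15.0) / 24.0)  # dummy diurnal signal

# Interpolate onto the full hours used by ERA5 (0, 1, ..., 23) so that both
# datasets end up with identical time coordinates.  np.interp does not
# extrapolate, so hours before 1:30 and after 22:30 simply repeat the edge
# values in this toy example.
full_hours = np.arange(24)
on_full_hours = np.interp(full_hours, midpoints, values)
```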

The hourly climatology is done inside the function so the users can
plot both the timeseries and the diurnal cycle in one go
"""
if 'diurnal_cycle' not in self.plots:
Contributor:

This seems a curious place to put the control statement, rather than where the method is called, e.g.

if 'diurnal_cycle' in self.plots:
   self.plot_diurnal_cycle(cube, var_info)

However, this seems to be an existing design decision for monitor.py prior to this PR, and arguably outside the scope of my science review, so feel free to ignore this comment.

Contributor Author:

That is right, I just followed the structure that was there already.

return (plot_path, {netcdf_path: cube})

def _plot_benchmarking_boxplot(self, df, cubes, variables, datasets):
"""Plot benchmarking boxplot."""
Contributor:

Somewhere, either here or at a higher level, it would be good to specify that the boxplot methodology shows the data quartiles and the full distribution without outliers, with a reference to the seaborn.boxplot documentation for further info. Conventions for boxplots vary, and seaborn has its own algorithm for deciding on outliers, so it would be best to make the specifics of the implementation discoverable for those who need to see the detail.
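
As one possible way to make this discoverable (a sketch only, not the diagnostic's code; the column names and data are made up), the relevant seaborn settings could be spelled out next to the call:

```python
import pandas as pd
import seaborn as sns

# With seaborn's defaults the box spans the quartiles (Q1-Q3), the whiskers
# extend to the most extreme points within whis * IQR of the box (whis=1.5),
# and anything beyond that is drawn as an individual outlier marker;
# showfliers=False would hide those markers entirely.
df = pd.DataFrame({
    "variable": ["tas"] * 5 + ["pr"] * 5,
    "value": [0.1, 0.2, 0.3, 0.4, 5.0, 1.0, 1.1, 1.2, 1.3, 1.4],
})
ax = sns.boxplot(data=df, x="variable", y="value", whis=1.5, showfliers=True)
```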

Contributor Author:

@LisaBock Would you be able to address this comment?

Comment on lines +2054 to +2055
# Make sure that the data has the correct dimensions
cube = dataset['cube']
Contributor:

Does this line ensure that the data has the right dimensions?

Contributor:

Also, I understand that _plot_benchmarking_zonal() expects the cube to have the zonal mean applied as a preprocessor, rather than being a function that computes and plots the zonal mean of 3-D data. Also, zonal means are often 1-D line plots (i.e. the zonal mean of a 2-D lat-lon field), so there is scope for confusion there too. Is the required shape of the data documented?
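
If an explicit guard is wanted, something along these lines could make the expectation visible (a hypothetical sketch, not code from this PR; it assumes the zonal mean has already been applied by the preprocessor, so no longitude dimension should remain):

```python
from iris.cube import Cube

def _check_zonal_mean_cube(cube: Cube) -> None:
    """Hypothetical check that the input is already a zonal mean."""
    dim_names = {coord.name() for coord in cube.coords(dim_coords=True)}
    if 'longitude' in dim_names:
        raise ValueError(
            "Expected zonal-mean input without a longitude dimension, "
            f"got dimensions {sorted(dim_names)}"
        )
```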

"""Get dataset to be benchmarked."""
variable = datasets[0][self.cfg['group_variables_by']]
benchmark_datasets = [d for d in datasets if
                      d.get('benchmark_dataset', False)]
Contributor:

Should this really be False? It looks like it's selecting datasets that specify benchmark_dataset: false. If correct, please add a comment to avoid confusion.

Contributor Author:

As far as I understand, the Python method dictionary.get(keyname, value) returns value if the specified key does not exist, so setting value to False just makes sure that datasets which do not have a 'benchmark_dataset' key are not selected.
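
A quick interpreter example of that behaviour (dataset names made up):

```python
>>> datasets = [
...     {'dataset': 'MODEL_A', 'benchmark_dataset': True},
...     {'dataset': 'MODEL_B', 'benchmark_dataset': False},
...     {'dataset': 'MODEL_C'},  # key absent -> get() falls back to False
... ]
>>> [d['dataset'] for d in datasets if d.get('benchmark_dataset', False)]
['MODEL_A']
```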

f"'{variable}', got {len(ref_datasets)}")
return None

def _get_benchmark_datasets(self, datasets):
Contributor:

Is there documentation (in the code or docs) indicating that the user should specify benchmark_dataset: true in the recipe to select the dataset to be benchmarked? This is clear from the example recipes, but it should really be captured somewhere beyond those.

It is worth noting in that documentation that some plots will (I think) only plot a single benchmark dataset even if multiple are specified.

Contributor Author:

Good point. I'll add that to the documentation.

annual_cycle:
  annual_mean_kwargs: False
  plot_kwargs:
    'MIROC6':
Contributor:

Does the choice of model here need to be consistent with selection of benchmark_dataset above?

Contributor Author:

It would make sense to be consistent if the benchmark_dataset is to be plotted with a special line color and style. I'll add this to the documentation.

@axel-lauer added the 'requires new ESMValCore release' label (A new release of ESMValCore is needed to solve this issue/merge this pull request.) on Dec 13, 2024
Labels
looking for technical reviewer · metric · new recipe (Use this label if you are adding a new recipe) · requires new ESMValCore release (A new release of ESMValCore is needed to solve this issue/merge this pull request.)