
Allowed special case for unit conversion of precipitation (kg m-2 s-1 <--> mm day-1) #1574

Merged
merged 8 commits into from
Jun 20, 2022


schlunma
Contributor

@schlunma schlunma commented May 10, 2022

Description

This PR expands the preprocessor convert_units so that it supports the "special unit conversions" from kg m-2 s-1 to mm day-1 for precipitation fluxes.
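As a rough illustration of the arithmetic involved, the sketch below converts plain values between kg m-2 s-1 and mm day-1. It assumes a water density of 1000 kg m-3 (the default discussed later in this thread). The function names are hypothetical and are not part of ESMValCore's API.

```python
SECONDS_PER_DAY = 86400.0
MM_PER_M = 1000.0
WATER_DENSITY = 1000.0  # kg m-3, assumed constant


def kg_m2_s1_to_mm_day1(flux, density=WATER_DENSITY):
    """Convert a mass flux (kg m-2 s-1) to a water column rate (mm day-1)."""
    # kg m-2 s-1 divided by kg m-3 gives m s-1; then m -> mm and s-1 -> day-1
    return flux / density * MM_PER_M * SECONDS_PER_DAY


def mm_day1_to_kg_m2_s1(rate, density=WATER_DENSITY):
    """Inverse of kg_m2_s1_to_mm_day1."""
    return rate * density / (MM_PER_M * SECONDS_PER_DAY)
```

For example, a typical precipitation flux of 3e-5 kg m-2 s-1 corresponds to roughly 2.6 mm day-1 under this assumption.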

Closes #1573

Link to documentation: https://esmvaltool--1574.org.readthedocs.build/projects/ESMValCore/en/1574/recipe/preprocessor.html#unit-conversion



@schlunma schlunma added the preprocessor Related to the preprocessor label May 10, 2022
@schlunma schlunma added this to the v2.6.0 milestone May 10, 2022
@schlunma schlunma requested a review from zklaus May 10, 2022 14:52
@schlunma schlunma self-assigned this May 10, 2022
@codecov

codecov bot commented May 10, 2022

Codecov Report

Merging #1574 (140c789) into main (de65ed8) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1574      +/-   ##
==========================================
+ Coverage   91.47%   91.48%   +0.01%     
==========================================
  Files         204      204              
  Lines       11143    11163      +20     
==========================================
+ Hits        10193    10213      +20     
  Misses        950      950              
Impacted Files Coverage Δ
esmvalcore/preprocessor/_units.py 100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update de65ed8...140c789.

zklaus
zklaus previously requested changes May 11, 2022

@zklaus zklaus left a comment


This still needs proper handling of standard names. Also, standard names are probably the better way to identify possible conversions.

@schlunma
Contributor Author

This still needs proper handling of standard names.

Can you elaborate on that?

Also, standard names are probably the better way to identify possible conversions.

I disagree. This preprocessor needs to be explicitly called by the user, so it will not transform data unexpectedly. If the user explicitly asks for the conversion to mm day-1 on something that "looks like" precipitation fluxes (identified by the correct input units kg m-2 s-1 and the short_name pr or the appearance of precipitation in the standard or long name) I think we should allow it.

Probably most of ESMValTool's input data uses the correct standard names, but we also need to think about possible derived variables without a standard name or the usage of this preprocessor outside of ESMValTool (e.g., in a Jupyter Notebook) on data that is not fully CMOR-compliant.
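The heuristic described above (matching on the input units plus either the short_name pr or "precipitation" in the standard or long name) could look roughly like the hypothetical helper below. It is written for illustration only, is not the code in this PR, and compares unit strings literally rather than parsing them.

```python
def looks_like_precipitation_flux(units, short_name=None,
                                  standard_name=None, long_name=None):
    """Hypothetical check: is this plausibly a precipitation mass flux?"""
    # Normalise whitespace so that e.g. "kg  m-2 s-1" still matches
    if " ".join(str(units).split()) != "kg m-2 s-1":
        return False
    if short_name == "pr":
        return True
    # Fall back to searching the descriptive names, case-insensitively
    names = (standard_name or "", long_name or "")
    return any("precipitation" in name.lower() for name in names)
```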

@zklaus
Copy link

zklaus commented May 11, 2022

This still needs proper handling of standard names.

Can you elaborate on that?

Sure. If there is a standard name, then it has associated canonical units. Changing the units without changing the standard name produces inconsistent results.

Also, standard names are probably the better way to identify possible conversions.
I disagree. This preprocessor needs to be explicitly called by the user, so it will not transform data unexpectedly. If the user explicitly asks for the conversion to mm day-1 on something that "looks like" precipitation fluxes (identified by the correct input units kg m-2 s-1 and the short_name pr or the appearance of precipitation in the standard or long name) I think we should allow it.

It's not about allowing it or not, but rather about having the necessary information. Suppose you want to convert (convective_precipitation_flux, kg m-2 s-1) to mm day-1. What is the quantity you want? Is it convective_precipitation_rate or lwe_convective_precipitation_rate? In some cases, a mapping table of standard names may allow you to identify the correct change, but some are ambiguous.

Probably most of ESMValTool's input data uses the correct standard names, but we also need to think about possible derived variables without a standard name or the usage of this preprocessor outside of ESMValTool (e.g., in a Jupyter Notebook) on data that is not fully CMOR-compliant.

I don't see the impact of use outside of ESMValTool; this is also not about CMOR but just the basic CF. If there is no standard name, things may be different, though in that case, I think it is easier to treat this as a data problem and apply a fix in the preprocessor.

While we can consider all kinds of things outside of ESMValTool and with highly irregular data (see the Australian SILO data for a dataset that never mentions precipitation but uses "rain"), let's start by making sure that the normal use inside ESMValTool doesn't turn correct data into incorrect data.

@schlunma
Contributor Author

Sure. If there is a standard name, then it has associated canonical units. Changing the units without changing the standard name produces inconsistent results.

The CF conventions say that "Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units [...]". One could definitely argue that the two units are physically equivalent here.

It's not about allowing it or not, but rather about having the necessary information. Suppose you want to convert (convective_precipitation_flux, kg m-2 s-1) to mm day-1. What is the quantity you want? Is it convective_precipitation_rate or lwe_convective_precipitation_rate? In some cases, a mapping table of standard names may allow you to identify the correct change, but some are ambiguous.

If we really want to go down this road, we have to modify a large part of our preprocessors. Here are some examples:

  • area_statistics with operator: sum basically multiplies the data by m2, but changes neither the standard_name nor the units.
  • anomalies with standardize: true changes the units to 1 but also does not touch the standard_name.
  • linear_trend also changes the units but not the standard_name.

@zklaus

zklaus commented May 11, 2022

Sure. If there is a standard name, then it has associated canonical units. Changing the units without changing the standard name produces inconsistent results.

The CF conventions say that "Unless it is dimensionless, a variable with a standard_name attribute must have units which are physically equivalent (not necessarily identical) to the canonical units [...]". One could definitely argue that the two units are physically equivalent here.

While the language of the conventions in some places is regrettably unclear, the standard names table has a more explicit note about this at the top. In the CF community it is generally understood that units must be convertible to the canonical units by udunits.
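To see why no udunits-style conversion exists between the two units, it helps to compare their physical dimensions. The toy dimensional analysis below is a hand-written sketch that only knows the handful of unit symbols used in this thread; it is not a udunits or cf-units call. It shows that kg m-2 s-1 carries a mass dimension that mm day-1 lacks, so no pure unit conversion can map one onto the other.

```python
from collections import Counter

# Hypothetical mini-table of base dimensions (M = mass, L = length, T = time)
BASE_DIMENSIONS = {
    "kg": {"M": 1},
    "m": {"L": 1},
    "mm": {"L": 1},
    "s": {"T": 1},
    "day": {"T": 1},
}


def dimensions(unit_string):
    """Return the dimension exponents of a space-separated unit string."""
    dims = Counter()
    for token in unit_string.split():
        # Split e.g. "m-2" into symbol "m" and exponent -2 (default 1)
        symbol = token.rstrip("-0123456789")
        exponent = int(token[len(symbol):] or 1)
        for base, power in BASE_DIMENSIONS[symbol].items():
            dims[base] += power * exponent
    return {base: power for base, power in dims.items() if power != 0}


def is_convertible(source, target):
    """Units are inter-convertible only if their dimensions match."""
    return dimensions(source) == dimensions(target)
```

Under this check, mm day-1 is convertible to m s-1 (the canonical units of rate-type standard names), while kg m-2 s-1 is not.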

It's not about allowing it or not, but rather about having the necessary information. Suppose you want to convert (convective_precipitation_flux, kg m-2 s-1) to mm day-1. What is the quantity you want? Is it convective_precipitation_rate or lwe_convective_precipitation_rate? In some cases, a mapping table of standard names may allow you to identify the correct change, but some are ambiguous.

If we really want to go down this road, we have to modify a large part of our preprocessors. Here are some examples:

  • area_statistics with operator: sum basically multiplies the data by m2, but changes neither the standard_name nor the units.
  • anomalies with standardize: true changes the units to 1 but also does not touch the standard_name.
  • linear_trend also changes the units but not the standard_name.

Go ahead and open the issues. Good to track them.

Of course, previous mistakes are no reason to introduce more mistakes, much less when a clear solution is available.

@schlunma
Contributor Author

The only thing I can offer is to remove the standard_name from the datasets entirely whenever this "special" conversion is used. I won't go through each of the possible candidates and check if there is an appropriate counterpart. Otherwise I will close this PR.

@zklaus

zklaus commented May 11, 2022

The code I posted in the issue contains a decent start. I imagine you have a concrete problem you are trying to solve, which might give you another entry. Why not start with those and throw an exception otherwise. Then anyone who needs a new case can add it quickly.

@valeriupredoi
Contributor

I like this and I support @schlunma here - we need to allow the user to change whatever they want as long as it's within some norms and regulations - I don't see anything wrong with this, and as long as it's a user option, we should become less and less restrictive wrt what the user needs. @zklaus what do you mean by

much less when a clear solution is available

🍺

@schlunma
Contributor Author

The use case I originally had in mind was for precipitation_flux; however, there is no counterpart of this with units m s-1. There is no clear obvious solution for this to me...

Also, the standard names convective_precipitation_rate and lwe_convective_precipitation_rate are not used in any of the CMIP6 CMOR tables, would it really make sense to use them?

@zklaus

zklaus commented May 11, 2022

@valeriupredoi's points:

I like this

I too agree that this would be very nice functionality.

and I support @schlunma here - we need to allow the user to change whatever they want

Users can do whatever they want in diagnostics. In preprocessors, we have generally higher standards wrt compliance.

as long as it's within some norms and regulations

That's exactly the point: It's not. If there is a standard name, the units must agree with it; hence changing the units without changing the standard name produces inconsistent results.

I don't see anything wrong with this, and as long as it's a user option, we should become less and less restrictive wrt what the user needs. @zklaus what do you mean by

much less when a clear solution is available

The solution is to change the standard name along with the units. The simplest thing for us to do is to require the user to provide the standard name. Other approaches, such as a lookup table, are possible.

@schlunma's points:

The use case I originally had in mind was for precipitation_flux; however, there is no counterpart of this with units m s-1. There is no clear obvious solution for this to me...

The counterpart for precipitation_flux is lwe_precipitation_rate. Why are snow and graupel included in precipitation and not separated out in the "liquid water equivalent"? Because precipitation_flux is mass-based and the mass doesn't change with the form of the precipitation, but the volume and with it the column height does.
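A minimal sketch of "change the standard name along with the units", assuming a hypothetical table keyed on (standard_name, source units, target units) that also records the new standard name. Only the pair named in this thread (precipitation_flux and lwe_precipitation_rate) is filled in, and the water density is assumed to be 1000 kg m-3. Anything not in the table raises an error instead of silently producing data whose units disagree with its standard name. Names and structure are illustrative, not ESMValCore's implementation.

```python
SECONDS_PER_DAY = 86400.0
WATER_DENSITY = 1000.0  # kg m-3, assumed constant

# Hypothetical table:
# (source standard_name, source units, target units)
#     -> (target standard_name, multiplicative factor)
SPECIAL_CASES = {
    ("precipitation_flux", "kg m-2 s-1", "mm day-1"):
        ("lwe_precipitation_rate", 1000.0 * SECONDS_PER_DAY / WATER_DENSITY),
    ("lwe_precipitation_rate", "mm day-1", "kg m-2 s-1"):
        ("precipitation_flux", WATER_DENSITY / (1000.0 * SECONDS_PER_DAY)),
}


def convert_quantity(values, standard_name, units, target_units):
    """Return (converted values, new standard_name); raise if unsupported."""
    key = (standard_name, units, target_units)
    if key not in SPECIAL_CASES:
        raise ValueError(
            f"No special conversion from {standard_name} [{units}] "
            f"to [{target_units}]"
        )
    new_standard_name, factor = SPECIAL_CASES[key]
    return [value * factor for value in values], new_standard_name
```

Note that both directions have to be listed explicitly in a table of this shape.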

Also, the standard names convective_precipitation_rate and lwe_convective_precipitation_rate are not used in any of the CMIP6 CMOR tables, would it really make sense to use them?

As model resolution is increased, these become important. The first place where we'll likely have to deal with them is CORDEX.

@valeriupredoi
Contributor

The solution is to change the standard name along with the units. The simplest thing for us to do is to require the user to provide the standard name. Other approaches, such as a lookup table, are possible.

right! OK, I agree with this and I too think we should require that.

About the flexibility point: higher standards for norms are good, but we must find the balance between them and usability. Remember the age-old argument about data being strictly CMOR; OK, maybe less strict, fine, we can just allow the user to run with only CF-compliant data if they wish. These sorts of things push our tool away from users. But yeah, I wasn't sure what you meant, and cheers for clarifying, K-man!

@schlunma
Contributor Author

All right, I changed it now so that only precipitation_flux is currently supported. I still think that's an unnecessary restriction though.

@schlunma
Contributor Author

@zklaus any further comments on this?

@zklaus

zklaus commented May 13, 2022

Good progress! I agree with you that it would be nice to include more standard names. We could start with the inverse conversion. Suppose we set things up to have a collection of standard names with any necessary additional information that allows interconversion between any two of them. In that case, we have a flexible system that allows the easy addition of more standard names as required.

I would also not frame this conversion as "allowing an exception" that cf-units is forbidding (for some nefarious reason?). This really is just something completely different, namely the conversion of one quantity (e.g. mass flux per time per area) to a very different quantity (e.g. rate in column height per time); of course a unit conversion cannot do that.
In that context, it is probably worth mentioning in the documentation that this conversion is rather simple because 1 mm of water over 1 m2 turns out to weigh exactly 1 kg at a density of 1000 kg/m3: a sensible default for ESM evaluation, but perhaps not sufficiently accurate for some higher-precision comparisons, and certainly reason enough for small discrepancies.
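Written out, the arithmetic behind this factor looks as follows, with the water density $\rho_w = 1000\,\mathrm{kg\,m^{-3}}$ as the assumed constant:

```latex
\[
1\,\mathrm{mm} \times 1\,\mathrm{m^{2}} = 10^{-3}\,\mathrm{m^{3}},
\qquad
10^{-3}\,\mathrm{m^{3}} \times 1000\,\mathrm{kg\,m^{-3}} = 1\,\mathrm{kg}
\]
\[
\frac{1\,\mathrm{kg\,m^{-2}\,s^{-1}}}{\rho_w}
  = \frac{1\,\mathrm{kg\,m^{-2}\,s^{-1}}}{1000\,\mathrm{kg\,m^{-3}}}
  = 10^{-3}\,\mathrm{m\,s^{-1}}
  = 1\,\mathrm{mm\,s^{-1}}
  = 86400\,\mathrm{mm\,day^{-1}}
\]
```

A different density would scale the factor of 86400 inversely, which is one source of the small discrepancies mentioned above.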

On the technical side, we now have two unit conversions on the cube. That is a potentially costly operation that we can avoid by calculating a single conversion factor.

@schlunma
Contributor Author

Good progress! I agree with you that it would be nice to include more standard names. We could start with the inverse conversion. Suppose we set things up to have a collection of standard names with any necessary additional information that allows interconversion between any two of them. In that case, we have a flexible system that allows the easy addition of more standard names as required.

What do you have in mind here? The system is already flexible; further standard_names can be supported by extending the special_cases dictionary.

I would also not frame this conversion as "allowing an exception" that cf-units is forbidding (for some nefarious reason?). This really is just something completely different, namely the conversion of one quantity (e.g. mass flux per time per area) to a very different quantity (e.g. rate in column height per time); of course a unit conversion cannot do that. In that context, it is probably worth mentioning in the documentation that this conversion is rather simple because 1 mm of water over 1 m2 turns out to weigh exactly 1 kg at a density of 1000 kg/m3: a sensible default for ESM evaluation, but perhaps not sufficiently accurate for some higher-precision comparisons, and certainly reason enough for small discrepancies.

I 100% agree that cf_units should not allow that. Since this is an addition to the convert_units preprocessor, it feels like an "exception"; however, I will try to rephrase this.

On the technical side, we now have two unit conversions on the cube. That is a potentially costly operation that we can avoid by calculating a single conversion factor.

Good point, will change that!

@zklaus

zklaus commented May 13, 2022

Right now there is a clear asymmetry: You have to specify a source and a target for every conversion. To add the inverse, you need to duplicate the information. Worse, for n quantities that are all mutually convertible, you need n*(n-1) = O(n^2) entries in the table, which is more difficult to maintain than the n entries you need in a simple set of quantities. Hence my preference.
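One way to get the n-entry table described here is sketched below. This is a hypothetical design, not what the PR implements: each standard name is registered once, together with its units and a factor to a shared reference quantity (here mass flux in kg m-2 s-1, assuming a water density of 1000 kg m-3). Any pair can then be converted by going through the reference, so the inverse direction needs no extra entry.

```python
SECONDS_PER_DAY = 86400.0
WATER_DENSITY = 1000.0  # kg m-3, assumed constant

# Hypothetical registry: standard_name -> (units, factor to reference).
# Multiplying a value given in `units` by the factor yields kg m-2 s-1.
QUANTITIES = {
    "precipitation_flux": ("kg m-2 s-1", 1.0),
    "lwe_precipitation_rate": (
        "mm day-1", WATER_DENSITY / (1000.0 * SECONDS_PER_DAY)),
}


def conversion_factor(source_name, target_name):
    """Factor that turns source_name values into target_name values."""
    try:
        _, source_to_reference = QUANTITIES[source_name]
        _, target_to_reference = QUANTITIES[target_name]
    except KeyError as err:
        raise ValueError(f"Unsupported standard_name: {err.args[0]}") from None
    # source -> reference -> target
    return source_to_reference / target_to_reference
```

With this layout, supporting an additional mutually convertible quantity means adding one line rather than one entry per existing quantity and direction.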

Regarding your other comments, perhaps this fits better into its own preprocessor?

@schlunma
Contributor Author

Regarding your other comments, perhaps this fits better into its own preprocessor?

I don't think so. Converting precipitation data from kg m-2 s-1 to mm day-1 is a standard operation which (I think) most people associate with "converting units".

@zklaus

zklaus commented May 13, 2022

It certainly is a standard operation and I have no problem colocating it in this preprocessor.

@schlunma
Contributor Author

On the technical side, we now have two unit conversions on the cube. That is a potentially costly operation that we can avoid by calculating a single conversion factor.

For units that differ by a constant factor that's not a problem, but for others (like degC -> K) this is not trivial. For simplicity, I would leave it as is, especially since it's only really two conversions if the source units differ from the ones in the cube, which should not be the case for most datasets.

@zklaus

zklaus commented May 13, 2022

I am not sure I follow. This only applies to the special quantity conversion, no? To be clear, I was referring only to the two conversions in lines 74 and 76, not to the one triggering the special treatment in line 57.

@schlunma
Contributor Author

To be clear, I was referring only to the two conversions in lines 74 and 76, not to the one triggering the special treatment in line 57.

Yes, that's what I understood.

As far as I can tell, in order to avoid one of them, you would need to calculate a conversion factor between the two (e.g., from cf_units.Unit.convert()). While this is straightforward for units which differ just by a constant factor (e.g., m and km), you would need a more complex syntax for units that differ by a more general linear transformation ax + b (e.g., celsius and fahrenheit). Since the first conversion in l.74 will be a trivial operation in 99% of the data (units of precipitation_flux will most likely be in kg m-2 s-1, not in g m-2 yr-1), I don't think it's worth the effort to avoid this second conversion.
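For what it's worth, two successive conversions of the form ax + b can always be folded into one: applying a1 x + b1 and then a2 x + b2 gives (a2 a1) x + (a2 b1 + b2). The sketch below uses a hypothetical Affine helper to show this. The g-to-kg factor, the degC/K/degF coefficients and the 86400 factor (assuming 1000 kg m-3) are standard values; everything else is illustrative.

```python
from typing import NamedTuple


class Affine(NamedTuple):
    """Hypothetical helper describing y = scale * x + offset."""
    scale: float
    offset: float = 0.0

    def __call__(self, x):
        return self.scale * x + self.offset


def compose(first, second):
    """Return one Affine equivalent to applying `first`, then `second`."""
    return Affine(second.scale * first.scale,
                  second.scale * first.offset + second.offset)


# Pure scale factors: g m-2 s-1 -> kg m-2 s-1 -> mm day-1
g_to_kg = Affine(1e-3)
flux_to_mm_day = Affine(86400.0)  # assumes a water density of 1000 kg m-3
g_flux_to_mm_day = compose(g_to_kg, flux_to_mm_day)

# Offsets compose too: degC -> K -> degF
c_to_k = Affine(1.0, 273.15)
k_to_f = Affine(1.8, -459.67)
c_to_f = compose(c_to_k, k_to_f)
```

Applied to an array, the composed transformation needs only one elementwise multiply-and-add instead of two.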

@zklaus

zklaus commented May 13, 2022

As for efficiency, I think the impact in terms of FLOPS is likely small in any case. I am a bit more concerned with the impact on laziness and the number of additional tasks that the dask scheduler will have to deal with. Since this is an ad-hoc solution that only covers the conversions listed explicitly, and since those are only of the form mentioned, I don't think there is a problem with a conversion factor.

@schlunma
Contributor Author

I implemented all your comments, please give it another look 👍

@schlunma schlunma changed the title Allowed special case for unit conversion of pr (kg m-2 s-1 to mm day-1) Allowed special case for unit conversion of pr (kg m-2 s-1 <--> mm day-1) May 13, 2022
@schlunma schlunma changed the title Allowed special case for unit conversion of pr (kg m-2 s-1 <--> mm day-1) Allowed special case for unit conversion of precipitation (kg m-2 s-1 <--> mm day-1) May 13, 2022
@schlunma
Contributor Author

@zklaus are these changes fine for you? Is there anything else I can do?

@zklaus

zklaus commented May 19, 2022

Already looks good. I still have some comments, but need a bit more time.

@schlunma schlunma mentioned this pull request May 31, 2022
10 tasks
@sloosvel
Contributor

sloosvel commented Jun 3, 2022

@schlunma could you please solve the conflicts in this branch?
@zklaus have the changes you requested been addressed?

Contributor

@sloosvel sloosvel left a comment


I had a look and the changes look reasonable to me. At the end of the day, it's work that adds new functionality to the core with respect to the past release and that is needed by people using the tool.

@zklaus all your points are very valid, but considering that the code freeze is coming up, and that @schlunma kindly addressed your requests in a timely manner, would it be possible to merge this?

@sloosvel
Copy link
Contributor

sloosvel commented Jun 7, 2022

Just merging with main. If there is no news by 4 PM, I would suggest that @schlunma create a new PR and we merge that instead, since this already has two approved reviews.

@sloosvel sloosvel modified the milestones: v2.6.0, v2.7.0 Jun 8, 2022
@schlunma
Copy link
Contributor Author

I will merge this on Monday if there are no further comments.

@sloosvel sloosvel modified the milestones: v2.7.0, v2.6.0 Jun 16, 2022
Labels
preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

convert_units does not support "special conversions" like kg m-2 s-1 to mm day-1 for precipitation
4 participants