Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/coarsen on pressure #87

Merged
merged 79 commits into from
Jan 15, 2020
Merged

Feature/coarsen on pressure #87

merged 79 commits into from
Jan 15, 2020

Conversation

oliverwm1
Copy link
Contributor

Introduces functions to coarsen restart files on constant pressure surfaces, as well as compute layer thicknesses assuming hydrostatic balance and a consistent surface height.

Oliver Watt-Meyer and others added 30 commits December 17, 2019 16:06
Interpolating cell-centered quantities to the interfaces fails when using a 1-based indexing system for the tiles. We chose this to more closely align FV3s file naming conventions, but I think we should revert to 0-based indexing everywhere.
@oliverwm1
Copy link
Contributor Author

Not sure what's going on, but there are some strange artifacts in the coarsened data with the code as it stands :/ Will look into this in the new year. Happy holidays all!

Unknown-2

@spencerkclark
Copy link
Member

I did a little digging -- it's indeed a subtle issue. The root cause is the following pattern in remap_levels (da_no_coords is representative of p_out, which has no horizontal coordinates when it enters the function):

In [1]: import numpy as np; import xarray as xr

In [2]: da_no_coords = xr.DataArray(np.random.random((2, 2)), dims=['x', 'y'])

In [3]: da_no_coords
Out[3]:
<xarray.DataArray (x: 2, y: 2)>
array([[0.14846017, 0.39570592],
       [0.00748756, 0.36609841]])
Dimensions without coordinates: x, y

In [4]: da_no_coords_stacked_unstacked = da_no_coords.stack(column=('x', 'y')).unstack()

Note that stacking and unstacking leads to the addition of dummy coordinates to the previously unlabeled 'x' and 'y' dimensions; this was a design decision that dates back to the initial implementation of dimensions without coordinates in xarray (though there doesn't seem to be much discussion about it).

In [5]: da_no_coords_stacked_unstacked
Out[5]:
<xarray.DataArray (x: 2, y: 2)>
array([[0.14846017, 0.39570592],
       [0.00748756, 0.36609841]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1

If we have a reference dataset that has labeled 'x' and 'y' dimensions (like the restart files), and then try to insert this DataArray into it, xarray will try to align things using the dummy coordinates creating via stacking. This leads to NaNs on the outer edges when you do ds_remap[var] = remap_levels(...).

Details

In [6]: da_coords = xr.DataArray(np.random.random((2, 2)), coords=[range(1, 3), range(1, 3)], dims=
   ...: ['x', 'y'])

In [7]: da_coords
Out[7]:
<xarray.DataArray (x: 2, y: 2)>
array([[0.72792859, 0.34686449],
       [0.4581061 , 0.85930211]])
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2

In [8]: ds_coords = da_coords.rename('foo').to_dataset()

In [9]: reference = xr.zeros_like(ds_coords)

In [10]: reference['a'] = da_no_coords_stacked_unstacked

In [11]: reference
Out[11]:
<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2
Data variables:
    foo      (x, y) float64 0.0 0.0 0.0 0.0
    a        (x, y) float64 0.3661 nan nan nan

The upshot is, we should probably restore the horizontal coordinates on delp_coarse_on_fine in _remap_given_delp before passing things down to remap_levels. I think you might be able to take advantage of something like this function I wrote for coarse-graining the surface data:
https://github.com/VulcanClimateModeling/fv3net/blob/0439cdb57c7b71731a3a11bd220aee337c7775d0/external/vcm/vcm/complex_sfc_data_coarsening.py#L420-L420
but I think you might need to modify it some to make it more robust to staggered vs. unstaggered dimensions and differing orders of downsampled vs. reference dimensions. A nice thing about that function is that it also restores chunk sizes in addition to coordinates.

@oliverwm1
Copy link
Contributor Author

Thanks for looking into this Spencer. Indeed a subtle issue, nice job tracking it down! This would have been introduced when I switched f_out to be initialized from p_out instead of f_in. I’ll take a look at your suggestion and work on implementing today.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @oliverwm1; I'm pleased with where this is now. Thanks for all the changes.

Copy link
Contributor

@nbren12 nbren12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good, but it seems like there is a too much code for dealing with metadata and coordinate info. I think it would be cleaner to assume all the restart data is one Dataset with corrected metadata as returned by open_restarts. Then this PR would introduce a single function

def coarse_grain_on_pressure(restart_data: xr.Dataset) -> xr.Dataset:
   ...

I am working on another PR #97, which will make it easier to use the dimension renaming code on it's own. Finally, we would need to add a function to write the single combined restart Dataset to the separate files, but that shouldn't be too hard.

external/vcm/vcm/calc/thermo.py Outdated Show resolved Hide resolved
external/vcm/vcm/cubedsphere/remapz.py Outdated Show resolved Hide resolved
external/vcm/vcm/cubedsphere/remapz.py Outdated Show resolved Hide resolved
external/vcm/vcm/cubedsphere/remapz.py Outdated Show resolved Hide resolved
external/vcm/vcm/coarsen.py Outdated Show resolved Hide resolved
@oliverwm1
Copy link
Contributor Author

I think I've addressed all your comments, @nbren12. I went back to not outputting a vertical coordinate when calculating pressure or height on layer interfaces, which reduces some of the dimension/coordinate complications.

I agree the code would be simpler if we could assume all the restart data is in one xarray Dataset, but I think it is outside of the scope of this PR to build out that framework. For this PR, I took the approach of replicating the current model-level methods for coarsening, which operate separately-ish on each "restart category" (coarsening fv_tracer does depend on fv_core).

Copy link
Contributor

@nbren12 nbren12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge, but I think we should open an issue to make this code work on the combined restart data (as returned by open_restarts). I think this will make it easier to implement with dataflow.

cc @frodre

@oliverwm1
Copy link
Contributor Author

Feel free to merge, but I think we should open an issue to make this code work on the combined restart data (as returned by open_restarts). I think this will make it easier to implement with dataflow.

cc @frodre

Sounds good, I made an issue.

@oliverwm1 oliverwm1 merged commit f91209b into master Jan 15, 2020
@oliverwm1 oliverwm1 deleted the feature/coarsen-on-pressure branch January 15, 2020 16:04
spencerkclark added a commit that referenced this pull request Jun 28, 2022
)

In the original implementation (#87), the default method for coarse-graining 3D fields in the restart files was to first vertically remap them to a common set of pressure levels within a coarse grid cell, and then take a masked-mass-weighted average (masking is to take into account the fact that some fine-grid cells do not have air at pressures we are interpolating to, so we ignore them in averaging).  Here the masked mass was computed as the product of the masked fine-grid cell area and the original fine-grid cell pressure thickness.  In reality though, since we have interpolated the fine-grid fields to constant pressure surfaces, for the pressure thickness we should use the updated pressure thickness field (i.e. "delp_coarse_on_fine"), or better yet, no pressure thickness at all, since it will be constant within a coarse grid box and so will not contribute anything meaningful to the weighting.

This PR addresses this by switching to using just the masked area as weights when pressure-level coarse-graining restart files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants