Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bilinear resampler for 1D data #324

Merged
merged 9 commits into from
Jun 27, 2022
Merged

Conversation

pnuu
Copy link
Member

@pnuu pnuu commented Jan 8, 2021

Resampling of 1D data with the Numpy-based bilinear resampler got broken when I did the big refactoring with common base for Numpy and XArray resamplers. This PR will eventually allow 1D data to be resampled with both.

@pnuu pnuu added the bug label Jan 8, 2021
@pnuu pnuu requested review from mraspaud and djhoese January 8, 2021 08:19
@pnuu pnuu self-assigned this Jan 8, 2021
@ghost
Copy link

ghost commented Jan 8, 2021

DeepCode's analysis on #4760fd found:

  • ℹ️ 1 minor issue. 👇

Top issues

Description Example fixes
Access to a protected member _source_geo_def of a client class Occurrences: 🔧 Example fixes

👉 View analysis in DeepCode’s Dashboard | Configure the bot

@pnuu
Copy link
Member Author

pnuu commented Jan 8, 2021

This works for the NumpyBilinearResampler (see the linked issue), but with XArrayBilinearResampler there comes a problem with the DataArray coordinates and dimensions: ValueError: different number of dimensions on data and dims: 2 vs 1

The below snippet from debugger shows the problem:

In [3]: res = resampler.get_sample_from_bil_info(data)
> /home/lahtinep/Software/pytroll/pytroll_packages/pyresample/pyresample/bilinear/xarr.py(120)_finalize_output_data()
    119         import ipdb; ipdb.set_trace()
--> 120         return DataArray(res, dims=data.dims, coords=self._out_coords)
    121 

ipdb> res
dask.array<reshape, shape=(1959, 1934), dtype=float64, chunksize=(1959, 1934), chunktype=numpy.ndarray>
ipdb> data.dims
('dim_0',)
ipdb> self._out_coords
{'x': array([2427377.90891052, 2429377.70033157, 2431377.49175261, ...,
       6288975.14294739, 6290974.93436843, 6292974.72578948]), 'y': array([5445513.41697785, 5443513.20653354, 5441512.99608923, ...,
       1533101.78791077, 1531101.57746646, 1529101.36702215]), 'dim_0': <xarray.DataArray 'dim_0' (dim_0: 2457600)>
array([      0,       1,       2, ..., 2457597, 2457598, 2457599])
Dimensions without coordinates: dim_0}

The dim_0 dimension comes from the raveled input data I used with no proper dimension labels, and is added by _add_missing_coordinates method. This method is there to add eg. the bands dimension when used via Satpy, but in the case of 1D data adds a nonsensical dimension. Any suggestion what to do in this case?

@ibrewster
Copy link

I was wondering if there was any update on this? I'm currently still stuck on pyresample v1.16, which in turn appears to mean that I can't update beyond python 3.9. I was going to "fix" my code to work with the current version of pyresample (and the XArrayBilinearResampler), but my data is in a 1-D timeseries format, and every thought I have come up with as to working with it in a 2-D format has, so far, been painful 😃

@djhoese
Copy link
Member

djhoese commented May 12, 2022

@pnuu I'm not sure I understand the question/problem as you've put it in your last comment. If there is only a single dimension on the input data, can you assume/require that it be considered a y dimension? Does that help at all? That's what I ended up doing for some readers in Satpy a while ago.

@pnuu
Copy link
Member Author

pnuu commented May 12, 2022

I'll see if I have some time tomorrow to have a look and figure out what the actual problem was last year.

@pnuu
Copy link
Member Author

pnuu commented May 13, 2022

As a start, I pushed a failing test, pytest pyresample/test/test_bilinear.py::TestXarrayBilinear::test_get_sample_from_bil_info_1d:

=========================================================== FAILURES ============================================================
______________________________________ TestXarrayBilinear.test_get_sample_from_bil_info_1d ______________________________________

self = <pyresample.test.test_bilinear.TestXarrayBilinear testMethod=test_get_sample_from_bil_info_1d>

    def test_get_sample_from_bil_info_1d(self):
        """Test resampling using resampling indices for 1D data."""
        import dask.array as da
        from xarray import DataArray
        from pyresample.bilinear import XArrayBilinearResampler
    
        resampler = XArrayBilinearResampler(self.source_def_1d, self.target_def,
                                            self.radius)
        resampler.get_bil_info()
    
        # Sample from 1D data
        data = DataArray(da.ones(self.source_def_1d.shape), dims=('y'))
>       res = resampler.get_sample_from_bil_info(data)  # noqa

pyresample/test/test_bilinear.py:747: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyresample/bilinear/_base.py:236: in get_sample_from_bil_info
    return self._finalize_output_data(data, res, fill_value)
pyresample/bilinear/xarr.py:120: in _finalize_output_data
    return DataArray(res, dims=data.dims, coords=self._out_coords)
../../../../mambaforge/envs/pytroll/lib/python3.9/site-packages/xarray/core/dataarray.py:406: in __init__
    coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

shape = (4, 4)
coords = {'x': array([-1070912.72,  -470912.72,   129087.28,   729087.28]), 'y': array([1190031.36,  590031.36,   -9968.64, -609968.64])}
dims = ('y',)

    def _infer_coords_and_dims(
        shape, coords, dims
    ) -> "Tuple[Dict[Any, Variable], Tuple[Hashable, ...]]":
        """All the logic for creating a new DataArray"""
    
        if (
            coords is not None
            and not utils.is_dict_like(coords)
            and len(coords) != len(shape)
        ):
            raise ValueError(
                f"coords is not dict-like, but it has {len(coords)} items, "
                f"which does not match the {len(shape)} dimensions of the "
                "data"
            )
    
        if isinstance(dims, str):
            dims = (dims,)
    
        if dims is None:
            dims = [f"dim_{n}" for n in range(len(shape))]
            if coords is not None and len(coords) == len(shape):
                # try to infer dimensions from coords
                if utils.is_dict_like(coords):
                    dims = list(coords.keys())
                else:
                    for n, (dim, coord) in enumerate(zip(dims, coords)):
                        coord = as_variable(coord, name=dims[n]).to_index_variable()
                        dims[n] = coord.name
            dims = tuple(dims)
        elif len(dims) != len(shape):
>           raise ValueError(
                "different number of dimensions on data "
                f"and dims: {len(shape)} vs {len(dims)}"
            )
E           ValueError: different number of dimensions on data and dims: 2 vs 1

../../../../mambaforge/envs/pytroll/lib/python3.9/site-packages/xarray/core/dataarray.py:123: ValueError

@codecov
Copy link

codecov bot commented May 17, 2022

Codecov Report

Merging #324 (bf047fe) into main (bc1cdc7) will increase coverage by 1.29%.
The diff coverage is 92.59%.

@@            Coverage Diff             @@
##             main     #324      +/-   ##
==========================================
+ Coverage   92.69%   93.98%   +1.29%     
==========================================
  Files          45       65      +20     
  Lines        9166    11335    +2169     
==========================================
+ Hits         8496    10653    +2157     
- Misses        670      682      +12     
Flag Coverage Δ
unittests 93.98% <92.59%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pyresample/bilinear/_base.py 95.80% <66.66%> (+0.12%) ⬆️
pyresample/bilinear/xarr.py 92.90% <83.33%> (+0.77%) ⬆️
pyresample/test/test_bilinear.py 100.00% <100.00%> (ø)
pyresample/utils/cartopy.py 59.25% <0.00%> (-35.09%) ⬇️
pyresample/bucket/__init__.py 97.95% <0.00%> (-0.96%) ⬇️
pyresample/grid.py 87.50% <0.00%> (-0.74%) ⬇️
pyresample/utils/__init__.py 83.52% <0.00%> (-0.57%) ⬇️
pyresample/kd_tree.py 92.02% <0.00%> (-0.46%) ⬇️
pyresample/plot.py 42.37% <0.00%> (ø)
pyresample/boundary.py 94.44% <0.00%> (ø)
... and 59 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc1cdc7...bf047fe. Read the comment docs.

@pnuu
Copy link
Member Author

pnuu commented May 17, 2022

@ibrewster I hope the last pushes fixes this, but could you test with your own real-world data?

@pnuu
Copy link
Member Author

pnuu commented May 17, 2022

pre-commit.ci autofix

@ibrewster
Copy link

Yeah, I'll try to test this later today or tomorrow. Thanks!

@ibrewster
Copy link

...and my "later today or tomorrow" apparently became "two weeks later" 😛. Sorry about that! In any case, once I figured out how to properly incorporate this pull request into my local copy, I was able to successfully test it and confirm that it works as expected with my real-world data. Thanks!

@pnuu pnuu marked this pull request as ready for review June 3, 2022 07:05
@pnuu
Copy link
Member Author

pnuu commented Jun 3, 2022

Thanks for testing! I've now marked this as ready for review.

Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More complicated than I had hoped it would be but nice job figuring it all out.

def _get_output_dims(self, data, res):
if data.ndim == res.ndim:
return data.dims
return list(self._out_coords.keys())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does self._out_coords not work? I mean, when is the return data.dims needed? Shouldn't self._out_coords always be valid? What if the input data had non-geo dimensions ('bands', 'time_bounds', etc)?

Copy link
Member Author

@pnuu pnuu Jun 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyresample/test/test_bilinear.py::TestXarrayBilinear::test_get_sample_from_bil_info seems to fail without it because the coordinates are in the wrong order (('x', 'y') vs. ('y', 'x')). Based on the currently existing tests, data.dims seems to be sufficient and self._out_coords are never accessed 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that thinking emoji mean you think you need to add more tests?

pyresample/bilinear/xarr.py Show resolved Hide resolved
@djhoese djhoese merged commit e531bd9 into pytroll:main Jun 27, 2022
@pnuu pnuu deleted the bugfix-1d-bilinear branch December 8, 2022 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to pyresample 1.17.0 causes IndexError with one-dimensional data
3 participants