-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor bilinear #300
Refactor bilinear #300
Conversation
I'm going to ignore the DeepCode errors/warnings. The access to private attributes are within the tests, and in deprecated functions ( The |
Any other suggestions? @djhoese made a PR for Satpy (pytroll/satpy#1361) so that Windows testing would happen in Travis instead of Appveyor. Should we make similar change also to Pyresample, merge that, rebase/merge to this and see that also the Windows tests pass? |
How backwards compatible is this PR? I would wait to turn off appveyor. Plus the Azure builds are failing for some other reason. Not sure why yet. |
This is fully backwards compatible, including both Numpy and XArray versions. And again I mixed up Azure and Appveyor 🙄 And yeah, didn't figure what's wrong with the Azure builds. |
Looking again at the logs again, it seems that Python 3.9 fails to install
|
I think I posted it on slack but did you ever turn off python 3.9? Check the skip variable at the top of the azure config and add an entry for the cp39 equivalent. |
Oops, either didn't notice or completely forgot. Trying now. |
With these two additional |
I ran the timings again with the current Satpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job. It looks easier to follow and I like the documentation. I had a lot of suggestions for renaming and reordering, but mostly I felt like some of the refactoring went too far. A lot of the class methods seem unnecessary or that they don't do that much. With all the class instances this may just appear this way, but I'm wondering if this can be avoided.
On a larger note, what do you and @mraspaud think about avoiding putting a lot of code in __init__.py
modules? Could a lot of the stuff be put in bilinear/base.py
and/or bilinear/npy.py
?
@@ -268,10 +268,74 @@ Click images to see the full resolution versions. | |||
|
|||
The *perceived* sharpness of the bottom image is lower, but there is more detail present. | |||
|
|||
|
|||
XArrayResamplerBilinear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think Resampler
should be the last word: XArrayBilinearResampler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, while I think this section is fine here for now, I think we need to refactor the documentation. This document is getting really long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This is the name Satpy currently expects, so for backwards compatibility it needs to be like this for now. I can change this in the follow-up changes I already have waiting (not yet PR'd) for both Pyresample and Satpy.
docs/source/swath.rst
Outdated
*********************** | ||
|
||
**bilinear.XArrayResamplerBilinear** is a class that handles bilinear interpolation for data in | ||
`xarray.DataArray` arrays. The parallelisation is done automatically using `dask`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
US English would be parallelization
. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
>>> result = resampler.get_sample_from_bil_info(data) | ||
|
||
|
||
NumpyResamplerBilinear |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here for the name, Resampler should be last in my opinion. Willing to debate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on swapping these sections around? Numpy first then xarray? The tests could even use the same lons/lats/data from the numpy section (I think doctest lets you do that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the naming in the follow-ups. The Xarray version is preferred for performance, so thought it should come first.
>>> source_def = geometry.SwathDefinition(lons=lons, lats=lats) | ||
>>> resampler = XArrayResamplerBilinear(source_def, target_def, 30e3) | ||
>>> resampler.get_bil_info() | ||
>>> result = resampler.get_sample_from_bil_info(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_sample_from_bil_info
seems odd given that you don't actually give it bil_info
. Why can't get_bil_info
be called automatically if it hasn't been already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separate get_bil_info()
step is necessary when caching the resampling info. In the follow-ups I move the caching from Satpy to Pyresample, so with that the caching step can be shown here. In another follow-up I'll add resampler.resample()
method that wraps resampler.get_bil_info()
and resampler.get_sample_from_bil_info()
together. It could also have the cache_dir
kwarg so that there'd be no need to call it separately.
docs/source/swath.rst
Outdated
Function for resampling using bilinear interpolation for irregular source grids. | ||
Convenience function for resampling using bilinear interpolation for irregular source grids. | ||
|
||
..note: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this syntax work? I think you need a space after ..
and two colons ::
and a blank line after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it doesn't. Fixed.
pyresample/bilinear/__init__.py
Outdated
if self._resample_kdtree is None: | ||
return | ||
|
||
self._get_target_lonlats() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This chunk of code seems odd. Calling a series of methods sounds like it should maybe be some other method? Or even better, do these methods need to be creating/storing data in instance attributes? Could they instead return the values they are generating?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some the instance attributes are re-used at least once, and the computations are not cheap (not sure how well Dask can handle them) so I think this is necessary. I'll move the single-use ones to where they are needed and remove the instance attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._valid_output_index
attribute removed, it's the only one that wasn't re-used and/or needed for caching (public attributes).
pyresample/bilinear/__init__.py
Outdated
self.mask_slices = self._index_array >= self._source_geo_def.size | ||
|
||
def _get_target_lonlats(self): | ||
self._target_lons, self._target_lats = self._target_geo_def.get_lonlats() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this method buys you anything in readability. I personally would remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I had a separate version with chunks
kwarg for the Xarray version, but the coordinates need to be computed for the query and the "laziness" just made things a lot slower. Removed the unnecessary method.
pyresample/bilinear/__init__.py
Outdated
self._valid_output_index, self._resample_kdtree, | ||
self._neighbours, self._epsilon, | ||
self._radius_of_influence) | ||
return res, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method too seems odd as it doesn't really do much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, moved the query_no_distance()
call to _get_index_array()
.
pyresample/bilinear/__init__.py
Outdated
elif data.ndim == 3: | ||
return _slice3d | ||
else: | ||
raise ValueError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have a message with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, added.
pyresample/bilinear/__init__.py
Outdated
return _apply_fill_value_or_mask_data( | ||
self._reshape_to_target_area(res, data.ndim), | ||
fill_value | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking the inline call of _reshape_to_target_area
out would make this clearer in my opinion. So _finalize_output_data
would be a two step method (reshape, fill/mask).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, adjusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the comments on naming. Otherwise I can't see anything alarming. Some function/methods could be shrunk even more, but it's already a huge improvement over what we had before, great job!
This is a big refactoring for bilinear interpolation. Most of the bits are also re-used in the legacy Numpy version, but I didn't want to put too much time in pulling out all the parts of
pyresample.bilinear.get_sample_from_bil_info()
.There will be some renaming and such that can be improved, but I've run out of steam for now 😅 Suggestions are wellcome!
git diff origin/master **/*py | flake8 --diff
This refactoring also brings some performance improvements when using pre-computed resampling info. With the script below, using Satpy, I got the following timings:
EDIT: Timings updated for latest Satpy
master
(Oct-7 2020) with minimal other load on the laptop.Pyresample
master
branchgenerate=False
):3 m 28 s3 m 14 sgenerate=True
):4 m 42 s4 m 20 sgenerate=False
): 18 sgenerate=True
):1 m 25 s1 m 21 sThis PR:
generate=False
):2 m 50 s1 m 36 sgenerate=True
):3 m 30 s2 m 14 sgenerate=False
):16 s15 sgenerate=True
):48 s45 s