Refactor bilinear #300

pnuu · 2020-09-09T11:38:50Z

This is a big refactoring of the bilinear interpolation code. Most of the pieces are also reused in the legacy Numpy version, but I didn't want to spend too much time pulling apart all of pyresample.bilinear.get_sample_from_bil_info().

Some of the naming and such can still be improved, but I've run out of steam for now 😅 Suggestions are welcome!

Closes Refactor bilinear interpolation #299
Tests added and updated
Tests passed
Passes git diff origin/master **/*py | flake8 --diff
Fully documented

This refactoring also brings some performance improvements when using pre-computed resampling info. With the script below, using Satpy, I got the following timings:

EDIT: Timings updated for latest Satpy master (Oct-7 2020) with minimal other load on the laptop.

Pyresample master branch

initial run (generate=False): ~~3 m 28 s~~ 3 m 14 s
initial run (generate=True): ~~4 m 42 s~~ 4 m 20 s
reusing the cached resampling data (generate=False): 18 s
reusing the cached resampling data (generate=True): ~~1 m 25 s~~ 1 m 21 s

This PR:

initial run (generate=False): ~~2 m 50 s~~ 1 m 36 s
initial run (generate=True): ~~3 m 30 s~~ 2 m 14 s
reusing the cached resampling data (generate=False): ~~16 s~~ 15 s
reusing the cached resampling data (generate=True): ~~48 s~~ 45 s

#!/usr/bin/env python

import os
os.environ['DASK_NUM_WORKERS'] = '2'
os.environ['OMP_NUM_THREADS'] = '1'

import glob
from satpy import Scene

def main():
    fnames = glob.glob('/home/lahtinep/data/satellite/geo/msg/*201611281100*__')

    glbl = Scene(reader='seviri_l1b_hrit', filenames=fnames)
    glbl.load(["natural_color", "fog", "overview",
               "hrv_clouds", "hrv_fog", "convection"],
              generate=True)
#              generate=False)
    lcl = glbl.resample('euro4', resampler="bilinear", cache_dir="/tmp", reduce_data=False)
    lcl.save_datasets(base_dir='/tmp')


if __name__ == "__main__":
    main()

pnuu · 2020-09-10T10:11:23Z

I'm going to ignore the DeepCode errors/warnings. The accesses to private attributes are in the tests and in deprecated functions (resample_bilinear(), get_bil_info() and get_sample_from_bil_info()), for which I don't want to change the names of the attributes.

The resample_bilinear() function should be replaced with a .resample() convenience method in a later PR.

pnuu · 2020-09-14T10:19:59Z

Any other suggestions?

@djhoese made a PR for Satpy (pytroll/satpy#1361) so that Windows testing would happen in Travis instead of Appveyor. Should we make a similar change to Pyresample, merge that, rebase/merge it into this PR and check that the Windows tests also pass?

djhoese · 2020-09-14T10:49:01Z

How backwards compatible is this PR?

I would wait to turn off appveyor. Plus the Azure builds are failing for some other reason. Not sure why yet.

pnuu · 2020-09-14T10:51:37Z

This is fully backwards compatible, including both Numpy and XArray versions.

And again I mixed up Azure and Appveyor 🙄 And yeah, I didn't figure out what's wrong with the Azure builds.

pnuu · 2020-09-14T13:03:18Z

Looking at the logs again, it seems that Python 3.9 fails to install pyproj because the proj executable isn't available:

created virtual environment CPython3.9.0.candidate.1-64 in 430ms
  creator CPython3Posix(dest=/tmp/tmp.uuC6OgfBeC/venv, clear=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==20.2.2, setuptools==49.6.0, wheel==0.35.1
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
    + pip install /tmp/cibuildwheel/repaired_wheel/pyresample-1.16.0+96.g6e01eaa-cp39-cp39-manylinux2010_x86_64.whl
  ERROR: Command errored out with exit status 1:
   command: /tmp/tmp.uuC6OgfBeC/venv/bin/python /tmp/tmp.uuC6OgfBeC/venv/lib/python3.9/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpe3vvkoco
       cwd: /tmp/pip-install-vgtgu41m/pyproj
  Complete output (1 lines):
  proj executable not found. Please set the PROJ_DIR variable.For more information see: https://pyproj4.github.io/pyproj/stable/installation.html

djhoese · 2020-09-14T13:05:54Z

I think I posted this on Slack, but did you ever turn off Python 3.9? Check the skip variable at the top of the Azure config and add an entry for the cp39 equivalent.

pnuu · 2020-09-14T13:07:42Z

Oops, either didn't notice or completely forgot. Trying now.

pnuu · 2020-09-15T10:11:07Z

With these two additional da.compute() calls, the total number of compute calls drops from 74 to 33 for the initial run of the above example script. At the same time, the processing time drops to around 2 m 35 s, which is 15 s less than earlier.
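
Why an explicit compute can lower the total count: when several results depend on the same lazy intermediate and each is computed separately, dask re-evaluates that intermediate every time. Computing them together (or computing the shared part once up front) evaluates it once. The following is only a minimal sketch of that effect, not code from this PR; `neighbour_info` is a made-up stand-in for an expensive shared step, and it uses `dask.delayed` on plain Python values to keep the example small.

```python
import dask

calls = {"n": 0}  # counts how often the shared step actually runs


@dask.delayed
def neighbour_info():
    """Hypothetical stand-in for an expensive shared intermediate."""
    calls["n"] += 1
    return [1, 2, 3, 4]


info = neighbour_info()
total = dask.delayed(sum)(info)
largest = dask.delayed(max)(info)

# Two separate computes: the shared step runs once per call.
separate = (total.compute(scheduler="synchronous"),
            largest.compute(scheduler="synchronous"))
calls_separate = calls["n"]

# One combined compute: the shared step runs only once.
calls["n"] = 0
combined = dask.compute(total, largest, scheduler="synchronous")
calls_combined = calls["n"]
```

Both variants give the same values; only the number of times the shared step is executed differs.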

pnuu · 2020-10-07T12:08:30Z

I ran the timings again with the current Satpy master branch and minimal load on the laptop. The updated timings are in the description.

djhoese

Nice job. It looks easier to follow and I like the documentation. I had a lot of suggestions for renaming and reordering, but mostly I felt like some of the refactoring went too far. A lot of the class methods seem unnecessary or don't seem to do that much. With all the class instances it may just appear this way, but I'm wondering if this can be avoided.

On a larger note, what do you and @mraspaud think about avoiding putting a lot of code in __init__.py modules? Could a lot of the stuff be put in bilinear/base.py and/or bilinear/npy.py?

djhoese · 2020-10-08T13:40:39Z

docs/source/swath.rst

@@ -268,10 +268,74 @@ Click images to see the full resolution versions.

 The *perceived* sharpness of the bottom image is lower, but there is more detail present.

+
+XArrayResamplerBilinear


I personally think Resampler should be the last word: XArrayBilinearResampler

Also, while I think this section is fine here for now, I think we need to refactor the documentation. This document is getting really long.

I agree. This is the name Satpy currently expects, so for backwards compatibility it needs to be like this for now. I can change this in the follow-up changes I already have waiting (not yet PR'd) for both Pyresample and Satpy.

djhoese · 2020-10-08T13:41:11Z

docs/source/swath.rst

+***********************
+
+**bilinear.XArrayResamplerBilinear** is a class that handles bilinear interpolation for data in
+`xarray.DataArray` arrays.  The parallelisation is done automatically using `dask`.


US English would be parallelization. 😉

djhoese · 2020-10-08T13:42:52Z

docs/source/swath.rst

+>>> result = resampler.get_sample_from_bil_info(data)
+
+
+NumpyResamplerBilinear


Same here for the name, Resampler should be last in my opinion. Willing to debate it.

Thoughts on swapping these sections around? Numpy first then xarray? The tests could even use the same lons/lats/data from the numpy section (I think doctest lets you do that).

I'll update the naming in the follow-ups. The XArray version is preferred for performance, so I thought it should come first.
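
On the doctest question above: with the standard library's `doctest`, all examples parsed from one string or file run in a single namespace, so a later section can reuse names defined in an earlier one. A small sketch of that behaviour follows, assuming the documentation page is collected as one doctest; the section titles and the `lons` values are illustrative only and are not taken from swath.rst.

```python
import doctest

# Two "sections" in one document; the second reuses `lons` from the first.
DOC = """
Numpy section
-------------
>>> lons = [10.0, 20.0]
>>> len(lons)
2

XArray section
--------------
>>> lons[0]
10.0
"""

parser = doctest.DocTestParser()
# All examples in DOC share the single `globs` dictionary given here.
test = parser.get_doctest(DOC, {}, "swath_sketch", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
```

Whether this carries over to the real docs depends on how they are collected (for example, Sphinx's doctest extension groups examples in its own way), so treat it as a starting point.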

djhoese · 2020-10-08T13:43:26Z

docs/source/swath.rst

+>>> source_def = geometry.SwathDefinition(lons=lons, lats=lats)
+>>> resampler = XArrayResamplerBilinear(source_def, target_def, 30e3)
+>>> resampler.get_bil_info()
+>>> result = resampler.get_sample_from_bil_info(data)


get_sample_from_bil_info seems odd given that you don't actually give it bil_info. Why can't get_bil_info be called automatically if it hasn't been already?

The separate get_bil_info() step is necessary when caching the resampling info. In the follow-ups I move the caching from Satpy to Pyresample, so with that the caching step can be shown here. In another follow-up I'll add a resampler.resample() method that wraps resampler.get_bil_info() and resampler.get_sample_from_bil_info() together. It could also take the cache_dir kwarg so that there'd be no need to call get_bil_info() separately.
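
The wrapper idea discussed here can be sketched with a toy class. This is not pyresample's implementation: `ToyBilinearResampler`, its constructor argument and the trivial "weights" are all hypothetical, and caching via `cache_dir` is not modelled. It only shows `resample()` running the info step on demand and then sampling.

```python
class ToyBilinearResampler:
    """Toy stand-in, not pyresample's XArrayResamplerBilinear."""

    def __init__(self, weights_source):
        self._weights_source = weights_source
        self.bil_info = None  # filled in by get_bil_info()

    def get_bil_info(self):
        # Placeholder for the expensive step that would compute
        # indices and weights from the source and target geometries.
        self.bil_info = list(self._weights_source)

    def get_sample_from_bil_info(self, data):
        if self.bil_info is None:
            raise RuntimeError("call get_bil_info() first")
        return [w * d for w, d in zip(self.bil_info, data)]

    def resample(self, data):
        # Compute the resampling info only if it is not available yet.
        if self.bil_info is None:
            self.get_bil_info()
        return self.get_sample_from_bil_info(data)


resampler = ToyBilinearResampler([0.5, 2.0])
result = resampler.resample([4.0, 3.0])
```

Keeping `get_bil_info()` public still allows the info to be computed (and later cached) separately, while `resample()` covers the common one-call case.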

djhoese · 2020-10-08T13:45:12Z

docs/source/swath.rst

-Function for resampling using bilinear interpolation for irregular source grids.
+Convenience function for resampling using bilinear interpolation for irregular source grids.
+
+..note:


Does this syntax work? I think you need a space after .. and two colons :: and a blank line after.

No it doesn't. Fixed.

djhoese · 2020-10-08T13:49:28Z