Dask resampler and gradient search overhaul #341

Merged
mraspaud merged 85 commits into pytroll:main from the feature-dask-resampler branch
Jun 2, 2022

Conversation

mraspaud
Member

@mraspaud mraspaud commented Mar 19, 2021

This PR adds the DaskResampler class to perform resampling in a dask-friendly manner. As a first application, the gradient search resampler is changed to use this class.

Todo:

  • Tests for the slicing
  • Tests for the get_bbox_coords method
  • TiepointSwathDefinition class
  • area -> area
  • area -> area when output area is not fully covered
  • swath -> area
  • area -> swath
  • swath -> swath?
  • New gradient search, both bilinear and nearest neighbour. (even median?)
  • Tests added
  • Tests passed
  • Passes git diff origin/master **/*py | flake8 --diff
  • Fully documented

@mraspaud mraspaud self-assigned this Mar 19, 2021
@ghost

ghost commented Mar 19, 2021

Congratulations 🎉. DeepCode analyzed your code in 3.589 seconds and we found no issues. Enjoy a moment of no bugs ☀️.


Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
@codecov

codecov bot commented May 17, 2021

Codecov Report

Merging #341 (4c37100) into main (1b14698) will increase coverage by 0.23%.
The diff coverage is 96.84%.

@@            Coverage Diff             @@
##             main     #341      +/-   ##
==========================================
+ Coverage   93.95%   94.19%   +0.23%     
==========================================
  Files          65       68       +3     
  Lines       11252    12207     +955     
==========================================
+ Hits        10572    11498     +926     
- Misses        680      709      +29     
Flag Coverage Δ
unittests 94.19% <96.84%> (+0.23%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                            Coverage Δ
pyresample/slicer.py                      90.69% <90.69%> (ø)
pyresample/gradient/__init__.py           95.14% <91.66%> (-2.81%) ⬇️
pyresample/test/test_resample_blocks.py   98.62% <98.62%> (ø)
pyresample/test/test_gradient.py          98.98% <99.04%> (+0.98%) ⬆️
pyresample/geometry.py                    87.28% <100.00%> (+0.23%) ⬆️
pyresample/test/test_geometry.py          99.48% <100.00%> (+0.01%) ⬆️
pyresample/test/test_slicer.py            100.00% <100.00%> (ø)
pyresample/test/utils.py                  74.76% <0.00%> (-0.94%) ⬇️
pyresample/test/test_bucket.py            100.00% <0.00%> (ø)
pyresample/bucket/__init__.py             97.95% <0.00%> (+0.39%) ⬆️


Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b14698...4c37100. Read the comment docs.

@pnuu
Member

pnuu commented May 18, 2021

Initial test with the script below:

  • current main: 57 s / 2.5 GB of RAM
  • this PR: 108 s / 1.8 GB of RAM
#!/usr/bin/env python

import time
import glob
import datetime as dt

from satpy import Scene
from dask.diagnostics import ResourceProfiler, Profiler, CacheProfiler
from dask.diagnostics import visualize


def main():
    comps = ['airmass', 'hrv_fog', 'convection', 'dust', 'ash', 'fog', 'night_fog', 'snow', 'day_microphysics']
    tic = time.time()
    fnames = glob.glob("/home/lahtinep/data/satellite/geo/msg/*201611281100*")
    glbl = Scene(reader='seviri_l1b_hrit', filenames=fnames)
    glbl.load(comps, generate=False)
    lcl = glbl.resample('euron1', resampler='gradient_search')
    lcl.save_datasets(base_dir='/tmp')
    toc = time.time()
    print("Processing took %.2f s." % (toc - tic))


if __name__ == "__main__":
    # with dask.config.set(scheduler=CustomScheduler()):
    with ResourceProfiler(dt=0.25) as rprof, Profiler() as prof, CacheProfiler() as cprof:
        main()
    visualize([rprof, prof, cprof],
              file_path=dt.datetime.utcnow().strftime('/tmp/gradient_profile_%Y%m%d_%H%M%S.html'))

@pnuu
Member

pnuu commented May 18, 2021

I think some of the slowdown is due to more tasks in the Dask graph.

With current main branch the work units are much larger...
[image: bokeh_plot_main]

... than with this PR's branch:
[image: bokeh_plot_pr]
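
To make the point about task counts concrete, here is a rough sketch, using only the standard library, of how the number of chunks grows as the chunks get smaller. Each chunk needs at least one task, so the chunk count is a lower bound on the size of the graph. The array shape and the chunk sizes are illustrative assumptions, not values measured on either branch, and `count_chunks` is a hypothetical helper, not part of dask or pyresample.

```python
import math


def count_chunks(shape, chunk_size):
    """Return how many square chunks of side `chunk_size` cover an array of `shape`."""
    return math.prod(math.ceil(dim / chunk_size) for dim in shape)


shape = (3712, 3712)  # assumed array size, for illustration only
for chunk_size in (4096, 1024, 256):
    # Smaller chunks mean more chunks, hence more tasks for the scheduler.
    print(chunk_size, count_chunks(shape, chunk_size))
```

Scheduling overhead is paid per task, so a graph with many small work units can be slower than one with a few large ones even when the total amount of computation is the same.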

@pnuu
Member

pnuu commented May 18, 2021

As discussed on Slack, the above test should not be read too closely. The version in this PR recalculates the gradient for every dataset, so it is not yet optimized. Below are similar plots when only one channel is loaded, resampled and saved:

main:
[image: bokeh_main_single_chan]

The current main seems to be practically single-threaded. The green bar is transform.

this PR:
[image: bokeh_pr_single_chan]

The blue boxes are getitem-gradient_resampler-broadcast_to, which I believe to be where different input and output chunks are sorted and matched together.
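
As a rough illustration of what matching input and output chunks could involve, the sketch below pairs each output chunk with the input chunks whose bounding boxes overlap it. This is an assumption about the general idea, not the code in this PR: `Extent`, `overlaps` and `match_chunks` are hypothetical names, and the coordinates are made up.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Bounding box of one chunk in projection coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a, b):
    """Return True if the two bounding boxes share a non-empty area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def match_chunks(input_extents, output_extents):
    """Map each output chunk index to the input chunk indices it overlaps."""
    return {
        out_idx: [in_idx for in_idx, in_ext in enumerate(input_extents)
                  if overlaps(in_ext, out_ext)]
        for out_idx, out_ext in enumerate(output_extents)
    }


# Two input chunks side by side; one output chunk straddles both, one lies outside.
inputs = [Extent(0, 0, 10, 10), Extent(10, 0, 20, 10)]
outputs = [Extent(5, 2, 15, 8), Extent(30, 30, 40, 40)]
matches = match_chunks(inputs, outputs)
print(matches)
```

An output chunk with no overlapping input chunks can be skipped or filled directly, which is one way a scheme like this could avoid unnecessary work.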

Member

@djhoese djhoese left a comment


This looks pretty good. I did not walk through (or understand) most of the low-level logic, but know that you guys have been testing things quite a lot. I mostly commented on the high-level naming and structure of the code. I mostly skimmed through the test code, but if you can use parametrize more that'd be nice.

@mraspaud mraspaud merged commit f4fb7e8 into pytroll:main Jun 2, 2022
@mraspaud mraspaud deleted the feature-dask-resampler branch June 2, 2022 20:39
7 participants