Dask resampler and gradient search overhaul #341
Signed-off-by: Martin Raspaud <martin.raspaud@smhi.se>
Codecov Report
@@            Coverage Diff             @@
##             main     #341      +/-   ##
==========================================
+ Coverage   93.95%   94.19%   +0.23%
==========================================
  Files          65       68       +3
  Lines       11252    12207     +955
==========================================
+ Hits        10572    11498     +926
- Misses        680      709      +29
Initial test with the script below:
#!/usr/bin/env python
import time
import glob
import datetime as dt

from satpy import Scene
from dask.diagnostics import ResourceProfiler, Profiler, CacheProfiler
from dask.diagnostics import visualize


def main():
    comps = ['airmass', 'hrv_fog', 'convection', 'dust', 'ash', 'fog', 'night_fog', 'snow', 'day_microphysics']
    tic = time.time()
    fnames = glob.glob("/home/lahtinep/data/satellite/geo/msg/*201611281100*")
    glbl = Scene(reader='seviri_l1b_hrit', filenames=fnames)
    # Load only the channels the composites need; the composites are generated after resampling
    glbl.load(comps, generate=False)
    lcl = glbl.resample('euron1', resampler='gradient_search')
    lcl.save_datasets(base_dir='/tmp')
    toc = time.time()
    print("Processing took %.2f s." % (toc - tic))


if __name__ == "__main__":
    # with dask.config.set(scheduler=CustomScheduler()):
    with ResourceProfiler(dt=0.25) as rprof, Profiler() as prof, CacheProfiler() as cprof:
        main()
    # Write the collected profiles to a timestamped HTML report
    visualize([rprof, prof, cprof],
              file_path=dt.datetime.utcnow().strftime('/tmp/gradient_profile_%Y%m%d_%H%M%S.html'))
As discussed on Slack, the above test should not be looked at too closely. The version in this PR recalculates the gradient every time, so it is not yet optimized. Similar plots were made with only one channel loaded, resampled and saved.
(Profiling plots and their captions are not preserved here.)
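To illustrate why recalculating the gradient for every channel is wasteful, here is a minimal sketch of computing it once and reusing it. It assumes the gradient depends only on the source and target geometries, not on the channel data. The names `compute_gradient`, `resample_channel` and `seviri_fulldisk` are hypothetical stand-ins, not the pyresample API.

```python
from functools import lru_cache

# Counts how often the (hypothetical) expensive gradient step actually runs.
calls = {"gradient": 0}


@lru_cache(maxsize=None)
def compute_gradient(source_area, target_area):
    """Hypothetical stand-in for the expensive gradient computation.

    Assumption: the result depends only on the two geometries, so it can
    be cached and shared between channels.
    """
    calls["gradient"] += 1
    return (source_area, target_area)  # placeholder for the real gradient arrays


def resample_channel(channel, source_area, target_area):
    """Hypothetical per-channel resampling that reuses the cached gradient."""
    gradient = compute_gradient(source_area, target_area)
    return (channel, gradient)


channels = ["VIS006", "IR_108", "WV_062"]
results = [resample_channel(ch, "seviri_fulldisk", "euron1") for ch in channels]
print(calls["gradient"])  # 1: the gradient was computed once for three channels
```

Without the cache, the counter would equal the number of channels, which is the behaviour described for the version in this PR.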
This looks pretty good. I did not walk through (or understand) most of the low-level logic, but I know that you guys have been testing things quite a lot. I mostly commented on the high-level naming and structure of the code. I only skimmed through the test code, but it would be nice if you could use parametrize more.
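As a rough illustration of the parametrize suggestion, here is a minimal sketch using `pytest.mark.parametrize`. The helper `nearest_index` and its test cases are hypothetical and are not taken from this PR's test suite.

```python
import pytest


def nearest_index(coord, origin, step):
    """Hypothetical helper: index of the grid cell whose centre is nearest to coord."""
    return int(round((coord - origin) / step))


# One test function covers several cases instead of one near-identical test per case.
@pytest.mark.parametrize(
    ("coord", "origin", "step", "expected"),
    [
        (0.0, 0.0, 1.0, 0),
        (2.4, 0.0, 1.0, 2),
        (2.6, 0.0, 1.0, 3),
        (-1.0, -3.0, 0.5, 4),
    ],
)
def test_nearest_index(coord, origin, step, expected):
    assert nearest_index(coord, origin, step) == expected
```

Each tuple is reported by pytest as a separate test, so a failing case is identified individually.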
This PR adds the DaskResampler class to perform resampling in a dask-friendly manner. As an application, the gradient search resampler is changed to use this class.
Todo:
git diff origin/master **/*py | flake8 --diff