-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize AreaDefinition.get_proj_coords when requesting dask arrays #401
Conversation
Codecov Report
@@ Coverage Diff @@
## main #401 +/- ##
==========================================
+ Coverage 93.81% 93.83% +0.02%
==========================================
Files 65 65
Lines 11076 11118 +42
==========================================
+ Hits 10391 10433 +42
Misses 685 685
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't spot anything obviously wrong, so LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job with finding this optimization! It wasn't obvious in any way. I just have noted some code duplication, otherwise LGTM
pyresample/geometry.py
Outdated
x = np.arange(start_x_idx, end_x_idx, dtype=dtype) * pixel_size_x + pixel_upper_left_x | ||
y = np.arange(start_y_idx, end_y_idx, dtype=dtype) * -pixel_size_y + pixel_upper_left_y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be refactored so that it doesn't duplicate the code in _get_proj_vectors
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, feel free to merge.
This is part of my work on the Satpy issue pytroll/satpy#1902. As described in some of the docstrings in this PR, these changes make it so that
get_proj_coords
, the method that returns a 2D X and a 2D Y coordinate array for each pixel of an AreaDefinition, generates the data usings amap_blocks
call. Previously this data was generated by getting the 1D vectors vianp.arange
and some other math, then passing them tonp.meshgrid
to turn them into 2D vectors. Those series of steps produce an odd series of tasks for dask when the coordinates are used for future work like converting to lon/lat arrays and then using those lon/lats to generate solar zenith angles (for example). The result is that dask has trouble scheduling the future tasks because it sees the complicated set of tasks and series of linked tasks required to turn two 1D arrays into two 2D arrays.This PR switches to using
map_blocks
to generate the 2D coordinate arrays from the scalar properties of the AreaDefinition which essentially shows up as 1 task in the dask graph. As shown in the satpy issue referenced above this reduces memory usage of processing (specifically AHI) by a lot.git diff origin/main **/*py | flake8 --diff