Optimize AreaDefinition.get_proj_coords when requesting dask arrays #401

Merged
djhoese merged 5 commits into pytroll:main from djhoese:optimize-areadef-proj-coords on Dec 3, 2021

Member

@djhoese djhoese commented Dec 1, 2021

This is part of my work on the Satpy issue pytroll/satpy#1902. As described in some of the docstrings in this PR, these changes make get_proj_coords, the method that returns a 2D X and a 2D Y coordinate array covering every pixel of an AreaDefinition, generate its data using a map_blocks call. Previously this data was generated by building the 1D vectors via np.arange and some other math, then passing them to np.meshgrid to turn them into 2D arrays. That series of steps produces an odd series of tasks for dask when the coordinates are used for later work, for example converting them to lon/lat arrays and then using those lon/lats to generate solar zenith angles. The result is that dask has trouble scheduling the later tasks because of the complicated chain of linked tasks required to turn two 1D arrays into two 2D arrays.
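For readers who want to see the kind of graph being described, here is a minimal sketch of the earlier 1D-vectors-plus-meshgrid pattern. It assumes the dask equivalents of the NumPy calls named above (`da.arange`, `da.meshgrid`); the function name, parameters, and sample values are hypothetical and are not taken from pyresample.

```python
import dask.array as da
import numpy as np


def proj_coords_meshgrid(width, height, pixel_size_x, pixel_size_y,
                         upper_left_x, upper_left_y, chunks=2, dtype=np.float64):
    """Build 2D X/Y coordinates from two chunked 1D vectors (hypothetical helper)."""
    # 1D coordinate of each column and each row; Y decreases going down the rows.
    x = da.arange(width, chunks=chunks, dtype=dtype) * pixel_size_x + upper_left_x
    y = da.arange(height, chunks=chunks, dtype=dtype) * -pixel_size_y + upper_left_y
    # Broadcast the two 1D vectors into two (height, width) arrays.
    x_2d, y_2d = da.meshgrid(x, y)
    return x_2d, y_2d


x_2d, y_2d = proj_coords_meshgrid(4, 3, 10.0, 20.0, 5.0, 100.0)
# Every step above (arange, multiply, add, meshgrid) contributes its own
# tasks to the graph that any downstream computation has to carry along.
print(len(x_2d.dask.layers), "layers,", len(x_2d.dask), "tasks")
```

The printed counts depend on the dask version and the chunking, so no specific numbers are claimed here.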

This PR switches to using map_blocks to generate the 2D coordinate arrays from the scalar properties of the AreaDefinition, which essentially shows up as 1 task in the dask graph. As shown in the Satpy issue referenced above, this substantially reduces the memory usage of processing (specifically for AHI).
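A rough sketch of the map_blocks idea follows. It is not the code from this PR: the function names, parameters, and sample values are assumptions made for illustration. It relies on `dask.array.map_blocks` passing a `block_info` argument whose `None` entry describes the output block, including its `array-location` index ranges.

```python
import dask.array as da
import numpy as np
from dask.array.core import normalize_chunks


def _block_proj_coords(pixel_size_x, pixel_size_y, upper_left_x, upper_left_y,
                       block_info=None):
    """Return the (2, rows, cols) X/Y coordinates of one output chunk."""
    out_info = block_info[None]  # metadata about the output block
    # (start, stop) index ranges of this chunk; the first axis is the X/Y pair.
    _, (row_start, row_stop), (col_start, col_stop) = out_info["array-location"]
    dtype = out_info["dtype"]
    x = np.arange(col_start, col_stop, dtype=dtype) * pixel_size_x + upper_left_x
    y = np.arange(row_start, row_stop, dtype=dtype) * -pixel_size_y + upper_left_y
    x_2d, y_2d = np.meshgrid(x, y)
    return np.stack([x_2d, y_2d])


def proj_coords_map_blocks(width, height, pixel_size_x, pixel_size_y,
                           upper_left_x, upper_left_y, chunks=2, dtype=np.float64):
    """Generate 2D X and Y arrays from scalars only (hypothetical helper)."""
    yx_chunks = normalize_chunks(chunks, shape=(height, width), dtype=dtype)
    coords = da.map_blocks(
        _block_proj_coords,
        chunks=((2,),) + yx_chunks,  # leading axis of length 2 holds X and Y
        dtype=dtype,
        meta=np.array((), dtype=dtype),
        pixel_size_x=pixel_size_x, pixel_size_y=pixel_size_y,
        upper_left_x=upper_left_x, upper_left_y=upper_left_y,
    )
    return coords[0], coords[1]


x_2d, y_2d = proj_coords_map_blocks(4, 3, 10.0, 20.0, 5.0, 100.0)
```

Because no dask array is passed in, each output chunk is computed directly from the scalar inputs and its own index range, with no upstream 1D arrays to link to.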

@codecov

codecov bot commented Dec 2, 2021

Codecov Report

Merging #401 (17f0884) into main (e6a452e) will increase coverage by 0.02%.
The diff coverage is 98.76%.


@@            Coverage Diff             @@
##             main     #401      +/-   ##
==========================================
+ Coverage   93.81%   93.83%   +0.02%     
==========================================
  Files          65       65              
  Lines       11076    11118      +42     
==========================================
+ Hits        10391    10433      +42     
  Misses        685      685              
Flag Coverage Δ
unittests 93.83% <98.76%> (+0.02%) ⬆️

Impacted Files Coverage Δ
pyresample/geometry.py 87.01% <98.63%> (+0.39%) ⬆️
pyresample/test/test_geometry.py 99.30% <100.00%> (+<0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e6a452e...17f0884.

@coveralls

coveralls commented Dec 2, 2021

Coverage Status

Coverage increased (+0.02%) to 93.643% when pulling 17f0884 on djhoese:optimize-areadef-proj-coords into e6a452e on pytroll:main.

Member

@pnuu pnuu left a comment

Didn't spot anything obviously wrong, so LGTM.

Member

@mraspaud mraspaud left a comment

Great job finding this optimization! It wasn't obvious in any way. I just noted some code duplication; otherwise LGTM.

Comment on lines 1133 to 1134
x = np.arange(start_x_idx, end_x_idx, dtype=dtype) * pixel_size_x + pixel_upper_left_x
y = np.arange(start_y_idx, end_y_idx, dtype=dtype) * -pixel_size_y + pixel_upper_left_y

Could this be refactored so that it doesn't duplicate the code in _get_proj_vectors?
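One way such a refactor could look, as a sketch only: a single helper computes the 1D vectors for an arbitrary index range, and both the full-area path and the per-block path call it. All function names and parameters below are hypothetical; this is not the refactoring that was actually merged.

```python
import numpy as np


def proj_vectors_for_range(start_x_idx, end_x_idx, start_y_idx, end_y_idx,
                           pixel_size_x, pixel_size_y,
                           pixel_upper_left_x, pixel_upper_left_y,
                           dtype=np.float64):
    """Shared helper: 1D X and Y coordinates for a range of pixel indexes."""
    x = np.arange(start_x_idx, end_x_idx, dtype=dtype) * pixel_size_x + pixel_upper_left_x
    y = np.arange(start_y_idx, end_y_idx, dtype=dtype) * -pixel_size_y + pixel_upper_left_y
    return x, y


def full_proj_vectors(width, height, **geo):
    """Whole-area 1D vectors: the index range simply covers every pixel."""
    return proj_vectors_for_range(0, width, 0, height, **geo)


def block_proj_coords(col_range, row_range, **geo):
    """2D coordinates for one block, reusing the same helper."""
    x, y = proj_vectors_for_range(*col_range, *row_range, **geo)
    return np.meshgrid(x, y)


geo = dict(pixel_size_x=10.0, pixel_size_y=20.0,
           pixel_upper_left_x=5.0, pixel_upper_left_y=100.0)
x_full, y_full = full_proj_vectors(4, 3, **geo)
x_blk, y_blk = block_proj_coords((2, 4), (1, 3), **geo)
```

With this shape, the arithmetic on lines 1133 to 1134 would live in one place, and the block version differs only in the index range it passes.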

Member

@mraspaud mraspaud left a comment

LGTM, feel free to merge.

@djhoese djhoese merged commit 74bcdb5 into pytroll:main Dec 3, 2021
@djhoese djhoese deleted the optimize-areadef-proj-coords branch December 3, 2021 17:13