get_neighbour_info slows down significantly when working with large target rasters using many segments #504
Comments
Awesome! Nice job.
Sorry, I don't see a link anywhere. I don't use
Ah, there's the PR. Thanks.
Thanks :)
You were apparently way too fast for me ;)
Well, there are cases where the index and distance arrays are much smaller than predicted by the target area, i.e. when working with masked data. At first I did exactly that, but one of your tests immediately caught me. One could pre-allocate the array with the output size preemptively and then trim everything at the end to avoid this issue, but that might increase memory usage significantly for some of your users and break their code. I haven't thought of the
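The pre-allocate-then-trim idea from the comment above can be sketched in plain NumPy (illustrative only; gather_trimmed is a made-up helper, not pyresample code):

```python
import numpy as np

def gather_trimmed(segments, max_size):
    # Pre-allocate for the predicted (worst-case) output size, fill in
    # place, and trim once at the end. With masked data, fewer elements
    # may be written than predicted, so the final slice drops the slack.
    buf = np.empty(max_size, dtype=np.int64)
    n = 0
    for seg in segments:
        buf[n:n + len(seg)] = seg
        n += len(seg)
    return buf[:n]  # single trim instead of repeated copies

segs = [np.array([1, 2], dtype=np.int64), np.array([3], dtype=np.int64)]
out = gather_trimmed(segs, max_size=10)  # only 3 of 10 slots are used
```

The trade-off mentioned above applies: the buffer briefly holds max_size elements even when the actual output is much smaller.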
Code Sample, a minimal, complete, and verifiable piece of code
Problem description
I am working on a software application that deals with large raster files (approximately 15000x15000 pixels). Recently I encountered an issue where the
get_neighbour_info
function slowed down significantly when I used multiple segments to fit the kd_tree lookups into RAM. The code snippet provided is a simplified benchmark demonstrating the issue. Upon profiling the code using py-spy, I discovered that the concatenation of segments was the cause of the slowdown. As you can see, the current implementation results in many copies of the segments, which even leads to a peak RAM usage of 2 * total_size - segment_size.
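The copy cost of repeated concatenation can be sketched with plain NumPy (an illustrative sketch, not the actual pyresample code; both function names are made up):

```python
import numpy as np

def gather_concat(segments):
    # Naive approach: grow the result by concatenating each segment.
    # np.concatenate always allocates a new array, so every step copies
    # all previously gathered data again -- O(n^2) copying overall.
    result = np.empty(0, dtype=np.int64)
    for seg in segments:
        result = np.concatenate((result, seg))
    return result

def gather_prealloc(segments, total_size):
    # Pre-allocated approach: allocate the full result once and write
    # each segment into its slice, so every element is copied only once.
    result = np.empty(total_size, dtype=np.int64)
    offset = 0
    for seg in segments:
        result[offset:offset + len(seg)] = seg
        offset += len(seg)
    return result

segments = [np.arange(5, dtype=np.int64) for _ in range(3)]
total = sum(len(s) for s in segments)
assert np.array_equal(gather_concat(segments), gather_prealloc(segments, total))
```

During the last concatenation step, both the old result (total_size - segment_size elements) and the new one (total_size elements) are alive at once, which is where the 2 * total_size - segment_size peak comes from.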
Expected Output
To address this, a solution would involve pre-allocation of result arrays to avoid memcpys. Thanks to your habitable code base and excellent test coverage, I was able to quickly draft up a solution proposal, which you can find in the linked pull request. By pre-allocating the result arrays, most time is spent in the different workers querying the kd_tree, instead of copying data:
The execution time also went down from ~35 min to just ~4 min.
Associated pull request
In this first draft I've opted for adding a simple pre-allocation size parameter to
get_neighbour_info
, to avoid increasing the surface of the public API. However, another option would be to introduce a new method like put_neighbour_info, where the results are not returned but provided as output parameters. This would give users even more flexibility, such as specifying the result data type. However, this comes at the cost of another public API function, and probably some significant refactoring of get_neighbour_info to avoid code duplication. What do you think?
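The output-parameter option could look roughly like this (a sketch under assumptions: put_neighbour_info and its signature are hypothetical and not part of pyresample; the neighbour tuples stand in for real kd_tree query results):

```python
import numpy as np

def put_neighbour_info_sketch(neighbours, valid_out, index_out, distance_out):
    # Hypothetical output-parameter variant: the caller allocates the
    # result arrays (choosing their size and dtype), and this function
    # fills them in place instead of returning freshly allocated copies.
    for i, (idx, dist) in enumerate(neighbours):
        valid_out[i] = idx >= 0  # negative index marks "no neighbour found"
        index_out[i] = idx
        distance_out[i] = dist

# The caller controls allocation and dtype:
neighbours = [(0, 0.5), (-1, np.inf), (2, 1.25)]
valid_out = np.empty(len(neighbours), dtype=bool)
index_out = np.empty(len(neighbours), dtype=np.int32)
distance_out = np.empty(len(neighbours), dtype=np.float32)
put_neighbour_info_sketch(neighbours, valid_out, index_out, distance_out)
```

This mirrors NumPy's own out= convention (as in np.take or ufuncs), which is the main precedent for letting callers pick result dtypes at the cost of a larger API surface.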