-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort the mergeable segments before computing merged segments #1207
Conversation
cpp/include/cuspatial/detail/intersection/linestring_intersection.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/detail/intersection/linestring_intersection.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/detail/intersection/linestring_intersection.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/detail/intersection/linestring_intersection_with_duplicates.cuh
Outdated
Show resolved
Hide resolved
cpp/include/cuspatial/detail/intersection/linestring_intersection_with_duplicates.cuh
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just a few suggestions and a question about perf impact.
#include <thrust/uninitialized_fill.h> | ||
|
||
namespace cuspatial { | ||
namespace detail { | ||
|
||
/** | ||
* @internal | ||
* @brief Kernel to merge segments, naive n^2 algorithm. | ||
* @brief Kernel to merge segments. Each thread works on one segment space, | ||
* with a naive n^2 algorithm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps naive is the wrong word. "brute force"? Is lower complexity a possibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question. What's the lower bound of the algorithm complexity here? Given a set of segments, to merge all mergeable segments, one at least needs to look at all segments once, so the lower bound is O(N)
(N being the number of segments in the set). To achieve that, I assume you need to design some sort of hashing function. When you look at one segment, the hashing function will compute the key and look into the map to find if there is existing segment that can be merged with it. This key would be similar to what we designed in the sorting comparator here - a combination of group id, slope and lower left of the segment. A map on CPU code is simple, but may be quite difficult to implement on device and could invoke unwanted slow down.
So that said, the current algorithm isn't naive (or brute force) at all - while there is a nested loop here, the sorting actually already did part of the hashing work like above. The sorting makes sure that the outer loop only run against all other segments once in every mergeable group, all subsequent segments are marked merged_flag == 1
after the initial run. So the nested loop here is actually on the order of O(N) here. The aggregated complexity is dominated by the sorting, which is O(NlogN).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
/merge |
EDIT: the thrust bug is a known issue. Tracked by NVIDIA/cccl#153 This PR fixes an issue in `remove_if`. Strangely, when calling `reduce_by_key` with iterator to integer, and when using `thrust::plus<index_t>()`, even if by definition the argument type of the plus operator is strongly typed with a higher bit width integer type, and I expected that the flags (`uint8_t`) were cast to the higher bit depth before addition, the overflow still happens. I have filed a thread in the thrust channel to discuss if this is a bug in thrust. Meanwhile, a quick WAR is to explicitly use a transform iterator to cast the uchar in to `index_t` before adding. This should give the correct result. Fixes #1200 Depend on #1207 Authors: - Michael Wang (https://github.com/isVoid) - H. Thomson Comer (https://github.com/thomcom) Approvers: - Mark Harris (https://github.com/harrism) - H. Thomson Comer (https://github.com/thomcom) URL: #1209
Description
This PR is part-1 fix to #1200. In
find_and_combine_segments
the N^2 algorithm depends on the fact that the API needs to be presorted with certain criteria. This Pr adds such sorting.Checklist