Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort the mergeable segments before computing merged segments #1207

Merged
merged 22 commits into from
Jul 25, 2023

Conversation

isVoid
Copy link
Contributor

@isVoid isVoid commented Jun 29, 2023

Description

This PR is part-1 fix to #1200. In find_and_combine_segments the N^2 algorithm depends on the fact that the API needs to be presorted with certain criteria. This Pr adds such sorting.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@isVoid isVoid requested a review from a team as a code owner June 29, 2023 19:25
@isVoid isVoid requested review from trxcllnt and harrism June 29, 2023 19:25
@github-actions github-actions bot added the libcuspatial Relates to the cuSpatial C++ library label Jun 29, 2023
@isVoid isVoid added bug Something isn't working non-breaking Non-breaking change labels Jun 30, 2023
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few suggestions and a question about perf impact.

@isVoid isVoid requested a review from a team as a code owner July 14, 2023 04:58
@github-actions github-actions bot added the Python Related to Python code label Jul 14, 2023
@isVoid isVoid requested a review from harrism July 14, 2023 07:43
#include <thrust/uninitialized_fill.h>

namespace cuspatial {
namespace detail {

/**
* @internal
* @brief Kernel to merge segments, naive n^2 algorithm.
* @brief Kernel to merge segments. Each thread works on one segment space,
* with a naive n^2 algorithm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps naive is the wrong word. "brute force"? Is lower complexity a possibility?

Copy link
Contributor Author

@isVoid isVoid Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. What's the lower bound of the algorithm complexity here? Given a set of segments, to merge all mergeable segments, one at least needs to look at all segments once, so the lower bound is O(N) (N being the number of segments in the set). To achieve that, I assume you need to design some sort of hashing function. When you look at one segment, the hashing function will compute the key and look into the map to find if there is existing segment that can be merged with it. This key would be similar to what we designed in the sorting comparator here - a combination of group id, slope and lower left of the segment. A map on CPU code is simple, but may be quite difficult to implement on device and could invoke unwanted slow down.

So that said, the current algorithm isn't naive (or brute force) at all - while there is a nested loop here, the sorting actually already did part of the hashing work like above. The sorting makes sure that the outer loop only run against all other segments once in every mergeable group, all subsequent segments are marked merged_flag == 1 after the initial run. So the nested loop here is actually on the order of O(N) here. The aggregated complexity is dominated by the sorting, which is O(NlogN).

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@isVoid
Copy link
Contributor Author

isVoid commented Jul 25, 2023

/merge

@rapids-bot rapids-bot bot merged commit 8ee773c into rapidsai:branch-23.08 Jul 25, 2023
rapids-bot bot pushed a commit that referenced this pull request Jul 28, 2023
EDIT: the thrust bug is a known issue. Tracked by NVIDIA/cccl#153

This PR fixes an issue in `remove_if`. Strangely, when calling `reduce_by_key` with iterator to integer,
and when using `thrust::plus<index_t>()`, even if by definition the argument type of the plus operator is strongly typed
with a higher bit  width integer type, and I expected that the flags (`uint8_t`) were cast to the higher
bit depth before addition, the overflow still happens. I have filed a thread in the thrust channel to discuss if this
is a bug in thrust.

Meanwhile, a quick WAR is to explicitly use a transform iterator to cast the uchar in to `index_t` before adding.
This should give the correct result.

Fixes #1200 
Depend on #1207

Authors:
  - Michael Wang (https://github.com/isVoid)
  - H. Thomson Comer (https://github.com/thomcom)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - H. Thomson Comer (https://github.com/thomcom)

URL: #1209
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcuspatial Relates to the cuSpatial C++ library non-breaking Non-breaking change Python Related to Python code
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

3 participants