Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use quadtree for .contains_properly #910

Merged
Show file tree
Hide file tree
Changes from 37 commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
d9bc2fe
Trying to get quadtree pip working with contains.
thomcom Feb 6, 2023
e6618fa
Get pip with quadtree working with the exception of colinearity. Curr…
thomcom Feb 7, 2023
534f3fa
Needs to have equality comparison so as to not exclude closed polygons.
thomcom Feb 7, 2023
65191b0
Pass all tests.
thomcom Feb 7, 2023
00090e0
Get allpairs working.
thomcom Feb 8, 2023
d7aa3cc
Update point_indices with @isVoid's better solution.
thomcom Feb 8, 2023
951391b
Quadtree polygon-index return value should match the original input i…
thomcom Feb 10, 2023
062d17c
Update python/cuspatial/cuspatial/core/binpreds/binpreds.py
thomcom Feb 10, 2023
077b54c
Update python/cuspatial/cuspatial/core/binpreds/binpreds.py
thomcom Feb 10, 2023
fc762eb
Merge branch 'branch-23.04' into feature/use-quadtree-for-contains-pr…
thomcom Feb 10, 2023
9eaad1d
Update cpp/src/utility/point_in_polygon.cuh
thomcom Feb 22, 2023
2534d88
Improve comment.
thomcom Mar 6, 2023
c410f03
Update python/cuspatial/cuspatial/core/binpreds/binpreds.py
thomcom Mar 6, 2023
5e1fc73
Fix error with suggestion.
thomcom Mar 6, 2023
6457db5
Update python/cuspatial/cuspatial/tests/binpreds/test_contains_proper…
thomcom Mar 6, 2023
76e30b2
Resolve auto scale selection.
thomcom Mar 6, 2023
f281d13
Refactoring allpairs to take string argument.
thomcom Mar 7, 2023
49d1512
Refactoring to use totally different code paths for pairwise and unpa…
thomcom Mar 7, 2023
96a6d16
Get rid of the split I was investigating.
thomcom Mar 7, 2023
47b2fa6
Rename allpairs to mode in tests.
thomcom Mar 7, 2023
fe4cf46
Merge branch 'branch-23.04' into feature/use-quadtree-for-contains-pr…
thomcom Mar 7, 2023
3654452
Change file to use GeoSeries.
thomcom Mar 7, 2023
bb0f5aa
Still updating API for 23.04
thomcom Mar 8, 2023
6594790
Remove polygon-bboxes part offset refactoring.
thomcom Mar 8, 2023
199c62f
Merge branch 'feature/use-quadtree-for-contains-properly' of github.c…
thomcom Mar 14, 2023
f8a388a
Refactoring for two modes.
thomcom Mar 14, 2023
058b3e6
Add byte_limited function
thomcom Mar 14, 2023
dac1af0
Add byte_limited function
thomcom Mar 14, 2023
0d2a021
Getting tests passing.
thomcom Mar 14, 2023
5485da9
Merge branch 'branch-23.04' into feature/use-quadtree-for-contains-pr…
thomcom Mar 14, 2023
d3eb406
Updated for review.
thomcom Mar 14, 2023
d7370d2
Rename basic contains_properly functions.
thomcom Mar 14, 2023
fa6968c
Address review from another PR.
thomcom Mar 14, 2023
5034c9a
Problem with style.
thomcom Mar 15, 2023
f305257
Rename back to allpairs=True for GTC.
thomcom Mar 15, 2023
203f956
Don't copy point_result
thomcom Mar 15, 2023
1590542
Merge branch 'branch-23.04' into feature/use-quadtree-for-contains-pr…
thomcom Mar 15, 2023
5583dba
Minor refactoring as a result of meeting.
thomcom Mar 16, 2023
e2d3ee9
Move most difficult code into standalone function with simple input a…
thomcom Mar 16, 2023
eb57b48
Refactor quadtree post processing into something sensible.
thomcom Mar 16, 2023
11911e3
Refactor brute force postprocessing.
thomcom Mar 16, 2023
d2b07d4
Clean up join_utils and remove duplicate definition.
thomcom Mar 16, 2023
c4b91e8
Handle review comment about
thomcom Mar 16, 2023
1e57a33
Accidentally committed a regression and address @trxcllnt suggestion.
thomcom Mar 16, 2023
3769cb1
Refactoring to support various comments. Version 2.5 of API?
thomcom Mar 16, 2023
092ce8c
Added doc at request of Michael
thomcom Mar 16, 2023
f93a273
Pass a intersects and within binpreds that were broken.
thomcom Mar 17, 2023
c92bc74
Clean up point_in_polygon after return.
thomcom Mar 17, 2023
98a4f7a
Merge branch 'branch-23.04' into feature/use-quadtree-for-contains-pr…
thomcom Mar 17, 2023
4b38979
Try to fix issue with Docs CI error.
thomcom Mar 17, 2023
40bb07e
Merge branch 'feature/use-quadtree-for-contains-properly' of github.c…
thomcom Mar 17, 2023
a81feb1
Try to correct formatting in geoseries.py for CI
thomcom Mar 17, 2023
f0e80c1
Add detail to postprocess and BinaryPredicate class.
thomcom Mar 17, 2023
75f7449
Merge branch 'branch-23.04' into feature/use-quadtree-for-contains-pr…
thomcom Mar 17, 2023
382caef
Hopefully fix the annoying docs build error.
thomcom Mar 18, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion cpp/src/utility/point_in_polygon.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
#include <cudf/column/column_device_view.cuh>
#include <cudf/types.hpp>

#include <cuspatial/detail/utility/floating_point.cuh>

namespace cuspatial {
namespace detail {

Expand All @@ -32,6 +34,7 @@ inline __device__ bool is_point_in_polygon(T const x,
cudf::column_device_view const& poly_points_y)
{
bool in_polygon = false;
bool is_colinear = false;
thomcom marked this conversation as resolved.
Show resolved Hide resolved
uint32_t poly_begin = poly_offsets.element<uint32_t>(poly_idx);
uint32_t poly_end = poly_idx < poly_offsets.size() - 1
? poly_offsets.element<uint32_t>(poly_idx + 1)
Expand All @@ -54,10 +57,26 @@ inline __device__ bool is_point_in_polygon(T const x,
bool y_in_bounds = y_between_ay_by || y_between_by_ay; // is y in range [by, ay]
T run = x1 - x0;
T rise = y1 - y0;
T rise_to_point = y - y0;

// The endpoint of the line segment is the same, and the segment degenerates to a point.
// This can happen in polygon vertices when the first and last vertex of the ring are
// the same. In this scenario, do not attempt ray casting on a degenerate point.
T constexpr zero = 0.0;
if (float_equal(run, zero) && float_equal(rise, zero)) continue;

T rise_to_point = y - y0;
T run_to_point = x - x0;

is_colinear = float_equal(run * rise_to_point, run_to_point * rise);
if (is_colinear) { break; }
thomcom marked this conversation as resolved.
Show resolved Hide resolved

if (y_in_bounds && x < (run / rise) * rise_to_point + x0) { in_polygon = not in_polygon; }
}
// If points are on the polygon edge, they are not contained in the polygon.
if (is_colinear) {
thomcom marked this conversation as resolved.
Show resolved Hide resolved
in_polygon = false;
break;
}
thomcom marked this conversation as resolved.
Show resolved Hide resolved
}

return in_polygon;
Expand Down
158 changes: 126 additions & 32 deletions python/cuspatial/cuspatial/core/binpreds/binpreds.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,20 @@

from abc import ABC, abstractmethod

import cupy as cp

import cudf

from cuspatial.core._column.geocolumn import GeoColumn
from cuspatial.core.binpreds.contains import contains_properly
from cuspatial.core.binpreds.contains import (
byte_limited_contains_properly,
quadtree_contains_properly,
)
from cuspatial.utils.column_utils import (
contains_only_linestrings,
contains_only_multipoints,
contains_only_polygons,
has_multipolygons,
has_same_geometry,
)

Expand Down Expand Up @@ -129,6 +135,14 @@ def __call__(self) -> cudf.Series:


class ContainsProperlyBinpred(BinaryPredicate):
def __init__(self, lhs, rhs, align=True, allpairs=False):
super().__init__(lhs, rhs, align=align)
if allpairs:
self.allpairs = True
self.align = False
else:
self.allpairs = False
thomcom marked this conversation as resolved.
Show resolved Hide resolved

def preprocess(self, lhs, rhs):
"""Preprocess the input GeoSeries to ensure that they are of the
correct type for the predicate."""
Expand All @@ -154,11 +168,14 @@ def preprocess(self, lhs, rhs):
point_indices = geom.point_indices()
from cuspatial.core.geoseries import GeoSeries

final_rhs = GeoSeries(
GeoColumn._from_points_xy(xy_points._column)
).points
final_rhs = GeoSeries(GeoColumn._from_points_xy(xy_points._column))
return (lhs, final_rhs, point_indices)

def _should_use_quadtree(self):
isVoid marked this conversation as resolved.
Show resolved Hide resolved
return (
len(self.lhs) >= 32 or has_multipolygons(self.lhs) or self.allpairs
thomcom marked this conversation as resolved.
Show resolved Hide resolved
)

def _op(self, lhs, points):
"""Compute the contains_properly relationship between two GeoSeries.
A feature A contains another feature B if no points of B lie in the
Expand All @@ -168,23 +185,71 @@ def _op(self, lhs, points):
raise TypeError(
"`.contains` can only be called with polygon series."
)
if self._should_use_quadtree():
point_result = quadtree_contains_properly(
points,
lhs,
)
else:
point_result = byte_limited_contains_properly(points, lhs)
thomcom marked this conversation as resolved.
Show resolved Hide resolved
return point_result

# call pip on the three subtypes on the right:
point_result = contains_properly(
points.x,
points.y,
lhs.polygons.part_offset,
lhs.polygons.ring_offset,
lhs.polygons.x,
lhs.polygons.y,
def _postprocess_quadtree_result(self, point_indices, point_result):
# If there are more than 31 polygons, the point indices are
# returned as a dataframe with two columns: part_index and
# point_index.

# This complex block of code is to create a dataframe that
# contains the polygon index and the point index for each
# point in the polygon. Quadtree pip returns the _part_index_
# of the polygon, not the polygon index.
rings_to_parts = cp.array(self.lhs.polygons.part_offset)
part_sizes = rings_to_parts[1:] - rings_to_parts[:-1]
parts_map = cudf.Series(
cp.arange(len(part_sizes)), name="part_index"
).repeat(part_sizes)
parts_df = parts_map.reset_index(drop=True).reset_index()
# Mapping of parts to polygons
parts_to_geoms = cp.array(self.lhs.polygons.geometry_offset)
geometry_sizes = parts_to_geoms[1:] - parts_to_geoms[:-1]
geometry_map = cudf.Series(
cp.arange(len(geometry_sizes)), name="polygon_index"
).repeat(geometry_sizes)
thomcom marked this conversation as resolved.
Show resolved Hide resolved
geom_df = geometry_map.reset_index(drop=True)
geom_df.index.name = "part_index"
thomcom marked this conversation as resolved.
Show resolved Hide resolved
geom_df = geom_df.reset_index()

# Replace the part index with the polygon index
part_result = parts_df.merge(point_result, on="part_index")
# Replace the polygon index with the row index
result = geom_df.merge(part_result, on="part_index")
result = result[["polygon_index", "point_index"]]
result = result.drop_duplicates()
# Replace the polygon index with the original index
result["polygon_index"] = result["polygon_index"].replace(
cudf.Series(self.lhs.index, index=cp.arange(len(self.lhs.index)))
thomcom marked this conversation as resolved.
Show resolved Hide resolved
)
return point_result
# Using allpairs for all requests with more than 31 polygons.
thomcom marked this conversation as resolved.
Show resolved Hide resolved
if not self.allpairs:
if len(result) == 0:
return cudf.Series([False] * len(self.lhs))
final_result = cudf.Series([False] * len(point_indices))
thomcom marked this conversation as resolved.
Show resolved Hide resolved
grouped = result.groupby("polygon_index").count() == len(
point_indices
)
final_result.loc[grouped.index] = True
return final_result
else:
return result

def postprocess(self, point_indices, point_result):
"""Postprocess the output GeoSeries to ensure that they are of the
correct type for the predicate."""
result = cudf.DataFrame({"idx": point_indices, "pip": point_result})
df_result = result
def _postprocess_brute_force_result(self, point_indices, point_result):
# If there are 31 or fewer polygons in the input, the result
# is a dataframe with one row per point and one column per
# polygon.
Comment on lines +388 to +390
Copy link
Member

@harrism harrism Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option would just be to change this output format before you consume it here. For the small-polygon-count version, just convert the bitmask into the same format output as the quadtree (all pairs): one row per intersection, containing a polygon index and a point index.

Then contains_properly only needs to convert the results if the user requests pairwise instead of all-pairs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out it is really easy to change the output format with a cudf.stack() call, so I'm investigating this approach to reduce code-paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finished that refactor, it makes solving the discrete math problem at the end harder but I'm ok with it for the current iteration of the architecture.


# Result can be:
# A Dataframe of booleans with n_points rows and up to 31 columns.
result = point_result
# Discrete math recombination
if (
contains_only_linestrings(self.rhs)
Expand All @@ -193,15 +258,45 @@ def postprocess(self, point_indices, point_result):
):
# process for completed linestrings, polygons, and multipoints.
# Not necessary for points.
result["idx"] = point_indices
df_result = (
result.groupby("idx").sum().sort_index()
== result.groupby("idx").count().sort_index()
)
point_result = cudf.Series(
df_result["pip"], index=cudf.RangeIndex(0, len(df_result))
)
point_result.name = None
return point_result
result = df_result

final_result = cudf.Series([False] * len(self.lhs))

if len(result.columns) > 1:
final_result[result.index] = cp.diag(result.values)
else:
final_result[result.index] = result[result.columns[0]]
thomcom marked this conversation as resolved.
Show resolved Hide resolved
final_result.name = None
return final_result

def postprocess(self, point_indices, point_result):
"""Postprocess the output GeoSeries to ensure that they are of the
correct type for the predicate.

Postprocess for contains_properly has to handle multiple input and
output configurations.

The input can be a single polygon, a single multipolygon, or a
GeoSeries containing a mix of polygons and multipolygons.

The input to postprocess is `point_indices`, which can be either a
cudf.DataFrame with one row per point and one column per polygon or
a cudf.DataFrame containing the point index and the part index for
each point in the polygon.
"""
if self._should_use_quadtree():
return self._postprocess_quadtree_result(
point_indices, point_result
)
else:
thomcom marked this conversation as resolved.
Show resolved Hide resolved
return self._postprocess_brute_force_result(
point_indices, point_result
)


class OverlapsBinpred(ContainsProperlyBinpred):
Expand All @@ -211,15 +306,14 @@ def postprocess(self, point_indices, point_result):
# TODO: Maybe change this to intersection
if not has_same_geometry(self.lhs, self.rhs):
return cudf.Series([False] * len(self.lhs))
result = cudf.DataFrame({"idx": point_indices, "pip": point_result})
df_result = result
partial_result = result.groupby("idx").sum()
df_result = (partial_result > 0) & (partial_result < len(point_result))
point_result = cudf.Series(
df_result["pip"], index=cudf.RangeIndex(0, len(df_result))
)
point_result.name = None
return point_result
point_result["point_index"] = point_indices
hits = point_result.groupby("point_index").sum()
size = point_result.groupby("point_index").count()
partial_overlap = hits != size
non_empty = size > 0
at_least_one_overlap = hits > 0
group_result = partial_overlap & non_empty & at_least_one_overlap
return group_result


class IntersectsBinpred(ContainsProperlyBinpred):
Expand Down
Loading