trivial fix to bboxHexRadius to guarantee containment #279
Conversation
This looks like it should be fine, but fails the polyfill tests...
It's possible that the test itself encoded a bad fill and was not noticed.
Looking into test failures...
I updated the test to pass based on @dfellis's comment. One thing I can't figure out: the hexagon counts in https://github.com/uber/h3/blob/master/src/apps/testapps/testPolyfill.c don't match what I get when I visualize the polygon with h3-py. What could explain the discrepancy?
For the exact issue you're pointing out between h3 and h3-py, I think you accidentally didn't convert the polygon's radians to degrees. The C library takes lat, lng in radians, not degrees (and has a helper function to convert).

As for why the algorithm implementation is apparently broken here and you had to update the number of hexagons for the test, that's basically my fault. The current polyfill algorithm is a hack I put together a few years ago until we got line support and could implement a more efficient fill algorithm based on it (we do have h3Line now, but it has some issues crossing multiple icosahedron faces, so it's still not quite there yet).

This algorithm takes the (lat, lng) bounding box of the polygon and then operates on a circle centered on the center of the bounding box, with a radius just large enough to fit the bounding box within. It then generates a kRing cluster of hexagons that could contain that circle, assuming that the hexagons are all uniform and of average shape. The issue is that near the pentagons the distortion rises dramatically and the kRing may be oblong instead of regular. Your change effectively makes that algorithm think the hexagons are much smaller than they are.

We should probably check the minimum edge length in H3 instead of the average to find the real constant, which I think might be more like 1.2 or something. I don't know if the performance gain from finding that exact constant is worth it, though, since in both cases we'll be slowing things down and making it consume more memory, especially if we can just replace this entire memory-wasting algorithm with a much smaller and faster one based on h3Line.
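To make the bounding-box estimate concrete, here is a minimal sketch of the kind of calculation being discussed. The function and parameter names are illustrative, not the actual H3 internals (the real logic lives in `bboxHexRadius`); the point is only how the containment scale trades correctness against k-ring size.

```c
#include <math.h>

/* Illustrative sketch, not the H3 source: estimate the kRing radius needed
 * so that a ring of "average" hexagons centered on the bounding box center
 * covers the whole box.
 *
 * bboxRadiusKm - distance from the bbox center to its farthest edge
 * hexRadiusKm  - radius of an average hexagon at the target resolution
 * scale        - containment factor: a larger divisor assumes larger
 *                hexagons and gives a smaller, faster, but possibly
 *                insufficient k; a smaller divisor guarantees coverage of
 *                distorted cells at the cost of testing more hexagons.
 */
int estimateBboxKRadius(double bboxRadiusKm, double hexRadiusKm, double scale) {
    return (int)ceil(bboxRadiusKm / (scale * hexRadiusKm));
}
```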
Ah, yes, that makes sense, thanks. And thanks for the explanation on polyfill.
@nrabinowitz @dfellis tests pass, changelog updated. Anything more? If not, can this be merged?
This looks fine to me, and I think we could land it as is. I do think it would benefit from at least one test for polygons not previously filled - for example, does this fix #136, and can we add a test for that polygon to verify?
I'd also love it if you could show the results of `make benchmarks` after this change, just so we have a sense of the impact on performance. I'd rather we were correct than fast, but it would be good to understand the consequence.
@nrabinowitz here's the output from `make benchmarks` after this change, with scaling 1.5 and with scaling 1.0. Performance slows by a bit over 2x.
I think all comments have been addressed. Anything else?
@kwmsmith Thanks so much for the follow-up here. I feel like ideally we'd be able to have an actual mathematically derived constant here, since if we could choose, say, a properly derived constant rather than the fully pessimistic one, we could recover some of the performance. All that said, I'd rather be correct than fast, and I think we should probably land this as is and address performance later.
That's my strong preference -- I can swallow a 2x perf degradation, but I can't swallow incorrect polyfill results :) @dfellis mentioned re-implementing polyfill using h3Line; is there a rough timeline for that?
This is referring to adapting the implementation we had in a prior version of H3, when we had a simpler hex grid. Honestly, we're a bit out from it, and I suspect this is a v4 feature.
I don't really know when we'll get that done. I only have my spare time to work on this anymore, since I'm not at Uber. The dependency graph on that is:
So it's not impossible, and if I could dedicate all of my time to this I could probably get it done in a week, but since I can't, I don't really know what the ETA would be.
FYI @dfellis and @kwmsmith - I'm looking into an alternate algorithm, prototyped here: https://observablehq.com/@nrabinowitz/spidering-polyfill
This doesn't depend on h3Line.
My thoughts: That does seem like a mostly improved approach versus what we have now, assuming we fix an issue with it that I'll go into in more depth later. It's probably slower than what h3Line + IJ-based filling could do (because of the repeated k-ring generation), but I'm not sure if it would be significantly slower.

In the current implementation it doesn't seem like anything but the first starting point is used, because it recurses in depth from that point (though I suppose it's possible that holes in the geometry could produce fully-isolated geometries and that it would be necessary to iterate them all). But I think a breadth-based search algorithm would be better: a while loop that searches for neighbors in the geofence and adds them to the set, then repeats the process only for the neighbors found to be in the polygon, adding their matching neighbors to the set, and so on. The recursion depth would be lower (as each iteration finds more of the interior hexagons), making a stack overflow less likely; actually, it should be possible to implement without any recursive function calls, using just three sets.

Now that I'm thinking about it more, the recursion-less breadth-based search should also be faster, because the two sets for finding neighbors and the previous matches to be tested can be allocated once at the beginning, and then at the end of each loop iteration just reassigning the new-matches memory as the old-matches variable, zeroing the old matches, and reassigning to the new-matches variable should work perfectly, while the recursive version has to allocate small chunks of memory on each step.

@nrabinowitz do you mind if I try to implement that? I think this is a great 80/20-rule type situation versus the "best" h3Line + IJ fill approach, and I think it would only take me a day, which is much more motivating to me. :) Unless you want to do it, or you disagree with my comments above?
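For illustration, here is a minimal sketch of that breadth-based loop, assuming the H3 v3 C API (`kRing`, `maxKringSize`, `h3ToGeo`). The names `spiderFill`, `cellIsInPolygon`, and `setContains` are made up for the sketch, flat arrays with linear search stand in for proper hash sets, the point-in-polygon test is a simplified ray cast on cell centers (no transmeridian or hole handling), and `cap` is assumed to be a true upper bound on the output size.

```c
#include <stdbool.h>
#include <stdlib.h>
#include "h3api.h"  /* H3 v3 API: H3Index, GeoPolygon, kRing, maxKringSize, h3ToGeo */

/* Simplified point-in-polygon test on a cell's center (ray casting on
 * lat/lng; ignores transmeridian wrap and holes for brevity). */
static bool cellIsInPolygon(H3Index cell, const GeoPolygon* polygon) {
    GeoCoord c;
    h3ToGeo(cell, &c);
    const Geofence* loop = &polygon->geofence;
    bool inside = false;
    for (int i = 0, j = loop->numVerts - 1; i < loop->numVerts; j = i++) {
        GeoCoord a = loop->verts[i], b = loop->verts[j];
        if (((a.lat > c.lat) != (b.lat > c.lat)) &&
            (c.lon < (b.lon - a.lon) * (c.lat - a.lat) / (b.lat - a.lat) + a.lon))
            inside = !inside;
    }
    return inside;
}

/* Flat-array "set" membership; a real implementation would hash. */
static bool setContains(const H3Index* set, int n, H3Index cell) {
    for (int i = 0; i < n; i++)
        if (set[i] == cell) return true;
    return false;
}

/* Breadth-based "spidering" fill: keep a found set plus two frontiers,
 * expand the current frontier by k-ring, test each unseen neighbor against
 * the polygon, then swap frontiers. `seeds` are starting cells (e.g. the
 * cells containing the polygon's vertices); returns the count in `found`. */
int spiderFill(const H3Index* seeds, int numSeeds, const GeoPolygon* polygon,
               H3Index* found, int cap) {
    int numFound = 0;
    int frontierSize = 0;
    int ringCap = maxKringSize(1);
    H3Index* frontier = calloc(cap, sizeof(H3Index));
    H3Index* next = calloc(cap, sizeof(H3Index));
    H3Index* ring = calloc(ringCap, sizeof(H3Index));

    for (int i = 0; i < numSeeds; i++) {
        if (cellIsInPolygon(seeds[i], polygon) &&
            !setContains(found, numFound, seeds[i])) {
            found[numFound++] = seeds[i];
            frontier[frontierSize++] = seeds[i];
        }
    }

    while (frontierSize > 0) {
        int nextSize = 0;
        for (int i = 0; i < frontierSize; i++) {
            kRing(frontier[i], 1, ring);
            for (int j = 0; j < ringCap; j++) {
                H3Index nb = ring[j];
                if (nb == 0 || setContains(found, numFound, nb)) continue;
                if (cellIsInPolygon(nb, polygon)) {
                    found[numFound++] = nb;
                    next[nextSize++] = nb;
                }
            }
        }
        /* Swap the frontier buffers instead of reallocating each iteration. */
        H3Index* tmp = frontier;
        frontier = next;
        next = tmp;
        frontierSize = nextSize;
    }

    free(frontier);
    free(next);
    free(ring);
    return numFound;
}
```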
@dfellis I presume the sets are fixed-length arrays? Are they just the size of the bounding box estimate?
I think that would be safest. If the polygon was guaranteed to be convex you could cut it down to just the number of hexagons needed for the outer rim of the bounding box (as the guaranteed maximum), but a concavity in the polygon means that instead of an average of 3 new hexagons per cell you could pick up all 6 hexagons in the k-ring, and enough back-and-forth could cause an intermediate set to exceed the bounding box rim set.
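As a sketch of that sizing choice, the working sets could all be capped by the existing bounding-box-based bound. `spiderFillSized` is an illustrative wrapper around the `spiderFill` sketch above, not a proposed API; `maxPolyfillSize` is the existing H3 v3 helper that returns the bbox-based hexagon estimate.

```c
#include <stdlib.h>
#include "h3api.h"

/* Declared in the sketch above. */
int spiderFill(const H3Index* seeds, int numSeeds, const GeoPolygon* polygon,
               H3Index* found, int cap);

/* Illustrative wrapper: size every working set to the bbox-based upper
 * bound so no intermediate set can overflow. */
int spiderFillSized(const GeoPolygon* polygon, int res,
                    const H3Index* seeds, int numSeeds, H3Index** out) {
    int cap = maxPolyfillSize(polygon, res); /* existing bbox-based estimate */
    *out = calloc(cap, sizeof(H3Index));
    return spiderFill(seeds, numSeeds, polygon, *out, cap);
}
```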
@dfellis I'm torn here, because I'd like to implement this myself but also realize that I won't have time to do so in the next few weeks. So I think it would be great if you want to take a shot at it. A few notes:
Let me know if you want to discuss it any further. If you don't get to this relatively soon, I might find time to pick it up, but I'd ping you first.
I'll have some time tomorrow to take a stab at it. I'll let you know how far along I get.
First point, thank you for that. That looks better, but still not quite what I mentioned above (that one's going to expand outward from the first point as far as it can, then it will go to the next point and try to expand, etc.; I'm imagining something that takes all the points first, expands each of them and checks them, then takes that new set and expands it outward, etc.). I'm not sure if there's any performance benefit to the way I was envisioning it over this one, though.

As to your question about a chunk of memory versus linked lists, that's assuming that we're going to use the same hash function as we've used before. I think here an algorithm that's more expensive to insert but much cheaper to iterate is the best idea, since the vast majority of the time I expect the number of output cells to be roughly equal to the number of input cells, which are based on the number of points defining the polygon (in the beginning I bet output ~= 3x input, but it will eventually level off, and the last few iterations will have an output significantly smaller than the input as the k-rings keep running into already-found hexagons). The final output set should probably be hashed normally since it's focused on membership checks.
That's why I proposed it that way.
Including all vertices seems perfectly fine.
Here's my discussion. :) Going to stop here for the day, though, and I'll try to get to prototyping something out tomorrow.
@ajfriend helped identify a failure case here: because you can't traverse between contained cells, you wouldn't get them with this algo :(
I'm pretty sure a k = 2 approach to neighbor traversal will solve this problem. The algorithm would find the first and last hexagons, and at k = 2 we definitely catch these (because they are X, Y grid aligned, and since this issue only occurs if the polygon is narrow and very straight in an X, Y coordinate system, k = 2 should be all that's needed to find it). Can you get me the polygon here so I can add a test case to my branch? I would hate to lose the 3-4x performance improvement because of this.
This one was literally sketched on top of a screenshot, but I could come up with a real polygon. I think k=2 won't do it, sadly. I think if you take a very long, very narrow triangle, center it on a given hex so that the hex center is contained, then rotate it at different angles, you can find angles w/r/t the grid where you'd have an arbitrarily large buffer of non-traversable cells.

I'm also bummed by this, and not sure whether we should care about these edge cases or not. The challenge is not only "Are we correct for a single weird polygon?", but "Are we correct for a set of contiguous polygons?" - in the latter case, poor handling of slivers could end up with "dead" cells in the set, clearly within the boundary of the set, but not assigned to any polygon.
So I see what you're getting at now. This also indicts the h3Line approach outlined above, since this polygon can't even draw a line between the points and maintain the contiguous-polygons-produce-contiguous-H3-sets behavior.

But I do still have an idea for a solution here, and it involves h3Line. The algo uses the polygon's points to get likely starting hexagons to "spider" from, but there's no reason why that has to be the starting set. We could execute h3Line operations between the points in the polygon to get a larger set of hexagons to test from. This would let us get all of these hexagons, because while we can't traverse from the points, we can traverse from places along the line segments. It would only fail where a line segment crosses icosahedron faces (we can fall back to just the points for now and fix up h3Line later), and there would be problems with polygons large enough that a single segment is affected by spherical vs cartesian projection differences (but that would also require the polygon to be one of these degenerate polygons at that large a scale, which seems very unlikely and not worth supporting).

I would need to see how putting h3Line into the "spider" algorithm affects performance. I expect it to hurt some, but I'm not sure by how much.
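A rough sketch of that seeding idea, again assuming the H3 v3 API (`geoToH3`, `h3LineSize`, `h3Line`, `Geofence`) and the `spiderFill` sketch above. `buildLineSeeds` and its parameters are made-up names; it traces h3Line between consecutive vertices and falls back to the vertex cells wherever h3Line fails (for example across icosahedron faces). Duplicate seeds are fine because the fill deduplicates on insert.

```c
#include "h3api.h"  /* H3 v3 API: Geofence, geoToH3, h3LineSize, h3Line */

/* Illustrative sketch: build the starting set for the spidering fill from
 * h3Line traces along the polygon's outer loop. `seeds` must hold `cap`
 * entries; returns the number of seed cells written. */
int buildLineSeeds(const Geofence* loop, int res, H3Index* seeds, int cap) {
    int numSeeds = 0;
    for (int i = 0; i < loop->numVerts; i++) {
        H3Index a = geoToH3(&loop->verts[i], res);
        H3Index b = geoToH3(&loop->verts[(i + 1) % loop->numVerts], res);
        int lineSize = h3LineSize(a, b);
        if (lineSize > 0 && numSeeds + lineSize <= cap &&
            h3Line(a, b, &seeds[numSeeds]) == 0) {
            /* h3Line succeeded: every cell along the segment becomes a seed. */
            numSeeds += lineSize;
        } else if (numSeeds < cap) {
            /* Fallback when h3Line fails: seed from the vertex cell only. */
            seeds[numSeeds++] = a;
        }
    }
    return numSeeds;
}
```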
Fixes #136, at least in my use cases.