Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: h3_coverage or similar / Documentation update warning of centroid behavior #158

Closed
jmealo opened this issue Oct 10, 2024 · 9 comments

Comments

@jmealo
Copy link

jmealo commented Oct 10, 2024

See interactive demo here showing the stark difference between h3_coverage and h3_polygon_to_cells:
https://h3-snow.streamlit.app/#h3-in-snowflake

Many users of h3-pg have use cases outside of Uber's and the upstream h3 library. It's unclear if they will ever add this functionality, however, it doesn't mean that we couldn't on our end :)

Problem:
Most folks using h3_polygon_to_cells are expecting behavior closer to h3_coverage. One is accurate/slow, the other is inaccurate/fast. I believe both should be available and the differences should be highlighted.

Follow-up tasks:
It might make sense to try to see if the upstream h3 library would be willing to implement this or accept a PR? I'm assuming that they're not interested as SnowFlake implemented it themselves and it doesn't appear necessary for Uber's use case.

An upstream issue opened asking specifically about PostGIS:
uber/h3#842

Resolution 10:
Screenshot 2024-10-10 at 12 48 24 PM

Resolution 7:
Screenshot 2024-10-10 at 12 48 10 PM

@zachasme
Copy link
Owner

Hi @jmealo!

Uber actually merged "support for full containment and overlapping modes in polygonToCellsExperimental" (PR: uber/h3#796), back in November, but I'm still waiting for a release so I can include it in my bindings. I believe that will help with this specific use-case.

Until then, you would have to build h3-pg manually with the following changes:

@dfellis
Copy link

dfellis commented Oct 11, 2024

The reason why it hasn't been released yet is because libfuzzer can detect invalid memory accesses possible with the API on malformed inputs that we haven't figured out, yet (rather, we figured out some of them, but not all of them, yet).

So I would warn against using it on untrusted data, but it should be perfectly fine for actual usage.

@jmealo
Copy link
Author

jmealo commented Oct 11, 2024

@dfellis: Thanks for the context. I'm seeing very odd behavior with invalid inputs in general (but I don't think it's as serious as accessing invalid memory):
uber/h3#926

@dfellis
Copy link

dfellis commented Oct 11, 2024

Right. I haven't commented on that one because it's kinda the Geospatial equivalent of a Klein bottle so I feel like the behavior of the code (making the rest of the world part of the polygon) is more "correct" than interpreting it as two conjoined polygons, but I also know that interpretation is less useful, so I am torn.

@jmealo
Copy link
Author

jmealo commented Oct 11, 2024

@zachasme:
#159

I have this working on my branch. On a test set of 3 million locations, at a resolution of 7, I had 47 points where it failed my test, those points look like they were right on the edge of the hex geometry.

It was still 99.9987% accurate.

image

At a resolution of 9, the 47 drops down to 6 points.

It's 2.5 seconds for 7 and 2.8 seconds for 9 :D not too bad

@jmealo
Copy link
Author

jmealo commented Oct 11, 2024

@zachasme: If you try to feed this invalid geometry, you can/will get an error and then there's the issue of possible invalid memory access mentioned by @dfellis. ST_IsValid is so cheap to run, I think it might make sense just to call it internally and raise an exception ourselves rather than risk it.

I'd be interested in the most performant way to call ST_IsValid() from the C-side of things 🤔 ... or if we should just wrap it in the SQL layer?

@zachasme
Copy link
Owner

@dfellis Thanks for the heads-up :-) Would you consider releasing a new version of H3 core with the new polygon to cells algorithm as polygonToCellsExperimental, as it currently is on the master branch? What are you doing for own bindings (h3-js, h3-py)?

@jmealo Thanks for moving on this. I don't know how to do ST_IsValid from the C-side, currently I'm doing a lot of SQL layer PostGIS calls in the h3-postgis extension. Let's continue that discussion in #157.

@dfellis
Copy link

dfellis commented Oct 14, 2024

We are planning to release H3 4.2.0 very soon (within the next couple of days) and polygonToCellsExperimental will be there, but the current plan is to continue ignoring it in the bindings until the fuzzer problem is resolved, then it will replace the current polygonToCells.

We are thinking about potentially keeping the old algorithm around for a minor release as polygonToCellsLegacy or something like that, but we aren't sure about that. We know the new one fixes bugs with polygons around the poles, is faster, and uses less memory, but we're uncomfortable making it the main algorithm until we resolve the memory access issue, and maybe there is someone dependent on the implementation details of the current polygonToCells.

But in either case, you will have access to it soon, but be careful on what kind of data you use it for, at the moment.

@zachasme
Copy link
Owner

That's great to hear @dfellis! I'll keep the bindings in sync with whatever you release 👍 Thanks for the good work on H3!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants