Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check latitude and longitude validity in h3_geo_to_h3 function #41

Merged
merged 5 commits into from
Aug 27, 2020

Conversation

trylinka
Copy link
Contributor

@trylinka trylinka commented Aug 14, 2020

h3_geo_to_h3 function takes to input point. If we use point, we consider that point is measured into degree values. The main problem that there is no checking that point is within longitude and latitude values. This validation check input coordinates to see if their values make a sense in degrees.

but got %f.", geo->x added for checking in logs value that function got.

@trylinka trylinka force-pushed the h3_geo_to_h3-coordinates-check branch 5 times, most recently from e668ef6 to cbae81e Compare August 14, 2020 16:32
@@ -36,6 +36,9 @@ h3_geo_to_h3(PG_FUNCTION_ARGS)
H3Index idx;
GeoCoord location;

ASSERT_EXTERNAL(geo->x >= -180 && geo->x <= 180, "Longitude must be between -180 and 180 degrees inclusive, but got %f.", geo->x);
ASSERT_EXTERNAL(geo->y >= -90 && geo->y <= 90, "Latitude must be between -90 and 90 degrees inclusive, but got %f.", geo->y);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybefabs can help reduce the number of checks

ASSERT_EXTERNAL(fabs(geo->x) <= 180, "Longitude must be between -180 and 180 degrees inclusive, but got %f.", geo->x);
ASSERT_EXTERNAL(fabs(geo->y) <= 90, "Latitude must be between -90 and 90 degrees inclusive, but got %f.", geo->y);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @AbelVM,

These two compile to almost the same in -O2: https://godbolt.org/z/nGnGPn - but two-check+or has an option to exit a bit faster, after first check fails. abs() gets morphed into two checks anyway.

If you merge the two into one single check they become absolutely the same machine code:
https://godbolt.org/z/hEW4TG

So doesn't really matter.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, C is not exactly my jam and in PL/pgSQL that is a classic optimization indeed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Anything in PL/pgSQL is a postgres function call (including operator) which is a heavy operation. In C, simple comparisons are single assembly instructions (=fast) and function calls are often left as function calls (=slow). I am not sure but believe that with JIT enabled the same effect could be observed in Postgres too (in SQL, not PL/pgSQL though).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still some optimization fences in Postgres planner and, sadly, JIT can render your query even way slower, especially in parallelized plans, so it depends on the very case. That's why JIT is switched on and off in a heuristic-like way in pg12... ¯\(ツ)

@Komzpa
Copy link
Contributor

Komzpa commented Aug 15, 2020

Got bitten by this a couple times. Feeding a geometry in spherical mercator works but produces insane hex id, breaking all the math further without notice. A failure on the spot would be much better.

@zachasme zachasme self-assigned this Aug 24, 2020
@zachasme zachasme added the enhancement 🚀 New feature or request label Aug 24, 2020
@zachasme
Copy link
Owner

Thanks for the PR, @trylinka! 👍

The core library already does lat/lng normalization (see uber/h3#287) such that:

geo_to_h3(lat, lng) == geo_to_h3(lat,       lng + 360 )
                    == geo_to_h3(lat + 360, lng       )

We're trying to keep these postgres bindings functionally as close as possible to the core library as well as the other bindings. Merging this as default behavior would break code depending on the wrapping done by the core library.

However, I'd be happy to merge this if you put the assertion behind a flag, like we did for h3_to_geo_boundary and the extend_at_meridian flag.

@Komzpa can you give an example producing an insane hex id?

@Komzpa
Copy link
Contributor

Komzpa commented Aug 24, 2020

@zachasme the problem here lies outside of normalization problem and is more of ecosystem compatibility one. It is very common to have coordinates in metric system instead of degrees (like EPSG:3857, EPSG:3395 or other flavour of Mercator), and there are no safeguards in place to prevent this from being treated as meters by h3 library. So you can group by h3_geo_to_h3('SRID=3857; POINT (1000 10000)') and it will finish successfully. Resulting map makes no sense though, being almost completely covered in hexagons. This PR adds a desired safeguard.

Do you see a different way to implement it? I see that pg_h3 is using postgres point so checking the srid cannot be reliably performed, and it looks like sacrificing wraparound for this check is a good idea.

@zachasme
Copy link
Owner

zachasme commented Aug 24, 2020

Okay that makes sense.

What if we put the safeguard inside the PostGIS wrapped h3_geo_to_h3(geometry, integer)? Maybe even enforce degree coordinates instead of checking if a value is within a valid range? Right now it is a simple typecast:

CREATE OR REPLACE FUNCTION h3_geo_to_h3(geometry, resolution integer) RETURNS h3index
    AS $$ SELECT h3_geo_to_h3($1::point, $2); $$ IMMUTABLE STRICT PARALLEL SAFE LANGUAGE SQL;

But we could add a SRID check or something similar:

CREATE OR REPLACE FUNCTION h3_geo_to_h3(geometry, resolution integer) RETURNS h3index
    IMMUTABLE STRICT PARALLEL SAFE AS $$
        BEGIN
            IF ST_SRID($1) <> 4326 THEN
                RAISE EXCEPTION 'h3_geo_to_h3 using geometry must have srid 4326';
            END IF;
            RETURN h3_geo_to_h3($1::point, $2);
        END;
    $$ LANGUAGE PLPGSQL;

That would solve the ecosystem compatibility problem for users operating on PostGIS types, while still directly exposing the core library functions using native point types (which we need for one of our use-cases).

@Komzpa
Copy link
Contributor

Komzpa commented Aug 26, 2020

I'd say it will be a good way, except that for performance reasons the current best performing way to use h3_pg is select h3_geo_to_h3(geom::point, 8) (as it evades problem with inlining not working sometimes and minimizes function calls count).

PL/PGSQL will be very slow. I was thinking about injecting an ST_Transform(geom, 4326) into the chain - it will break support for srid=0 geometries, but then will magically work right for all the SRIDs. And geography cast can skip the check.

The beautiful perfect world way would be to get a way from PostGIS to read the types in other extensions and make the function in C, or, maybe, do the licensing dance and bundle it as part of postgis. The first way is lengthier but is also desired by extensions like MobilityDB.

I'd say that in practice I haven't observed a valid dataset that has 180 wraparound problems, so I'd merge this PR and mark the rest of ideas as some kind of roadmap :) A compromise can be to limit the wraparound check to only latitude - there is no reasonable way to go to lat=-91 (in contrast to lon=-181) and that alone will catch a lot of issues already.

@AbelVM
Copy link

AbelVM commented Aug 26, 2020

Taking this issue in the core library into account looks like this check is not needed, actually

@zachasme
Copy link
Owner

I've removed the check in C, and added an ST_Transform to the geometry function (as suggested by Kompza).

That means meters are converted to degrees but srid=0 throws:

\set meter ST_SetSRID(ST_Point(6196902.235389061,1413172.0833316022), 3857)
\set degree ST_SetSRID(ST_Point(55.6677199224442,12.592131261648213), 4326)

SELECT h3_geo_to_h3(:meter, :resolution) = '8a63a9a99047fff';
 t

SELECT h3_geo_to_h3(:degree, :resolution) = '8a63a9a99047fff';
 t

SELECT h3_geo_to_h3(ST_Point(6196902, 1413172), 1);
  -- ERROR:  ST_Transform: Input geometry has unknown (0) SRID

What are your thought on this solution @Kompza, @AbelVM, @trylinka?

@Komzpa
Copy link
Contributor

Komzpa commented Aug 26, 2020

I would love this to also throw an error:

SELECT h3_geo_to_h3(:meter ::point, :resolution) = '8a63a9a99047fff';

Simple solution will be to keep the Y check from original PR too, as proper looping across pole seemingly not implemented anyway:

    ASSERT_EXTERNAL(geo->y >= -90 && geo->y <= 90, "Latitude must be between -90 and 90 degrees inclusive, but got %f.", geo->y);

@AbelVM
Copy link

AbelVM commented Aug 26, 2020

I'd go with just a documented enforcement of using 4326 for the input, and avoid extra SRID checks or reprojections that will hit the performance even if the input is actually 4326. Checking the ranges for lat-lon should be enough. My two cents.

@zachasme
Copy link
Owner

zachasme commented Aug 26, 2020

@Komzpa Looping across pole should be working fine:

\set lon 55.6677199224442
\set lat 12.592131261648213
SELECT h3_geo_to_h3(POINT(:lon,       :lat), 7)
     = h3_geo_to_h3(POINT(:lon + 360, :lat), 7);
 t

SELECT h3_geo_to_h3(POINT(:lon, :lat      ), 7)
     = h3_geo_to_h3(POINT(:lon, :lat + 360), 7);
 t

I still think it's a bad idea to diverge from core library behavior in the bindings.

I think the best solution for now is to merge the original PR behind a flag:

h3_geo_to_h3(POINT(360, 12), 7) --wraps
h3_geo_to_h3(POINT(360, 12), 7, true) --throws

@AbelVM Do you have an estimate for the performance hit for ST_Transform to the same SRID (so no reprojection)?

@zachasme zachasme force-pushed the h3_geo_to_h3-coordinates-check branch from 09c0543 to df714bb Compare August 26, 2020 12:25
@AbelVM
Copy link

AbelVM commented Aug 26, 2020

Even if it does not reproject when same SRID, it checks whether is the same or not, so that has a cost. The code for ST_transform lives at https://github.com/postgis/postgis/blob/master/postgis/lwgeom_transform.c#L47

@zachasme zachasme merged commit 5e1c433 into zachasme:master Aug 27, 2020
@zachasme
Copy link
Owner

I'll keep an eye on your core library PR @Komzpa. 😃 For now, I've merged the original validation behind a strict boolean flag passed to all variants of the h3_geo_to_h3 function.

h3_geo_to_h3(POINT(360, 12), 7) --wraps
h3_geo_to_h3(POINT(360, 12), 7, true) --throws

@Komzpa
Copy link
Contributor

Komzpa commented Aug 27, 2020

I'd say it will make sense to flip the default to be strict, and to disable it by passing strict=false - we're fighting a lack of user's attention and without attention you will skip the strict flag the same way.

@zachasme
Copy link
Owner

Okay I'm releasing v3.7.0 soon, and I've removed the flag argument in favor of a GUC setting.

Adding h3.strict = on to postgresql.conf should enforce input validation.

That's a bit better than having to remember the flag every time 😄

@Komzpa
Copy link
Contributor

Komzpa commented Oct 4, 2020

Yay!

18:45:58 [gis] > alter system set h3.strict to on;
ALTER SYSTEM
Time: 2,129 ms
18:48:15 [gis] > select pg_reload_conf();
┌────────────────┐
│ pg_reload_conf │
├────────────────┤
│ t              │
└────────────────┘
(1 row)

Time: 0,883 ms

18:48:59 [gis] > show h3.strict;
┌───────────┐
│ h3.strict │
├───────────┤
│ on        │
└───────────┘
(1 row)

Time: 0,337 ms
18:49:05 [gis] > select h3_geo_to_h3('(200,200)'::point,8);
ERROR:  38000: Longitude must be between -180 and 180 degrees inclusive, but got 200.000000.
LOCATION:  h3_geo_to_h3, indexing.c:41
Time: 0,467 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🚀 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants