Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is Pygeofilter supporting various CRS other than 4326? #66

Closed
Alex-NRCan opened this issue Jan 23, 2023 · 4 comments · Fixed by #88
Closed

Is Pygeofilter supporting various CRS other than 4326? #66

Alex-NRCan opened this issue Jan 23, 2023 · 4 comments · Fixed by #88

Comments

@Alex-NRCan
Copy link

Summary

Hi! I've been working on pygeoapi for the past couple weeks and a recent PR #964 added the possibility to use CQL to query features inside the pygeoapi PostgreSQL provider ! :)

The implementation of this new feature uses pygeofilter as it references the to_filter function to translate ECQL AST to a Django Query expression. The call to to_filter from pygeoapi can be found here.

While using this new pygeoapi CQL feature, I have come across an issue that could be coming from pygeofilter? I'm creating this issue more as a discussion topic, because I'd like to know more information about the way pygeofilter is meant to be used with geometries using various SRID. I'm interested to know in order to see if I can work around my issue or if it's something that might bring an idea to someone working in this current repository.

Context

In order to provide context, here's an example of my usage in pygeoapi and the steps to reproduce my issue:

  1. In pygeoapi, via a service request handler, I'm receiving a CQL written as such: INTERSECTS(geom, POLYGON((-103.7402 55.4461, -103.7402 55.8272, -103.2597 55.8272, -103.2597 55.4461, -103.7402 55.4461))).

  2. pygeoapi starts by using the parse function to parse the CQL string. To the best of my understanding, this uses Lark to parse the string and generate a result ref: Lark's parse. The call to pygeofilter's parse from pygeoapi can be found here

  3. Then, in pygeoapi, this result is sent to the aforementioned to_filter function in order to finally send the cql filters results to an sqlalchemy query like so: session.query(self.table_model).filter(cql_filters). This call from pygeoapi can be found here.

  4. Everything seems to be great and I'm pretty convinced that, if the CQL has a geometry that has its coordinates in SRID=4326 or maybe the same SRID as the data, everything would be working fine. That being said, I believe that I've stumbled on this particular issue, because our data is stored in SRID=3978, not in SRID=4326(!). So, when I execute this CQL on our point layer, in 3978, sqlachemy understandably fails and gives this error: sqlalchemy.exc.InternalError: (psycopg2.errors.InternalError_) ST_Intersects: Operation on mixed SRID geometries (Point, 3978) != (Polygon, 4326) and [SQL: SELECT {....} FROM {....} WHERE ST_Intersects({geom_field}, ST_GeomFromEWKT(%(ST_GeomFromEWKT_1)s)).

  5. Therefore, the first thing I've tried (even if that's not what we want in pygeoapi) is to send a CQL with a POLYGON that's using coordinates in SRID=3978 (the same SRID as the data) just to attempt to get results from the query. I used this CQL: INTERSECTS(geom, POLYGON((-574624 735257, -574624 870115, -417433 870115, -417433 735257, -574624 735257))). It failed with the same error message. This was telling me something was probably forcing it in SRID=4326.

  6. I then tried to send a CQL in which I was explicitly writing the SRID of the POLYGON using a PostGIS EWKT format as such: INTERSECTS(geom, SRID=3978;POLYGON((-574624 735257, -574624 870115, -417433 870115, -417433 735257, -574624 735257))) to attempt to force the geometry in 3978. It was still giving me the same error about a 4326 incompatibility.

  7. At this point, I thought either (1) pygeofilter is not meant to be used with a SRID other than 4326, or (2) it's not supporting the EWKT format. I was thinking I had to abandon using to_filter altogether and try to find a different way to make it work for CQL inside pygeoapi. However, before doing so, I jumped in GitHub to try to understand the implementation of to_filter and something caught my attention. Is it possible that, with all the formatting that happens (impressive work btw!), the function parse_geometry is called at some point?

image

If so, this might explain the issue that I'm experiencing. This parse_geometry function starts by calling as_shape** from the pygeoif library and then calls to_wkt** again from the pygeoif library. Afterwards, it uses a regular expression to search for a SRID={some_number}; in the geometry WKT string to determine what kind of SRID should be used in ST_GeomFromEWKT to be returned. Even though, using rhs variable on the geometry, I validate that the SRID 3978 is defined (Geometry(geometry={'type': 'Polygon', 'coordinates': (((-574624.0, 735257.0), (-574624.0, 870115.0), (-417433.0, 870115.0), (-417433.0, 735257.0), (-574624.0, 735257.0)),), 'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:EPSG::3978'}}})); after the object goes through as_shape and to_wkt it loses its SRID. Actually, my understanding is that a WKT never has a SRID - when a WKT has a SRID it's called an EWKT - so I think that's normal that a SRID can't be read ever? In my particular case, it seems the SRID={number}; is never found and the default SRID=4326; is used, all the time, in the response, possibly causing my issue.

My question, finally! :), is it possible to use pygeofilter CQL on geometries that are in other SRIDs? Maybe the parse_geometry function should have an additional parameter like the similar parse_bbox function?

Thank you for taking the time to read!

TL;DR

Is it possible the function parse_geometry always responds with an SRID=4326 when returning the ST_GeomFromEWKT? Making pygeofilter exclusively(?) work with geometries in SRID=4326 projection? Is there a way to work with other SRID values for a geometry in pygeofilter CQL? Maybe the parse_geometry function should have an additional parameter like the similar parse_bbox function?

Footnotes

** pygeofilter (and by extension pygeoapi) is referencing an old version of pygeoif (version==0.7 which dates back from sometime near 2012 maybe?). This is on purpose as it seems that the latest version of pygeoif has caused issues recently and so this commit addresses this. That is why the references above point to other places in pygeoif. Indeed, the function as_shape in pygeoif's Geometry class doesn't exist anymore. It's now in PyGeoif's Factories class. Also, the function to_wkt in pygeoif's Geometry class has been renamed to wkt.

@Alex-NRCan
Copy link
Author

Alex-NRCan commented Mar 23, 2023

We've noticed that pygeoapi has recently unpinned PR the pygeoif version of 0.7 which makes this issue a bit more relevant as it's happening on the latest code of pygeofilter, rather than on old code.

I'm therefore updating the issue with this message.

The new source code of pygeofilter being used by pygeoapi is now this:
image

Unfortunately for us, that code still doesn't work for the recent cql implementation in PyGeoAPI PR #964

I'm copying another implementation of the parse_geometry that works better for us (and shouldn't break the current implementation) in case it might interest somebody:

def parse_geometry(geom):
    # Read the wkt
    wkt = shape(geom).wkt

    # Default srid (like og code)
    sridtxt = "SRID=4326;"

    # If the crs is defined in the geom json object
    if 'crs' in geom and \
       'properties' in geom['crs'] and \
       'name' in geom['crs']['properties']:
        # Read the srid
        srid = re.search(r'\d+$', geom['crs']['properties']['name'])

        # If the srid number was read format the sridtxt
        if srid:
            sridtxt = f"SRID={srid.group(0)};"

    return func.ST_GeomFromEWKT(f"{sridtxt}{wkt}")

That being said, even with this implementation, pygeoapi still struggles when querying data other than 4326, due to its cql implementation (or limitation(?)). So, I'm thinking of moving the issue over to pygeoapi instead and we could close this one with or without the suggested parse_geometry implementation.

Thank you for reading!

@bitner
Copy link
Contributor

bitner commented Apr 26, 2023

@Alex-NRCan The OGC CQL2 specification https://portal.ogc.org/files/96288 expects any geometry filters to come in as WGS 84.

If the OGC API supports "Part 2" of the specification, then an additional filter-crs property is required (https://portal.ogc.org/files/96288#filter-filter-crs) in order to specify the input crs.

@Alex-NRCan
Copy link
Author

Alex-NRCan commented Apr 26, 2023

@bitner Thank you for your reply!

Yes indeed, it'd be great to be able to use such a filter-crs parameter in pygeoapi. Maybe it'll come soon, now that a filter parameter (for the cql) has been added? That said, even if it would become the case and filter-crs would be added, I believe that the issue that I'm pointing out in here would remain - at least as far as cql is concerned. Indeed, at the time that I created the issue, the cql code wasn't able to use shapes in another crs, because pygeofilter was always assuming 4326 (as it's hardcoded and I couldn't make the condition work differently, except for my suggested code change above).

Here at National Resources Canada, we're using forked code for the spatial queries and I'm not using the filter parameter, because we're working with 3978 (not 4326). In our fork, I'm using parameters that I've called geom and geom-crs which are basically the equivalent of filter and filter-crs, to support geometries in another crs. However, it doesn't support full-blown cql, just spatial filtering, which is fine for our needs.

@ricardogsilva
Copy link
Member

@bitner @Alex-NRCan

I'm looking into this with the intent of improving the existent pygeoapi support for CQL filtering and it seems to me that there is indeed a bug with pygeofilter not preserving a geometry's CRS information when evaluating an expression for sqlalchemy - either that or I'm reading it wrong 😛 - bear with me for a bit:

Lets say I have a postgis table with a geom geometry column that uses epsg:3004 - this is thus my storage CRS (in OGC API - Features part 2 parlance)

Now I'm trying to use a filter that specifies a point in the native CRS of the data, like this:

$ from pygeofilter.parsers import ecql
$ parsed_expression = "DWITHIN(geometry, SRID=3004;POINT(2322807.3173735966 4631504.384268367), 10, meters)"
$ print(parsed_expression)
DistanceWithin(
    lhs=ATTRIBUTE geometry, 
    rhs=Geometry(
        geometry={
            'type': 'Point', 
            'coordinates': (2322807.3173735966, 4631504.384268367), 
            'crs': {
                'type': 'name', 
                'properties': {'name': 'urn:ogc:def:crs:EPSG::3004'}
            }
        }
    ), 
    distance=10, 
    units=Token('METERS', 'meters')
)

As you can see, pygeofilter is able to parse the text and correctly keeps a reference to the CRS in the generated pygeofilter.ast.Geometry instance.

However, when I evaluate this to a sqlalchemy expression, pygeofilter looses this information, and I get this:

from pygeofilter.backends.sqlalchemy.evaluate import to_filter

# the definition of `table_field_mappings` is not relevant here, so I omitted it
$ db_filter = to_filter(parsed_expression, table_field_mappings)
$ print(db_filter.compile(compile_kwargs={"literal_binds": True}))
ST_DWithin(
    atlante.beni_confiscati.geometria, 
    ST_GeomFromEWKT(
        'SRID=4326;POINT (2322807.3173735966 4631504.384268367)'
    ), 
    10
)

As you can see pygeofilter disregarded its own information about the CRS of the point I originally used and sticked SRID=4326 in there instead - this query fails miserably, as postgis complains about mismatched geometry CRSs 😢 .

As pointed out by @Alex-NRCan , the problem lies in the implementation of pygeofilter.backends.sqlalchemy.filters.parse_geometry():

def parse_geometry(geom):
wkt = shape(geom).wkt
search = re.search(r"SRID=(\d+);", wkt)
sridtxt = "" if search else "SRID=4326;"
return func.ST_GeomFromEWKT(f"{sridtxt}{wkt}")

This function calls pygeoif.shape(), which discards the CRS information, and then tries to find a SRID in the result - however, this will always fail, as pygeoif discarded the CRS when converting to its own representation.

This seems fixable and I'm willing to submit a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants