Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WKB Support for PostGIS Geometry Columns #5580

Closed
njanakiev opened this issue Oct 17, 2020 · 15 comments
Closed

Add WKB Support for PostGIS Geometry Columns #5580

njanakiev opened this issue Oct 17, 2020 · 15 comments
Labels
enhancement New feature or request

Comments

@njanakiev
Copy link

Hello, thanks for the great project!

While trying to access geometry columns with the PostgreSQL connector, I wasn't able to access the PostGIS geometry column inside a table:

SELECT geometry 
FROM  postgresql.public.countries 
LIMIT 5;
Column 'geometry' cannot be resolved

The same query works with Python together Pandas and SQLAlchemy:

import sqlalchemy
import pandas as pd

engine = sqlalchemy.create_engine("postgres://user:pass@localhost:5432/db")

df = pd.read_sql("""
  SELECT geometry
  FROM  postgresql.public.countries 
  LIMIT 5;
""", engine)

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   geometry  5 non-null      object
dtypes: object(1)
memory usage: 168.0+ bytes

As far as I understand PostGIS stores geometries in the WKB format including more metadata like SRID:

import shapely.wkb

g = df['geometry'][0]
geom = shapely.wkb.loads(bytes.fromhex(g))

I think this would be a great addition to do spatial joins with federated queries by simply treating the geometry columns as WKB without the projection for example. PostGIS differentiates between Geometries and Geography. Geometry columns are more common from what I know.

Tested on Presto version 344 (with Java 11)

@findepi findepi added the enhancement New feature or request label Oct 17, 2020
@lkraider
Copy link

lkraider commented Sep 13, 2021

GeoAlchemy2 has wrapper classes for parsing and handling them (eg. WKBElement)

@ninexy
Copy link

ninexy commented Sep 18, 2021

i have the same erro too. can i have some way to read postgis geometry by trino?

@hashhar
Copy link
Member

hashhar commented Sep 18, 2021

@slhmy
Copy link

slhmy commented Nov 2, 2021

As far as I understand PostGIS stores geometries in the WKB format including more metadata like SRID:

It's true.

And If data SRID is not necessary, should we implement this by simply adding ST_AsBinary around Geometry type column when it's reading into memory?

I achieved this by overriding buildSql method in PostgreSqlClient.java:

    @Override
    public PreparedStatement buildSql(ConnectorSession session, Connection connection, JdbcSplit split, JdbcTableHandle table, List<JdbcColumnHandle> columns)
            throws SQLException
    {
        Map<String, String> supposedColumnExpressions = new HashMap<>();
        for (JdbcColumnHandle column : columns) {
            JdbcTypeHandle jdbcTypeHandle = column.getJdbcTypeHandle();
            if (jdbcTypeHandle.getJdbcTypeName().isPresent() && jdbcTypeHandle.getJdbcTypeName().get().equals("geometry")) {
                String columnName = column.getColumnName();
                log.info("Find geometry type, changing '%s' to '%s'", columnName, "ST_AsBinary(\"" + columnName + "\")");
                supposedColumnExpressions.put(columnName, "ST_AsBinary(\"" + columnName + "\")");
            }
        }

        ImmutableMap.Builder<String, String> builder = ImmutableMap.builder();
        for (Map.Entry<String, String> entry : supposedColumnExpressions.entrySet()) {
            builder.put(entry.getKey(), entry.getValue());
        }
        Map<String, String> columnExpressions = builder.build();

        PreparedQuery preparedQuery = prepareQuery(session, connection, table, Optional.empty(), columns, columnExpressions, Optional.of(split));
        return new QueryBuilder(this).prepareStatement(session, connection, preparedQuery);
    }

To see the difference, I have a simple query: SELECT * FROM postgis.public.points; with unsupported-type-handling=CONVERT_TO_VARCHAR on in catalog properties.
This is the output before changing:

trino> SELECT * FROM postgis.public.points;
 fid |   name    |                    geom                    
-----+-----------+--------------------------------------------
   1 | big mom   | 0101000000000000000000F03F000000000000F03F 
   2 | small dad | 0101000000000000000000F03F000000000000F03F 
(2 rows)

This is the output after changing:

trino> SELECT * FROM postgis.public.points;
 fid |   name    |                     geom                     
-----+-----------+----------------------------------------------
   1 | big mom   | \x0101000000000000000000f03f000000000000f03f 
   2 | small dad | \x0101000000000000000000f03f000000000000f03f 
(2 rows)

The result of geom has a small change which I think this illustrates ST_AsBinary function works.
Now I'm trying to link the type to Trino's Geometry. But I failed to get 'Geometry' Type by typeManager, check: #9845.

If you have any suggestions, feel free to comment.

@slhmy
Copy link

slhmy commented Nov 9, 2021

Hi, some more progress here(keep up with the following changes).
Currently I try more to implement Geometry reading by:

Adding ColumnMapping function in PostgreSqlClient.java(need to import geospatial plugin)

    private static ColumnMapping geometryColumnMapping()
    {
        return ColumnMapping.sliceMapping(
                GEOMETRY,
                (resultSet, columnIndex) -> stGeomFromBinary(wrappedBuffer(resultSet.getBytes(columnIndex))),
                (statement, index, value) -> { throw new TrinoException(NOT_SUPPORTED, "Geometry type is not supported for INSERT"); },
                DISABLE_PUSHDOWN);
    }

And adding lines in toColumnMapping function in PostgreSqlClient.java

        switch (jdbcTypeName) {
            case "geometry":
                return Optional.of(geometryColumnMapping());
            case "money":
                return Optional.of(moneyColumnMapping());

Finally I got result with a little trick like:

trino> SELECT geom FROM postg.public.points;
    geom     
-------------
 POINT (1 1) 
 POINT (1 1) 
(2 rows)

Query 20211109_024831_00006_d6k7e, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.25 [2 rows, 0B] [8 rows/s, 0B/s]

The reason for my trick is related with the type verifying process, specifically related with lines in JdbcRecordCursor in jdbc plugin:

                verify(
                        columnHandle.getColumnType().equals(columnMapping.getType()),
                        "Type mismatch: column handle has type %s but %s is mapped to %s",
                        columnHandle.getColumnType(), columnHandle.getJdbcTypeHandle(), columnMapping.getType());

I can't pass this verification until I override GeometryType's equal method.
If I don't do that I will be caught with error: Type mismatch: column handle has type Geometry but JdbcTypeHandle{..} is mapped to Geometry

I guess there is some problem in GeometryType for it's never used in a `columnHandle, I'll update clues in #9845 to follow up with the above problem.
There is also a dialogue in slack channel: https://app.slack.com/client/TFKPX15NC/CGB0QHWSW/thread/CGB0QHWSW-1636012275.260300

@findepi @hashhar Sorry for bothering, but I have been working on the following problem for days and didn't get significant progress. I think with this problem solved we'll be very close to the final solution.

@slhmy
Copy link

slhmy commented Nov 10, 2021

I closed #9845. I found getting GEOMETRY type by typeManager is the right usage:

this.jsonType = typeManager.getType(new TypeSignature(JSON));

Now I'm really close to the final solution. One more thing to pass through is to pass postgres-connector's test.
GEOMETRY type haven't been register to testing environment.
After this problem is solved I'll consider creating a PR for this issue. After merge, we'll have WKB read ability to PostGIS Geometry Columns.

@slhmy
Copy link

slhmy commented Nov 13, 2021

@njanakiev, @lerenah I've created a PR for this issue.
But I suppose it won't be merged very quickly. You can try the connector in my branch.

@lerenah
Copy link

lerenah commented Dec 7, 2021

@njanakiev, @lerenah I've created a PR for this issue. But I suppose it won't be merged very quickly. You can try the connector in my branch.

Thanks, will give it a shot!

@DeoLeung
Copy link

DeoLeung commented May 6, 2022

any luck the geometry pr will be merged soon? we really want to try it out with our postgis databases :)

@slhmy
Copy link

slhmy commented May 6, 2022

any luck the geometry pr will be merged soon? we really want to try it out with our postgis databases :)

If possible I will give some time to retry solving this issue. The last PR is holding for too long time, I my self don't think it could be successfully merged.

@DeoLeung
Copy link

any luck the geometry pr will be merged soon? we really want to try it out with our postgis databases :)

If possible I will give some time to retry solving this issue. The last PR is holding for too long time, I my self don't think it could be successfully merged.

we patched your pr and played a bit. yes it shows geometry better, we could do some cross db sjoin now :)
btw, it's there any spatial function on where/join push down implemented? it seems scanning the whole table

@slhmy
Copy link

slhmy commented May 27, 2022

btw, it's there any spatial function on where/join push down implemented? it seems scanning the whole table

No, currently these pushdowns should be implemented by yourself. Filter expression can be easily implemented in JDBC connectors if you understand the usage of expression rewriter in #7994.
But these functionality could be very limited and risky currently.

I start focusing on these issues this time last year, but moved to another work this year. And since I can see your company in your profile. I guess you are working on similar thing like last year I do. To support better spatial query ability, Trino definitely has a long way to go, so you should be very careful to think about the cost if you decide to use Trino.

@umartin
Copy link
Contributor

umartin commented Nov 21, 2024

I think this issue could be closed since PR #24053 was merged

@hashhar
Copy link
Member

hashhar commented Nov 21, 2024

Thanks for reminding @umartin, indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

Successfully merging a pull request may close this issue.

9 participants