Questions about native encodings with geometry_types #239

bcb44-esri · 2024-07-09T00:45:52Z

I'm implementing a GeoParquet reader using the native encodings and have some questions about how the encodings behave when different geometry types are set.

union geometry types - Based on the discussion here (Add GeoArrow encoding as an option to the specification #189 (comment)), it seems that union types are not supported in the new native encoding. How should we handle a geometry_type that is a union when the encoding is a native type? Right now I'm erroring out, but I wanted to make sure that's the intended behavior.
z values - Does a 3D type in the geometry_type field (i.e. "POINT Z") add a required z field to the Parquet point encoding?

general question - We use M values a lot, and one thing that will stop us from using the new native encodings is that M values aren't supported there. Are there any plans to support them? I know they're not supported with WKB either, but since that's a well-defined spec, we just write them out anyway and let WKB parsers figure it out, since most seem able to.

paleolimbot (Collaborator) · 2024-07-09T06:00:55Z

How should we handle that if we have a geometry_type that is a union but the encoding is a native type?

When writing a geometry column that contains more than one type of geometry, the single-geometry encodings (do we call them "native" anywhere?) aren't appropriate and I would expect that to error. (Apologies if I missed the point there).
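For a reader, that expectation could be sketched as a metadata check like the one below. This is a minimal sketch, not spec text: the helper name `check_column_metadata` and the lookup table are hypothetical, the encoding names are assumed to be the lowercase single-geometry names, and an empty `geometry_types` list is assumed to mean "unknown" and is let through.

```python
# Hypothetical helper: names and error messages are illustrative only.
# Assumed mapping from single-geometry encoding name to its geometry type.
SINGLE_GEOMETRY_ENCODINGS = {
    "point": "Point",
    "linestring": "LineString",
    "polygon": "Polygon",
    "multipoint": "MultiPoint",
    "multilinestring": "MultiLineString",
    "multipolygon": "MultiPolygon",
}


def check_column_metadata(column_meta: dict) -> None:
    """Raise ValueError if geometry_types cannot be stored in the encoding."""
    encoding = column_meta["encoding"]
    if encoding == "WKB":
        return  # WKB can hold any mix of geometry types

    expected = SINGLE_GEOMETRY_ENCODINGS.get(encoding)
    if expected is None:
        raise ValueError(f"unknown encoding: {encoding!r}")

    # Drop the dimension suffix ("Point Z" -> "Point") before comparing.
    base_types = {t.split()[0] for t in column_meta.get("geometry_types", [])}

    # Assumption: an empty list means "types unknown" and is not an error.
    unexpected = base_types - {expected}
    if unexpected:
        raise ValueError(
            f"encoding {encoding!r} cannot store geometry types {sorted(unexpected)}"
        )
```

With this check, `{"encoding": "point", "geometry_types": ["Point", "Polygon"]}` raises, while `{"encoding": "point", "geometry_types": ["Point Z"]}` passes.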

Are there any plans to support that?

I am not aware of any discussions about that, but the GeoArrow spec on which it is based does support M coordinates and I don't think there would be any debate about how we would store them. geoarrow-pyarrow, for example, implements that extension:

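import pyarrow as pa
from geoarrow import pyarrow as ga
from geoarrow.pyarrow import io

# Build a one-column table holding a single XYZM point
tab = pa.table({"geom": ga.as_geoarrow(["POINT ZM (1 2 3 4)"])})
# Write it with the GeoArrow (single-geometry) encoding instead of WKB
io.write_geoparquet_table(tab, "out.parquet", geometry_encoding=io.geoparquet_encoding_geoarrow())

# Read it back: the point storage is a struct that includes z and m fields
io.read_geoparquet_table("out.parquet").schema.field("geom").type.storage_type
#> StructType(struct<x: double, y: double, z: double, m: double>)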
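The struct above suggests how a reader might work out which point fields to expect from a geometry type string. The sketch below follows the GeoArrow struct layout (x, y, then z and/or m). The helper name `point_struct_fields` is hypothetical, and whether GeoParquet requires the z field for a "Point Z" column is an assumption that this thread does not confirm.

```python
# Hypothetical helper: maps a geometry type string to the struct field
# names expected for a GeoArrow-style point. Assumes the dimension suffix
# is one of "", "Z", "M" or "ZM", separated from the type name by a space.
def point_struct_fields(geometry_type: str) -> list[str]:
    parts = geometry_type.split()
    suffix = parts[1].upper() if len(parts) > 1 else ""
    if suffix not in ("", "Z", "M", "ZM"):
        raise ValueError(f"unrecognized dimension suffix: {suffix!r}")

    fields = ["x", "y"]
    if "Z" in suffix:
        fields.append("z")
    if "M" in suffix:
        fields.append("m")
    return fields
```

For "POINT ZM" this gives the same four field names as the storage type printed above. A reader could compare the result against the actual Parquet schema and error out on a mismatch.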

It would be helpful to document your use case for M coordinates. I happen to agree that supporting them is important (for completeness with existing specifications), but I struggle to find real-world use cases for Parquet where this matters (not because it doesn't, but because I am more of a tool developer than a geospatial data user these days!)
