Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PARQUET-2471: Add GEOMETRY and GEOGRAPHY logical types #240
base: master
Are you sure you want to change the base?
PARQUET-2471: Add GEOMETRY and GEOGRAPHY logical types #240
Changes from 34 commits
5c9e110
5ef28cd
ecd8cc2
d81dacb
80f4051
0db6d9f
e817af4
f78f7bd
1aaaca8
ee5b2df
19cc081
16c5868
56a65de
f28b282
5127702
298ab64
41c6394
d86abe4
c7a4f4c
f20f685
dbf9d54
b4296aa
9bcea6e
99f0403
dbb78cf
25df0ff
d349727
9ea6559
011de45
6425a3c
7d8ffa5
1502458
9f53c9e
a4f79ca
9fac5e7
eb260e6
bf84d7e
3845edf
41d9cd7
082e31d
df4f18e
9c9c916
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: 'geometry' may be misleading as its both geometry and geography. how about 'A bounding box value has at least...'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A related question: should we rename the
GeometryStatistics
toGeospatialStatistics
to avoid confusion?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Iceberg PR, the
crs
andcrs_encoding
are merged to a singlecrs
field and formatted as$type:$content
, wheretype
allowssrid
andprojjson
. Should we follow the same?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only thing that I am hesitant to keep it consistent with the Iceberg PR. The mapping between them follows the table below:
$type
crs_encoding
$content
crs
The reason is to provide a chance for non-Iceberg use cases to write full projjson content to the
crs
field. But this is not a strong opinion. Let me know what you think.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy with either (they encode equivalent information); however, I think separating them specifically for Parquet is a little nicer. I do like that there is a clear way for a writer to write an arbitrary string if it doesn't have the means to validate that it received PROJJSON (or doesn't have the ability to generate it). I also don't feel strongly about that 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I think the current way in Parquet is better and we can keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm strongly against the current type parameters. The CRS should not be embedded, it should be referenced like the spec in Iceberg. There's no need to duplicate these, require support for specific formats, or make type annotations huge. A CRS reference ID is cleaner, along with a way to store the CRS definition in file metadata if that's desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm neutral conceptually (it's all the same information, it's just a matter of where to put it). The decision to put it as part of the LogicalType struct just makes a cleaner implementation (i.e., no need to add a reference to the global metadata to existing function signatures) to support conversions that need the CRS (almost all of them, as far as I know).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the
crs
field to be a simple string without restriction. Let me know what you think. @rdblue @paleolimbotThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: looks like this is not used.