Use table schema from the table handle #14076
Looks like there are some other places we do
I think only the first one really matters
@alexjo2144 this is actually a point that I wanted to bring in this PR -
Yeah I think we should prefer the schema in the Handle. The pattern I was looking for here were methods which called
let's make sure cleanups like this and the bug fix come in separate commits
I created a separate PR to avoid cluttering the current changes
Due to internal caching within the method `org.apache.iceberg.ManifestGroup.planFiles`, the returned file scan tasks may contain an invalid split schema string. Rely on the table schema from the table handle when reading from Avro data files.
When an Iceberg table's structure evolves over time (columns are added or dropped), a snapshot/time travel query returns output whose schema matches the schema of the queried table snapshot.
Force-pushed from 5940942 to 009b735
@@ -312,7 +312,7 @@ else if (identity.getId() == TRINO_MERGE_PARTITION_DATA) {
 partitionSpec.specId(),
 split.getPartitionDataJson(),
 split.getFileFormat(),
-split.getSchemaAsJson().map(SchemaParser::fromJson),
+SchemaParser.fromJson(table.getTableSchemaJson()),
"Due to internal caching within the method `org.apache.iceberg.ManifestGroup.planFiles` the returned file scan tasks may contain an invalid split schema string."
Is it testable?
Yes, it is testable through `io.trino.plugin.iceberg.TestIcebergAvroConnectorTest`.
I was reluctant to squash the two commits of this PR because they address different issues.
The test `io.trino.plugin.iceberg.TestIcebergAvroConnectorTest` covers both issues.
 ImmutableMap.Builder<String, ColumnHandle> columnHandles = ImmutableMap.builder();
-for (IcebergColumnHandle columnHandle : getColumns(icebergTable.schema(), typeManager)) {
+for (IcebergColumnHandle columnHandle : getColumns(SchemaParser.fromJson(table.getTableSchemaJson()), typeManager)) {
This is a good change.
However, it looks like we call `SchemaParser.fromJson(tableHandle.getTableSchemaJson())` multiple times on one table handle. Am I right?
`SchemaParser.fromJson` does cache internally (on a static field).
This isn't ideal, and we could do better by caching within the table handle object. Not sure it matters though; it depends on how frequently this is called.
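A rough sketch of what "caching within the table handle object" could look like, under stated assumptions: `SchemaCacheSketch` and `CachingTableHandle` are hypothetical names that do not exist in Trino or Iceberg, and the parser is passed in as a plain `Function` so the example runs without either library on the classpath. In a real handle the cached field would presumably also have to be excluded from serialization.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Trino or Iceberg.
final class SchemaCacheSketch
{
    private SchemaCacheSketch() {}

    // Stand-in for a table handle that carries its schema as a JSON string.
    static final class CachingTableHandle<S>
    {
        private final String tableSchemaJson;
        private final Function<String, S> parser;
        // Parsed lazily, at most once per handle instance (assumes the parser never returns null).
        private volatile S parsedSchema;

        CachingTableHandle(String tableSchemaJson, Function<String, S> parser)
        {
            this.tableSchemaJson = tableSchemaJson;
            this.parser = parser;
        }

        String getTableSchemaJson()
        {
            return tableSchemaJson;
        }

        S getTableSchema()
        {
            S schema = parsedSchema;
            if (schema == null) {
                // Double-checked locking so concurrent callers parse only once.
                synchronized (this) {
                    schema = parsedSchema;
                    if (schema == null) {
                        schema = parser.apply(tableSchemaJson);
                        parsedSchema = schema;
                    }
                }
            }
            return schema;
        }
    }

    public static void main(String[] args)
    {
        AtomicInteger parseCount = new AtomicInteger();
        // The lambda stands in for a real JSON-to-schema parser.
        CachingTableHandle<String> handle = new CachingTableHandle<>(
                "{\"type\":\"struct\",\"fields\":[]}",
                json -> {
                    parseCount.incrementAndGet();
                    return "parsed:" + json;
                });
        handle.getTableSchema();
        handle.getTableSchema();
        System.out.println("parse calls: " + parseCount.get());
    }
}
```

Whether this is worth the extra state depends on how often the schema is requested per handle, which is the open question raised above.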
Should we switch to `SchemaParser.fromJson(JsonNode)`, as in `fromJson(JsonUtil.mapper().readValue(jsonKey, JsonNode.class))`?
Description
When an Iceberg table's structure evolves over time (columns are added or dropped), a snapshot/time travel query returns output whose schema matches the schema of the queried table snapshot.
Fixes #14064
Relates to #12786
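To illustrate the intended behavior described above, here is a toy model of resolving output columns from the schema that belongs to the queried snapshot rather than from the current table schema. Everything in it is an assumption made for illustration: `SnapshotSchemaSketch`, `columnsForSnapshot`, and the map-based metadata are hypothetical and are not Trino or Iceberg classes.

```java
import java.util.List;
import java.util.Map;

// Toy model only: these types are illustrative and are not Iceberg or Trino classes.
final class SnapshotSchemaSketch
{
    private SnapshotSchemaSketch() {}

    // Resolve the output columns for a query against a specific snapshot by
    // looking up the schema recorded for that snapshot.
    static List<String> columnsForSnapshot(
            Map<Long, Integer> schemaIdBySnapshot,
            Map<Integer, List<String>> columnsBySchemaId,
            long snapshotId)
    {
        Integer schemaId = schemaIdBySnapshot.get(snapshotId);
        if (schemaId == null) {
            throw new IllegalArgumentException("Unknown snapshot: " + snapshotId);
        }
        List<String> columns = columnsBySchemaId.get(schemaId);
        if (columns == null) {
            throw new IllegalStateException("Unknown schema id: " + schemaId);
        }
        return columns;
    }

    public static void main(String[] args)
    {
        // Schema 0 is the original layout; schema 1 adds a column later on.
        Map<Integer, List<String>> columnsBySchemaId = Map.of(
                0, List.of("id", "name"),
                1, List.of("id", "name", "email"));
        // Snapshot 100 was written with schema 0, snapshot 200 with schema 1.
        Map<Long, Integer> schemaIdBySnapshot = Map.of(100L, 0, 200L, 1);

        // A time travel query to snapshot 100 should not see the later "email" column.
        System.out.println(columnsForSnapshot(schemaIdBySnapshot, columnsBySchemaId, 100L));
        System.out.println(columnsForSnapshot(schemaIdBySnapshot, columnsBySchemaId, 200L));
    }
}
```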
Non-technical explanation
In time travel queries, use the table schema corresponding to the queried table snapshot to determine the columns of the output.
Release notes
( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: