
Handle Iceberg files with missing Field IDs #9959

Merged
merged 1 commit into from
Dec 3, 2021


alexjo2144
Member

alexjo2144 commented Nov 15, 2021

Based on the Dereference Pushdown PR: #8129

Only the last commit is new.

In-progress PR documenting the Iceberg NameMapping JSON: apache/iceberg#3556

More discussion in #9843.

alexjo2144 requested review from findepi and phd3 November 15, 2021 20:06
cla-bot added the cla-signed label Nov 16, 2021
alexjo2144 force-pushed the iceberg/missing-field-ids branch 2 times, most recently from 84d52a5 to 1dc6a09 November 16, 2021 20:12
alexjo2144 force-pushed the iceberg/missing-field-ids branch from 1dc6a09 to 9ab109f November 16, 2021 20:48
Member

phd3 left a comment


Looks good generally; can you please rebase?

-        .forEach(column -> columnsById.put(getIcebergFieldId(column), column));
+        .forEach(column -> {
+            String fieldId = column.getAttributes().get(ORC_ICEBERG_ID_KEY);
+            if (fieldId != null) {
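The diff fragment stops at the null check. A minimal sketch of how such a lambda might continue, assuming the field ID attribute is parsed as an integer and columns without it are simply left out of the index. `Column`, `indexById`, and the `"iceberg.id"` value used for `ORC_ICEBERG_ID_KEY` are hypothetical stand-ins for illustration, not the Trino code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldIdIndex {
    // Assumed attribute key, standing in for ORC_ICEBERG_ID_KEY
    static final String ORC_ICEBERG_ID_KEY = "iceberg.id";

    // Hypothetical stand-in for an ORC file column and its attributes
    static final class Column {
        private final String name;
        private final Map<String, String> attributes;

        Column(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        String getName() {
            return name;
        }

        Map<String, String> getAttributes() {
            return attributes;
        }
    }

    // Indexes file columns by Iceberg field ID, skipping columns that carry no ID
    static Map<Integer, Column> indexById(List<Column> columns) {
        Map<Integer, Column> columnsById = new HashMap<>();
        columns.forEach(column -> {
            String fieldId = column.getAttributes().get(ORC_ICEBERG_ID_KEY);
            if (fieldId != null) {
                columnsById.put(Integer.parseInt(fieldId), column);
            }
        });
        return columnsById;
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(
                new Column("id", Map.of(ORC_ICEBERG_ID_KEY, "1")),
                new Column("legacy", Map.of()));
        Map<Integer, Column> columnsById = indexById(columns);
        System.out.println(columnsById.keySet()); // prints [1]
        // A projected field with no matching file column is absent from the index,
        // which is the case where the reader would return nulls
        System.out.println(columnsById.containsKey(2)); // prints false
    }
}
```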
Member


Could you explain the motivation for this? Shouldn't setMissingFieldIds throw for the null case before this?

Member Author


This can happen if the column does not have an ID and no default name mapping is present. @findepi suggested that we should perhaps fail in that case rather than returning null values for the column: #9843 (comment)

Member


I'm okay with returning null here, similar to Spark.
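Per the thread above, a file column without a field ID can still be matched through the table's default name mapping; when no mapping is present, its values are read as null. A sketch of that resolution order, where `resolveFieldId` is a hypothetical helper, `"iceberg.id"` is an assumed attribute key, and the name mapping is simplified to a flat name-to-ID map (the real Iceberg NameMapping also supports aliases and nested fields).

```java
import java.util.Map;
import java.util.Optional;

public class NameMappingFallback {
    // Assumed ORC attribute key holding the Iceberg field ID
    static final String ICEBERG_ID_KEY = "iceberg.id";

    // Hypothetical helper: resolves a file column's Iceberg field ID.
    // The name mapping is simplified to a flat column name -> field ID map.
    static Optional<Integer> resolveFieldId(
            String columnName,
            Map<String, String> attributes,
            Optional<Map<String, Integer>> defaultNameMapping) {
        String fieldId = attributes.get(ICEBERG_ID_KEY);
        if (fieldId != null) {
            // The file carries its own field ID, so it takes precedence
            return Optional.of(Integer.parseInt(fieldId));
        }
        // No ID in the file: fall back to the default name mapping, if the table has one.
        // An empty result means the column cannot be matched and reads as null.
        return defaultNameMapping.map(mapping -> mapping.get(columnName));
    }

    public static void main(String[] args) {
        Optional<Map<String, Integer>> mapping = Optional.of(Map.of("legacy", 7));
        System.out.println(resolveFieldId("id", Map.of(ICEBERG_ID_KEY, "1"), mapping)); // prints Optional[1]
        System.out.println(resolveFieldId("legacy", Map.of(), mapping)); // prints Optional[7]
        System.out.println(resolveFieldId("legacy", Map.of(), Optional.empty())); // prints Optional.empty
    }
}
```

Checking the file's own ID first means the name mapping only affects files that were written without field IDs.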

alexjo2144 force-pushed the iceberg/missing-field-ids branch from 9ab109f to b95d365 November 30, 2021 21:29
alexjo2144
Member Author

Rebased and addressed comments. Thanks for the review, @phd3.

alexjo2144 force-pushed the iceberg/missing-field-ids branch from b95d365 to d1062c6 December 1, 2021 20:40
Member

phd3 left a comment


@alexjo2144 LGTM. Can you please squash? I will merge if there are no other comments.

alexjo2144 force-pushed the iceberg/missing-field-ids branch from d1062c6 to 9d6fb5e December 2, 2021 15:48
alexjo2144
Member Author

Squashed and rebased for merge conflicts. Thanks @phd3

phd3 merged commit 0ad7b95 into trinodb:master Dec 3, 2021
github-actions added this to the 365 milestone Dec 3, 2021
alexjo2144 deleted the iceberg/missing-field-ids branch May 26, 2022 13:16