Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot ID #6980

Merged
merged 7 commits into from
Mar 6, 2023

Conversation

sungwy
Copy link
Contributor

@sungwy sungwy commented Mar 1, 2023

fixes #6978

Will backport the solution to 3.2 and 3.1 if this is approved

@github-actions github-actions bot added the spark label Mar 1, 2023
Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding the issue. I think I've seen this before but didn't dig deep and didn't realize its related to schema evolution.

I think its root caused by : #1508. and maybe it's been around for different versions.

I do think we should implement that pr's feature for metadata tables (time travel using the right schema). It will matter for tables like files_table, where readable_metrics columncould be different based on different schemas.

That being said, I don't oppose fixing the bug now, but will like if we can raise an issue to track it.

Added my comments in the code.

@szehon-ho
Copy link
Collaborator

Also FYI @aokolnychyi , @RussellSpitzer

@nastra nastra self-requested a review March 2, 2023 06:35
@sungwy sungwy changed the title Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot ID Mar 2, 2023
Copy link
Collaborator

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually looks good to me, one more comment on the test. Thanks for the changes!

@sungwy
Copy link
Contributor Author

sungwy commented Mar 2, 2023

@szehon-ho I think that should cover all of the requested changes.. Please let me know if this is good to approve! Since this is a pretty important bug fix, I'm hoping we could slot it in for the next release...
I can create ones for Spark-3.2 and Spark-3.1 once you sign off on this PR

@szehon-ho
Copy link
Collaborator

szehon-ho commented Mar 2, 2023

Makes sense, I added it to Iceberg 1.2 milestone. I made a follow up issue to implement this feature for some tables that will be affected , like files table. #6991.

As other reviewers also commented, so will leave a chance for them to take another look.

List<Record> expectedFiles =
expectedEntries(table, FileContent.DATA, entriesTableSchema, expectedDataManifests, null);

Assert.assertEquals("actualFiles size should be 1", 2, actualFiles.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: One more fix here for the assert message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate all your help with these PRs @szehon-ho . Looking forward to 1.2.0 release and being able to work more with metadata tables :)

@szehon-ho szehon-ho merged commit 12bcffb into apache:master Mar 6, 2023
@szehon-ho
Copy link
Collaborator

Merged, thanks @syun64 . We can track any other discussion in follow up , if any

krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reading as of Snapshot ID fails on Metadata Tables after Iceberg Table Schema Update
2 participants