-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot ID #6980
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding the issue. I think I've seen this before but didn't dig deep and didn't realize its related to schema evolution.
I think its root caused by : #1508. and maybe it's been around for different versions.
I do think we should implement that pr's feature for metadata tables (time travel using the right schema). It will matter for tables like files_table, where readable_metrics columncould be different based on different schemas.
That being said, I don't oppose fixing the bug now, but will like if we can raise an issue to track it.
Added my comments in the code.
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java
Outdated
Show resolved
Hide resolved
...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java
Outdated
Show resolved
Hide resolved
...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java
Outdated
Show resolved
Hide resolved
spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/SimpleExtraColumnRecord.java
Outdated
Show resolved
Hide resolved
Also FYI @aokolnychyi , @RussellSpitzer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually looks good to me, one more comment on the test. Thanks for the changes!
...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java
Show resolved
Hide resolved
...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java
Show resolved
Hide resolved
...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java
Show resolved
Hide resolved
@szehon-ho I think that should cover all of the requested changes.. Please let me know if this is good to approve! Since this is a pretty important bug fix, I'm hoping we could slot it in for the next release... |
Makes sense, I added it to Iceberg 1.2 milestone. I made a follow up issue to implement this feature for some tables that will be affected , like files table. #6991. As other reviewers also commented, so will leave a chance for them to take another look. |
List<Record> expectedFiles = | ||
expectedEntries(table, FileContent.DATA, entriesTableSchema, expectedDataManifests, null); | ||
|
||
Assert.assertEquals("actualFiles size should be 1", 2, actualFiles.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: One more fix here for the assert message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate all your help with these PRs @szehon-ho . Looking forward to 1.2.0 release and being able to work more with metadata tables :)
Merged, thanks @syun64 . We can track any other discussion in follow up , if any |
fixes #6978
Will backport the solution to 3.2 and 3.1 if this is approved