Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot ID #6980

sungwy · 2023-03-01T23:01:42Z

Will backport the solution to 3.2 and 3.1 if this is approved.

szehon-ho

Thanks for finding the issue. I think I've seen this before but didn't dig deep and didn't realize it's related to schema evolution.

I think the root cause is #1508, and it may have been around for several versions.

I do think we should implement that PR's feature for metadata tables (time travel using the right schema). It will matter for tables like files_table, where the readable_metrics column could differ depending on the schema.

That being said, I don't oppose fixing the bug now, but I would like us to raise an issue to track it.

Added my comments in the code.

spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java

...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java

spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/SimpleExtraColumnRecord.java

szehon-ho · 2023-03-02T01:03:26Z

Also FYI @aokolnychyi, @RussellSpitzer

szehon-ho

This actually looks good to me; one more comment on the test. Thanks for the changes!

...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java

sungwy · 2023-03-02T22:35:16Z

@szehon-ho I think that should cover all of the requested changes. Please let me know if this is good to approve! Since this is a pretty important bug fix, I'm hoping we could slot it in for the next release.
I can create the PRs for Spark-3.2 and Spark-3.1 once you sign off on this one.

szehon-ho · 2023-03-02T22:45:51Z

Makes sense, I added it to the Iceberg 1.2 milestone. I made a follow-up issue, #6991, to implement this feature for the tables that will be affected, like the files table.

Since other reviewers also commented, I will leave them a chance to take another look.

szehon-ho · 2023-03-03T18:04:27Z

...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java

+    List<Record> expectedFiles =
+        expectedEntries(table, FileContent.DATA, entriesTableSchema, expectedDataManifests, null);
+
+    Assert.assertEquals("actualFiles size should be 1", 2, actualFiles.size());


Minor: One more fix here for the assert message

sungwy · 2023-03-03

Appreciate all your help with these PRs, @szehon-ho. Looking forward to the 1.2.0 release and being able to work more with metadata tables :)

szehon-ho · 2023-03-06T01:00:37Z

Merged, thanks @syun64. We can track any further discussion in follow-ups, if any.

bug fix for reading schema of metadata tables with snapshot id

baca1b5

github-actions bot added the spark label Mar 1, 2023

sungwy added 2 commits March 1, 2023 18:09

bug fix for reading schema of metadata tables with snapshot id

1a2d6b3

bug fix for reading schema of metadata tables with snapshot id

016220d

szehon-ho reviewed Mar 2, 2023

spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/SimpleExtraColumnRecord.java (outdated)

bug fix for reading schema of metadata tables with snapshot id

7ffcfdb

nastra self-requested a review March 2, 2023 06:35

sungwy mentioned this pull request Mar 2, 2023

Reading as of Snapshot ID fails on Metadata Tables after Iceberg Table Schema Update #6978

Closed

sungwy changed the title ~~Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot~~ Spark-3.3: Bug Fix for Reading Metadata tables with Snapshot ID Mar 2, 2023

szehon-ho reviewed Mar 2, 2023

...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java

szehon-ho reviewed Mar 2, 2023

...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java

bug fix for reading schema of metadata tables with snapshot id

3175c26

szehon-ho reviewed Mar 2, 2023

...3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMetadataTables.java

bug fix for reading schema of metadata tables with snapshot id

60f1859

szehon-ho approved these changes Mar 2, 2023

szehon-ho added this to the Iceberg 1.2.0 milestone Mar 2, 2023

szehon-ho mentioned this pull request Mar 2, 2023

Implement Time Travel with correct schema for Metadata Tables #6991

Closed

This was referenced Mar 3, 2023

Spark-3.1: Bug Fix for Reading Metadata tables with Snapshot ID #6993

Merged

Spark-3.2: Bug Fix for Reading Metadata tables with Snapshot ID #6994

Merged

szehon-ho reviewed Mar 3, 2023

log message

ed36dde

szehon-ho merged commit 12bcffb into apache:master Mar 6, 2023

krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023

Spark 3.3: Bug Fix for Metadata table Time Travel with Schema Evolution (apache#6980)

3aed199
