Core: Fix drop partition field and schema field error #11387

Closed
wants to merge 5 commits into from

bknbkn (Contributor) commented Oct 24, 2024

Fixes #11314, #10234, and #10487.

In the previous code, every partition spec was bound to the latest schema. After a field is dropped from the schema, a historical spec that references it can no longer find the field in the current schema, and an error is thrown.

If a user accidentally triggers this error, the entire table becomes unreadable and there is no effective way to roll back.

This patch persists the schema id for each spec. A spec can then look up its corresponding schema by schema id, so it can always find its source fields. When the schema is updated, the current spec is bound to the latest schema; the schemas of the other specs remain unchanged.
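To make the difference concrete, here is a minimal sketch of the two lookup strategies. It is a toy model, not Iceberg code: the class `SpecSchemaSketch`, its methods, and the field ids are all hypothetical and only illustrate the idea of binding a spec to a stored schema id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: these names are hypothetical and are not Iceberg's API.
class SpecSchemaSketch {
  // schema id -> ids of the fields present in that schema version
  private final Map<Integer, Set<Integer>> schemasById = new HashMap<>();
  private int currentSchemaId = -1;

  // Registers a schema version and makes it the current one.
  void addSchema(int schemaId, Set<Integer> fieldIds) {
    schemasById.put(schemaId, fieldIds);
    currentSchemaId = schemaId;
  }

  // Old behavior described above: every spec resolves against the latest schema.
  boolean resolvesAgainstCurrent(int sourceFieldId) {
    return schemasById.get(currentSchemaId).contains(sourceFieldId);
  }

  // Patched behavior described above: a spec resolves against the schema id stored with it.
  boolean resolvesAgainstPersisted(int specSchemaId, int sourceFieldId) {
    return schemasById.get(specSchemaId).contains(sourceFieldId);
  }

  public static void main(String[] args) {
    SpecSchemaSketch table = new SpecSchemaSketch();
    table.addSchema(0, Set.of(1, 2)); // a historical spec partitions by field 2
    table.addSchema(1, Set.of(1));    // field 2 is dropped in the new schema

    System.out.println(table.resolvesAgainstCurrent(2));      // false: lookup fails
    System.out.println(table.resolvesAgainstPersisted(0, 2)); // true: schema 0 still has field 2
  }
}
```

The historical spec stays readable because schema 0 is never rewritten; only the binding of the current spec moves forward when the schema changes.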

However, when a partition field is dropped from a V1 table, it is not removed from the spec. Instead, its transform is replaced with void. The latest spec is therefore still incompatible with a latest schema that has dropped the corresponding column.

So I prefer to forbid V1 tables from dropping columns that have been used as partition fields. For this purpose, I updated the PartitionSpec#checkCompatibility method so that it also checks the compatibility of void transforms.

For V2 tables, it is safe to drop the corresponding column after dropping the partition field.
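The V1/V2 distinction can be sketched as a small check. This is a toy model under stated assumptions, not the actual PartitionSpec#checkCompatibility logic: `DropColumnSketch`, `PartitionField`, and `canDropColumn` are hypothetical names, and the `checkVoid` flag exists only to contrast a check that skips void transforms with one that includes them.

```java
import java.util.List;

// Toy model only: hypothetical names, not Iceberg's PartitionSpec API.
class DropColumnSketch {
  static final class PartitionField {
    final int sourceId;
    final String transform;

    PartitionField(int sourceId, String transform) {
      this.sourceId = sourceId;
      this.transform = transform;
    }
  }

  // Returns true if no field in the current spec uses columnId as its source.
  // With checkVoid=false, void-transform fields are skipped (an assumed stand-in
  // for a check that ignores void); with checkVoid=true they block the drop too.
  static boolean canDropColumn(int columnId, List<PartitionField> spec, boolean checkVoid) {
    for (PartitionField field : spec) {
      boolean isVoid = "void".equals(field.transform);
      if (field.sourceId == columnId && (checkVoid || !isVoid)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // V1: dropping the partition field leaves a void placeholder in the spec.
    List<PartitionField> v1Spec = List.of(new PartitionField(2, "void"));
    // V2: dropping the partition field removes it from the spec.
    List<PartitionField> v2Spec = List.of();

    System.out.println(canDropColumn(2, v1Spec, false)); // true: void field not checked
    System.out.println(canDropColumn(2, v1Spec, true));  // false: void field blocks the drop
    System.out.println(canDropColumn(2, v2Spec, true));  // true: nothing references column 2
  }
}
```

In this sketch the V1 spec keeps referencing column 2 through the void placeholder, which is why the drop is rejected once void fields are checked, while the V2 spec no longer references the column at all.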

@github-actions github-actions bot added the Specification Issues that may introduce spec changes. label Oct 24, 2024
bknbkn (Contributor, Author) commented Oct 24, 2024

cc @Fokko @nastra @amogh-jahagirdar Do you have time to help review it? Thank you very much.

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Nov 24, 2024
github-actions bot commented Dec 1, 2024

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Dec 1, 2024
Labels
API, core, spark, Specification (Issues that may introduce spec changes.), stale
Development

Successfully merging this pull request may close these issues.

Can't select table If drop the corresponding column after replacing or dropping partition spec field
1 participant