Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Iceberg deletes when an entire file can be removed #14198

Merged
merged 1 commit into from
Sep 21, 2022

Conversation

alexjo2144
Copy link
Member

@alexjo2144 alexjo2144 commented Sep 19, 2022

Description

This reverts commit d435549.

There were some minor conflicts due to the change to TrinoFileSystem

Replace the feature where if a whole file is deleted by a single operation, remove it from the Iceberg manifest rather than writing position deletes.

Non-technical explanation

Improve performance of large delete operations.

Related PRs

Original changes: #12197
Revert: #13485
Iceberg core issue: apache/iceberg#5442
Iceberg dep. version bump: #14074

Release notes

( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 19, 2022
@alexjo2144 alexjo2144 requested a review from findepi September 19, 2022 18:41
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we perhaps have a $files-based test that shows that this actually works as expected?

@alexjo2144 alexjo2144 force-pushed the iceberg/whole-file-deletion branch from 7cb32ba to 560f01e Compare September 20, 2022 20:37
@alexjo2144
Copy link
Member Author

@findepi just pushed a fixup to one of the analyze tests. Mind taking a look at the changed expected values? I guess the stats change because in the update the old data file is removed from the manifest instead of having row level deletes tagged on to it.

Can we perhaps have a $files-based test that shows that this actually works as expected?

This is essentially what the tests in TestIcebergV2 are doing, they just hit the Iceberg API directly instead of querying $files. But I can add them if you want.

@findepi
Copy link
Member

findepi commented Sep 20, 2022

Can we perhaps have a $files-based test that shows that this actually works as expected?

This is essentially what the tests in TestIcebergV2 are doing, they just hit the Iceberg API directly instead of querying $files. But I can add them if you want.

👍

@findepi findepi force-pushed the iceberg/whole-file-deletion branch from 560f01e to 482013d Compare September 20, 2022 21:33
@findepi
Copy link
Member

findepi commented Sep 20, 2022

squashed

@findepi
Copy link
Member

findepi commented Sep 20, 2022

@alexjo2144 do you remember what RNs we had for this?

@alexjo2144
Copy link
Member Author

alexjo2144 commented Sep 21, 2022

https://github.com/trinodb/trino/pull/12440/files#diff-df9cc346e1952819e76311f4991919bf7e2d96f0b548e9d85d3e9771afb6a4f3R38

## Iceberg connector
* Improve query performance of reads after `DELETE` removes all rows from a file. ({issue}`12197`)

@findepi findepi merged commit 581842e into trinodb:master Sep 21, 2022
@findepi findepi mentioned this pull request Sep 21, 2022
@github-actions github-actions bot added this to the 397 milestone Sep 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants