Improve Iceberg deletes when an entire file can be removed #12197
Conversation
Force-pushed from 065dbc7 to ef0b7ae.
Hmm, so that doesn't work, but we have a few options:
Force-pushed from ef0b7ae to b7bc6c0.
@findinpath @findepi I pushed a new approach here, accumulating the number of deleted rows during the writing of position delete files and comparing that to the file's record count. PTAL. Piotr, we had talked offline about using a mechanism similar to
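The accumulate-and-compare idea can be sketched in plain Java. This is a hypothetical stand-in, not the actual Trino classes: while position deletes are written, each data file's deleted-row count is accumulated, and at commit time a file whose count equals its record count can be dropped wholesale.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count deleted rows per data file while writing
// position deletes; a file whose deleted-row count equals its record
// count was fully deleted by this single operation.
public class DeleteAccumulatorSketch
{
    // data file path -> rows deleted so far in this operation
    private final Map<String, Long> deletedRowsByFile = new HashMap<>();

    public void recordPositionDelete(String dataFilePath, long rowCount)
    {
        deletedRowsByFile.merge(dataFilePath, rowCount, Long::sum);
    }

    // true if every row of the file was deleted by this one operation
    public boolean isFullyDeleted(String dataFilePath, long fileRecordCount)
    {
        return deletedRowsByFile.getOrDefault(dataFilePath, 0L) == fileRecordCount;
    }

    public static void main(String[] args)
    {
        DeleteAccumulatorSketch sketch = new DeleteAccumulatorSketch();
        sketch.recordPositionDelete("s3://bucket/data-1.orc", 10);
        sketch.recordPositionDelete("s3://bucket/data-1.orc", 15);
        System.out.println(sketch.isFullyDeleted("s3://bucket/data-1.orc", 25)); // prints "true"
        System.out.println(sketch.isFullyDeleted("s3://bucket/data-1.orc", 100)); // prints "false"
    }
}
```

Note that counting per operation is what enforces the "all rows must be deleted by one delete operation" restriction: deletes accumulated across earlier snapshots are not visible to this map.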
        }
    }
    catch (IOException e) {
        log.warn(e, "Failed to clean up uncommitted position delete files");
why not propagate here?
Probably because we create delete files for this a few lines below? Though I'd expect propagation too.
Failing to delete a file that is not going to be committed didn't seem like enough of a problem to warrant failing the query.
If we have limited fs permissions and can't delete files, for example, we would still be able to write deletes.
This would eventually get picked up by a remove_orphan_files collection.
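The log-and-continue cleanup discussed here can be sketched as follows. This is a simplified stand-in using `java.nio.file`, not the actual Trino file system API: a failed deletion of an uncommitted file is logged rather than propagated, and the orphan is left for a later remove_orphan_files pass.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of best-effort cleanup: a delete failure is
// downgraded to a warning because the file was never committed and can
// be reclaimed later by remove_orphan_files.
public class CleanupSketch
{
    public static int cleanUpUncommittedFiles(List<Path> files)
    {
        int deleted = 0;
        for (Path file : files) {
            try {
                if (Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
            catch (IOException e) {
                // Swallow: the query still succeeds; the orphan is collected later
                System.err.printf("Failed to clean up uncommitted file %s: %s%n", file, e);
            }
        }
        return deleted;
    }
}
```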
assertThat(query("SELECT * FROM " + tableName)).returnsEmptyResult();
assertThat(this.loadTable(tableName).newScan().planFiles()).hasSize(1);
}
@findinpath @homar please review this class
Correct me if I'm wrong, but here we create the position delete file pretty much the standard way, and then instead of committing it we delete the entire data file if the delete file contains all the rows from it (based on the row count), right? I know preparing the benchmark environment is in progress, but do we plan to benchmark this without the change? I'm just curious how big of an impact it has.
Force-pushed from b7bc6c0 to 4c9f852.
@homar right, the delete itself is not any faster, but read time is improved. Doing it this way means the next read does not need to do any I/O for the deleted file, whereas the old way would read the entire data file and the entire position delete file. I don't have any benchmarks for this, but I would be very surprised if it wasn't an improvement.
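The read-path argument can be reduced to a back-of-the-envelope cost model. The sizes below are illustrative only: with a position delete, a later scan of the fully deleted file still reads the data file plus the delete file; with whole-file removal, the file never appears in the scan's planned files, so the cost is zero.

```java
// Hypothetical cost sketch for scanning a fully deleted file.
public class ReadCostSketch
{
    // Old path: the scan reads the data file and its position delete file
    static long bytesScannedWithPositionDelete(long dataFileBytes, long deleteFileBytes)
    {
        return dataFileBytes + deleteFileBytes;
    }

    // New path: the removed file is never planned, so nothing is read
    static long bytesScannedWithFileRemoval()
    {
        return 0;
    }

    public static void main(String[] args)
    {
        System.out.println(bytesScannedWithPositionDelete(128_000_000L, 1_000_000L)); // prints "129000000"
        System.out.println(bytesScannedWithFileRemoval()); // prints "0"
    }
}
```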
AC. Thanks for the reviews.
Force-pushed from 0891f20 to 3b47b18.
I would also be very surprised. I am just wondering how much of an impact it could have :)
@alexjo2144 please rebase, there is a conflict. I will re-review after that.
Force-pushed from 3b47b18 to 98bdfd4.
Rebased, thanks.
Force-pushed from 98bdfd4 to 2981709.
assertUpdate(
        Session.builder(getSession()).setCatalogSessionProperty("iceberg", "orc_writer_max_stripe_rows", "5").build(),
        "CREATE TABLE " + tableName + " WITH (format = 'ORC') AS SELECT * FROM tpch.tiny.nation", 25);
this.loadTable(tableName).updateProperties().set(SPLIT_SIZE, "100").commit();
What for? Is 100 an OK number? We only have 25 rows.
It's 100 bytes, not rows. I'll add a comment, but this ensures each ORC stripe gets a split by itself.
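The arithmetic behind the test setup above can be sketched quickly: 25 rows written with orc_writer_max_stripe_rows set to 5 yields 5 stripes, and a tiny 100-byte split size keeps each stripe in its own split. The helper below is illustrative, not part of the test code.

```java
// Hypothetical back-of-the-envelope check of the stripe count.
public class StripeMath
{
    // Ceiling division: number of stripes needed for `rows` rows
    static long stripes(long rows, long maxStripeRows)
    {
        return (rows + maxStripeRows - 1) / maxStripeRows;
    }

    public static void main(String[] args)
    {
        System.out.println(stripes(25, 5)); // prints "5"
    }
}
```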
private long getQuerySplits(QueryId queryId)
{
    QueryStats stats = getDistributedQueryRunner().getCoordinator().getQueryManager().getFullQueryInfo(queryId).getQueryStats();
    return stats.getOperatorSummaries().stream()
            .filter(summary -> summary.getOperatorType().equals("ScanFilterAndProjectOperator"))
            .mapToLong(OperatorStats::getTotalDrivers)
            .sum();
}
ResultWithQueryId<MaterializedResult> deletionResult = getDistributedQueryRunner().executeWithQueryId(getSession(), "DELETE FROM " + tableName + " WHERE regionkey < 10");
long deletionSplits = getQuerySplits(deletionResult.getQueryId());
I was hoping to see in the query stats that there are multiple splits for the file, but this wasn't the case. I checked via debug and indeed there are actually ~62 splits of maximum 100B. Any idea how we could retrieve the number of splits in the test case?
There are some Delta tests that check the number of splits, but they are a bit finicky / often flaky so I didn't include one here.
Force-pushed from e767c6c to b0a7e4c.
AC and rebased for conflicts, thanks @findepi.
Force-pushed from b0a7e4c to 815f2bf.
Impressive work 👍
if (!table.getEnforcedPredicate().isAll()) {
    rowDelta.conflictDetectionFilter(toIcebergExpression(table.getEnforcedPredicate()));
}
Map<String, List<CommitTaskData>> deletesByFilePath = commitTasks.stream()
The finishWrite method is now ~150 lines long. Please consider refactoring the method into smaller building blocks to ensure good readability in the weeks/months to come.
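The grouping step quoted above (deletesByFilePath) can be sketched as one small building block in plain Java. The record and method names here are illustrative, not the actual Trino classes: commit tasks are grouped by data file path, then partitioned into files whose accumulated deleted-row count equals the file's record count (drop the data file) versus the rest (commit the position delete).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of one extracted building block of a long
// finish-write method: group delete tasks per data file, then split
// fully deleted files from partially deleted ones.
public class FinishWriteSketch
{
    record CommitTask(String dataFilePath, long deletedRows, long fileRecordCount) {}

    // key true  -> fully deleted files (remove the whole data file)
    // key false -> partially deleted files (keep the position delete)
    static Map<Boolean, List<String>> partitionFiles(List<CommitTask> tasks)
    {
        Map<String, List<CommitTask>> byPath = tasks.stream()
                .collect(Collectors.groupingBy(CommitTask::dataFilePath));
        return byPath.entrySet().stream()
                .collect(Collectors.partitioningBy(
                        entry -> entry.getValue().stream().mapToLong(CommitTask::deletedRows).sum()
                                == entry.getValue().get(0).fileRecordCount(),
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }
}
```

Extracting steps like this one into named methods is what shrinks the main method back to a readable size without changing behavior.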
follow-up
Force-pushed from 815f2bf to a8cc9be.
Added some partitioned table tests. Thanks for the suggestion @findinpath.
Description
If a delete would remove all rows from an individual file,
remove the whole file, rather than writing a position delete.
This does not include situations where a whole file is deleted
across multiple row-level passes. All rows must be deleted by
one delete operation.
Improvement
Iceberg connector
Improves performance when deleting all rows in a file.
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: