
Improve Iceberg deletes when an entire file can be removed #12197

Merged: @findepi merged 1 commit into trinodb:master on May 25, 2022

Conversation

@alexjo2144 (Member) commented Apr 29, 2022

Description

If a delete would remove all rows from an individual file,
remove the whole file, rather than writing a position delete.

This does not include situations where a whole file is deleted
across multiple row-level passes. All rows must be deleted by
one delete operation.
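
For context, here is a minimal sketch of the two commit paths at the Apache Iceberg table API level. This is illustrative only, not the Trino code path; `table`, `positionDeleteFile`, and `dataFilePath` are placeholders:

```java
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.Table;

final class DeleteCommitSketch
{
    // Old behavior: a DELETE always commits position delete files, which every
    // subsequent read has to merge with the data file
    static void positionDelete(Table table, DeleteFile positionDeleteFile)
    {
        table.newRowDelta()
                .addDeletes(positionDeleteFile)
                .commit();
    }

    // New behavior when every row of a data file is deleted: drop the data file
    // from table metadata instead, so readers skip it entirely
    static void wholeFileDelete(Table table, String dataFilePath)
    {
        table.newDelete()
                .deleteFile(dataFilePath)
                .commit();
    }
}
```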

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Improves performance when deleting all rows in a file.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Avoid writing positional deletes when all rows in a file have been deleted. ({issue}`12057`)

@cla-bot added the cla-signed label on Apr 29, 2022
@alexjo2144 force-pushed the iceberg/whole-file-delete branch from 065dbc7 to ef0b7ae on May 2, 2022 18:21
@alexjo2144 changed the title from "Improve Iceberg deletion of entire files" to "Improve Iceberg deletes when an entire file can be removed" on May 2, 2022
@alexjo2144 requested review from @findepi and @findinpath on May 2, 2022 18:26
@alexjo2144 (Member, Author) commented May 2, 2022

Hmm so that doesn't work, but we have a few options:

  1. Change the beginDelete SPI method to include the Constraint being deleted.
     This would let us filter out files that are fully deleted in beginDelete and skip reading them (a hypothetical sketch of that SPI change follows below).

  2. Change split generation for DELETE operations against Iceberg to only generate one split per file, and disable any row-group-level filtering within the file.
     This would mean still reading the entire file, but we could recognize that a file was fully deleted in finishDelete. That has the benefit of working with complex predicates, and of factoring in previous deletes against the file.
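
To make option 1 concrete, here is a hypothetical sketch of that SPI change. The Constraint parameter is the proposed addition, the interface name is made up, and the method otherwise mirrors the existing ConnectorMetadata.beginDelete:

```java
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.ConnectorTableHandle;
import io.trino.spi.connector.Constraint;

public interface ConnectorMetadataSketch
{
    // Hypothetical: today beginDelete(session, tableHandle) never sees the delete
    // predicate; passing the Constraint would let the connector identify fully
    // deleted files up front and skip reading them
    ConnectorTableHandle beginDelete(ConnectorSession session, ConnectorTableHandle tableHandle, Constraint constraint);
}
```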

@alexjo2144 force-pushed the iceberg/whole-file-delete branch from ef0b7ae to b7bc6c0 on May 11, 2022 20:56
@alexjo2144 (Member, Author) commented:
@findinpath @findepi I pushed a new approach here, accumulating the number of deleted rows during the writing of Position Delete files, and comparing that to the file's record count. PTAL

Piotr, we had talked offline about using a mechanism similar to IcebergSplitSource#getTableExecuteSplitsInfo but that would have required SPI changes just to pass the record count around. I can try it if you'd like, but for now the record count is just passed through the Splits and Fragments to finishDelete. Let me know if you'd rather make the SPI changes though.
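
A self-contained sketch of that decision logic, with all names made up for illustration (the real logic lives in the connector's finishDelete/finishWrite path):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WholeFileDeleteSketch
{
    // deletedRowCount is accumulated while writing position deletes for the file;
    // fileRecordCount travels with the split and the commit task fragment, so no
    // extra file reads are needed at commit time
    record FileDelete(String dataFilePath, long deletedRowCount, long fileRecordCount) {}

    static Map<String, Boolean> classify(List<FileDelete> deletes)
    {
        Map<String, Boolean> fullyDeleted = new HashMap<>();
        for (FileDelete delete : deletes) {
            fullyDeleted.put(delete.dataFilePath(), delete.deletedRowCount() == delete.fileRecordCount());
        }
        return fullyDeleted;
    }

    public static void main(String[] args)
    {
        List<FileDelete> deletes = List.of(
                new FileDelete("s3://bucket/data/a.orc", 25, 25),   // remove the whole file
                new FileDelete("s3://bucket/data/b.orc", 10, 25));  // write a position delete
        classify(deletes).forEach((path, whole) ->
                System.out.println(path + (whole ? " -> remove data file" : " -> commit position delete file")));
    }
}
```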

@alexjo2144 requested a review from @findinpath on May 11, 2022 21:02
        }
    }
    catch (IOException e) {
        // Best-effort cleanup of uncommitted files: a failure here should not fail
        // the query; leftover files can be collected by orphan-file cleanup later
        log.warn(e, "Failed to clean up uncommitted position delete files");
Reviewer (Member):

why not propagate here?

Reviewer (Member):

Probably because we create delete files for this a few lines below? Though I'd expect propagation too.

@alexjo2144 (Member, Author):

Failing to delete a file that is not going to be committed didn't seem like enough of a problem to warrant failing the query.

If we have limited filesystem permissions and can't delete files, for example, we would still be able to write deletes.

This would eventually get picked up by a remove_orphan_files collection.

assertThat(query("SELECT * FROM " + tableName)).returnsEmptyResult();
assertThat(this.loadTable(tableName).newScan().planFiles()).hasSize(1);
}

Reviewer (Member):

@findinpath @homar please review this class

@homar (Member) commented May 20, 2022

Correct me if I'm wrong, but here we create the position delete file pretty much the standard way, and then instead of committing it we delete the entire data file if the delete file contains all the rows from it (based on the row count), right?
If so, this doesn't have a positive impact on the performance of the deletion itself (more like a small negative one), but it does have a positive impact on future reads from this table, right?

I know preparing the benchmark env is in progress, but do we plan to benchmark it without this change? I am just curious how big of an impact it has.

@alexjo2144 force-pushed the iceberg/whole-file-delete branch from b7bc6c0 to 4c9f852 on May 20, 2022 16:49
@alexjo2144 (Member, Author) commented May 20, 2022

@homar right, the delete itself is not any faster, but read time is improved. Doing it this way means that the next read does not need to do any I/O for the deleted file, versus the old way, which would read the entire data file AND the entire position delete file.

I don't have any benchmarks for this but I would be very surprised if it wasn't an improvement.

@alexjo2144 (Member, Author) commented:

AC. Thanks for the reviews

@alexjo2144 force-pushed the iceberg/whole-file-delete branch 2 times, most recently from 0891f20 to 3b47b18, on May 20, 2022 20:22
@homar (Member) commented May 22, 2022

> @homar right, the delete itself is not any faster, but read time is improved. Doing it this way means that the next read does not need to do any I/O for the deleted file, versus the old way, which would read the entire data file AND the entire position delete file.
>
> I don't have any benchmarks for this but I would be very surprised if it wasn't an improvement.

I would also be very surprised. I am just wondering how much of an impact it could have :)

@findepi (Member) commented May 23, 2022

@alexjo2144 please rebase, there is a conflict. I will re-review after that

@alexjo2144 force-pushed the iceberg/whole-file-delete branch from 3b47b18 to 98bdfd4 on May 23, 2022 14:26
@alexjo2144 (Member, Author) commented:
Rebased, thanks

@alexjo2144 force-pushed the iceberg/whole-file-delete branch from 98bdfd4 to 2981709 on May 23, 2022 17:58
assertUpdate(
        Session.builder(getSession()).setCatalogSessionProperty("iceberg", "orc_writer_max_stripe_rows", "5").build(),
        "CREATE TABLE " + tableName + " WITH (format = 'ORC') AS SELECT * FROM tpch.tiny.nation", 25);
// SPLIT_SIZE is in bytes, not rows; 100 bytes ensures each ORC stripe gets a Split by itself
this.loadTable(tableName).updateProperties().set(SPLIT_SIZE, "100").commit();
Reviewer (Member):

what for?

Is 100 an OK number? We have only 25 rows.

@alexjo2144 (Member, Author):

It's 100 bytes, not rows. I'll add a comment, but this ensures each ORC stripe gets a Split by itself.

Reviewer (Contributor):
private long getQuerySplits(QueryId queryId)
{
    QueryStats stats = getDistributedQueryRunner().getCoordinator().getQueryManager().getFullQueryInfo(queryId).getQueryStats();
    // Each driver of the scan operator corresponds to one processed split
    return stats.getOperatorSummaries()
            .stream()
            .filter(summary -> summary.getOperatorType().equals("ScanFilterAndProjectOperator"))
            .mapToLong(OperatorStats::getTotalDrivers)
            .sum();
}

ResultWithQueryId<MaterializedResult> deletionResult = getDistributedQueryRunner().executeWithQueryId(getSession(), "DELETE FROM " + tableName + " WHERE regionkey < 10");
long deletionSplits = getQuerySplits(deletionResult.getQueryId());

I was hoping to see in the query stats that there are multiple splits for the file, but this wasn't the case.
I checked via debug and indeed there are actually ~ 62 splits of maximum 100B.

Any idea how we could retrieve the number of splits in the test case?

@alexjo2144 (Member, Author):
There are some Delta tests that check the number of splits, but they are a bit finicky / often flaky so I didn't include one here.

@alexjo2144 force-pushed the iceberg/whole-file-delete branch 2 times, most recently from e767c6c to b0a7e4c, on May 24, 2022 16:03
@alexjo2144 (Member, Author) commented:
AC and rebased for conflicts, thanks @findepi

@alexjo2144 force-pushed the iceberg/whole-file-delete branch from b0a7e4c to 815f2bf on May 24, 2022 18:18
@findinpath (Contributor) commented:
Impressive work 👍

// Scope Iceberg's conflict detection for concurrent commits to rows matching the enforced predicate
if (!table.getEnforcedPredicate().isAll()) {
    rowDelta.conflictDetectionFilter(toIcebergExpression(table.getEnforcedPredicate()));
}
Map<String, List<CommitTaskData>> deletesByFilePath = commitTasks.stream()
Reviewer (Contributor):

The finishWrite method is now ~150 lines long.
Please consider refactoring it into smaller building blocks to ensure good readability in the weeks/months to come.

Reviewer (Member):
follow-up


@alexjo2144 force-pushed the iceberg/whole-file-delete branch from 815f2bf to a8cc9be on May 25, 2022 15:05
@alexjo2144 (Member, Author) commented:
Added some partitioned table tests. Thanks for the suggestion @findinpath

@findepi merged commit 9ddaa60 into trinodb:master on May 25, 2022
@findepi mentioned this pull request on May 25, 2022
@github-actions added this to the 382 milestone on May 25, 2022
@alexjo2144 deleted the iceberg/whole-file-delete branch on May 26, 2022 13:18
@alexjo2144 restored the iceberg/whole-file-delete branch on September 19, 2022 18:34
@alexjo2144 deleted the iceberg/whole-file-delete branch on September 21, 2022 13:20
Development

Successfully merging this pull request may close these issues.

Improve Iceberg Deletes/Update when an entire file is changed
4 participants