Delta-lake CheckpointEntryIterator is slow when reading large checkpoint files #17405

Closed
jkylling opened this issue May 9, 2023 · 0 comments · Fixed by #17408
Labels: delta-lake, performance

jkylling commented May 9, 2023

When reading checkpoints in Delta logs with a large number of remove entries, the CheckpointEntryIterator in the trino-delta-lake plugin spends a lot of time scanning through mostly null entries.

Remove entries are retained in Delta log checkpoints as tombstones to aid the VACUUM operation. For Delta tables with streaming writes, the remove entries can make up the bulk of large checkpoint files (> 300 MB).

The current CheckpointEntryIterator implementation adds a filter requiring the entry type column to be not null. Unfortunately, this filter goes unused by the Parquet reader: the entry type column is a complex type, and the Parquet reader cannot use column statistics for complex types. With predicate pushdown on nested fields, we could instead filter on a specific nested field of the entry type, for which the Parquet reader is able to utilize statistics. For instance, for the add entry type we could filter on the add.path column being not null.
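
For context, a checkpoint Parquet file has one top-level struct column per entry type, and each row populates exactly one of them (a simplified sketch of the schema; field lists abbreviated):

```text
root
 |-- txn:        struct<appId, version, ...>
 |-- add:        struct<path, partitionValues, size, modificationTime, dataChange, stats, ...>
 |-- remove:     struct<path, deletionTimestamp, ...>
 |-- metaData:   struct<id, schemaString, partitionColumns, ...>
 |-- protocol:   struct<minReaderVersion, minWriterVersion>
 |-- commitInfo: struct<...>
```

Parquet keeps statistics (min/max, null counts) per leaf column, which is why a not-null filter on a primitive leaf such as add.path can prune entire row groups, while the same filter on the add struct itself cannot.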

On a test checkpoint with 84 parts and a total size of 300 MB, filtering on add.path reduces the time to collect the active file set from ~40 seconds to ~1 second (running locally, with the checkpoints on disk).

Created from a Slack conversation.

Suggested implementation

Replace

```java
TupleDomain.withColumnDomains(ImmutableMap.of(getOnlyElement(columns), Domain.notNull(getOnlyElement(columns).getType())));
```

with

```java
TupleDomain.withColumnDomains(ImmutableMap.of(column, Domain.notNull(column.getType())));
```

where

```java
DeltaLakeColumnHandle baseColumn = getOnlyElement(columns);
HiveColumnHandle column = new DeltaLakeColumnHandle(
        baseColumn.getBaseColumnName(),
        baseColumn.getBaseType(),
        baseColumn.getBaseFieldId(),
        baseColumn.getBasePhysicalColumnName(),
        baseColumn.getBasePhysicalType(),
        REGULAR,
        Optional.of(new DeltaLakeColumnProjectionInfo(
                type,
                ImmutableList.of(-1),
                ImmutableList.of(field))))
        .toHiveColumnHandle();
```
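
For the add entry type, for example, the projected column and its domain might look as follows. This is a sketch, not the final implementation: it assumes add.path is read as VARCHAR, keeps the -1 dereference-index placeholder from the snippet above, and assumes the usual static imports for getOnlyElement, REGULAR, and VARCHAR.

```java
// Sketch: project the nested add.path field and filter on it being non-null,
// so the Parquet reader can prune row groups using leaf-column statistics.
DeltaLakeColumnHandle baseColumn = getOnlyElement(columns); // the top-level "add" column
HiveColumnHandle addPathColumn = new DeltaLakeColumnHandle(
        baseColumn.getBaseColumnName(),
        baseColumn.getBaseType(),
        baseColumn.getBaseFieldId(),
        baseColumn.getBasePhysicalColumnName(),
        baseColumn.getBasePhysicalType(),
        REGULAR,
        Optional.of(new DeltaLakeColumnProjectionInfo(
                VARCHAR,                    // type of the projected add.path field
                ImmutableList.of(-1),       // dereference index placeholder, as above
                ImmutableList.of("path")))) // dereference name
        .toHiveColumnHandle();

TupleDomain<HiveColumnHandle> tupleDomain = TupleDomain.withColumnDomains(
        ImmutableMap.of(addPathColumn, Domain.notNull(addPathColumn.getType())));
```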

We use the fields below:

| entry type | field with not-null filter |
|------------|----------------------------|
| txn        | version                    |
| add        | path                       |
| remove     | path                       |
| metadata   | id                         |
| protocol   | minReaderVersion           |
| commit     | version                    |

I'd be happy to contribute a PR, but may need some guidance on how to properly test this.
