Delta-lake CheckpointEntryIterator is slow when reading large checkpoint files #17405

Closed
jkylling opened this issue May 9, 2023 · 0 comments · Fixed by #17408
Labels: delta-lake, performance

jkylling commented May 9, 2023

When reading checkpoints in Delta logs with a large number of remove entries, the CheckpointEntryIterator in the trino-delta-lake plugin spends a lot of time scanning through mostly null entries.

Remove entries are retained in Delta log checkpoints as tombstones to aid the VACUUM operation. For Delta tables with streaming writes, the remove entries can make up the bulk of large checkpoint files (> 300 MB).

The current CheckpointEntryIterator implementation adds a filter requiring the entry type column to be not null. Unfortunately, this filter goes unused by the Parquet reader: the entry type column is a complex type, and the Parquet reader cannot use column statistics for complex types. With predicate pushdown on nested fields, we could instead filter on a specific nested field of the entry type, for which the Parquet reader is able to utilize statistics. For instance, for the add entry type we could filter on the add.path column being not null.
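
For context, a checkpoint Parquet file has one top-level struct column per entry type, and each row populates exactly one of them (a simplified sketch of the schema; field lists abbreviated):

```text
root
 |-- txn:        struct<appId, version, ...>
 |-- add:        struct<path, partitionValues, size, modificationTime, dataChange, stats, ...>
 |-- remove:     struct<path, deletionTimestamp, ...>
 |-- metaData:   struct<id, schemaString, partitionColumns, ...>
 |-- protocol:   struct<minReaderVersion, minWriterVersion>
 |-- commitInfo: struct<...>
```

Parquet keeps statistics (min/max, null counts) per leaf column, which is why a not-null filter on a primitive leaf such as add.path can prune entire row groups, while the same filter on the add struct itself cannot.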

On a test checkpoint with 84 parts and a total size of 300 MB, filtering on add.path reduces the time to collect the active file set from ~40 seconds to ~1 second (running locally, with the checkpoints on disk).

Created from a Slack conversation.

Suggested implementation

Replace

```java
TupleDomain.withColumnDomains(ImmutableMap.of(getOnlyElement(columns), Domain.notNull(getOnlyElement(columns).getType())));
```

with

```java
TupleDomain.withColumnDomains(ImmutableMap.of(column, Domain.notNull(column.getType())));
```

where

```java
DeltaLakeColumnHandle baseColumn = getOnlyElement(columns);
HiveColumnHandle column = new DeltaLakeColumnHandle(
        baseColumn.getBaseColumnName(),
        baseColumn.getBaseType(),
        baseColumn.getBaseFieldId(),
        baseColumn.getBasePhysicalColumnName(),
        baseColumn.getBasePhysicalType(),
        REGULAR,
        Optional.of(new DeltaLakeColumnProjectionInfo(
                type,
                ImmutableList.of(-1),
                ImmutableList.of(field))))
        .toHiveColumnHandle();
```
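
For the add entry type, for example, the projected column and its domain might look as follows. This is a sketch, not the final implementation: it assumes add.path is read as VARCHAR, keeps the -1 dereference-index placeholder from the snippet above, and assumes the usual static imports for getOnlyElement, REGULAR, and VARCHAR.

```java
// Sketch: project the nested add.path field and filter on it being non-null,
// so the Parquet reader can prune row groups using leaf-column statistics.
DeltaLakeColumnHandle baseColumn = getOnlyElement(columns); // the top-level "add" column
HiveColumnHandle addPathColumn = new DeltaLakeColumnHandle(
        baseColumn.getBaseColumnName(),
        baseColumn.getBaseType(),
        baseColumn.getBaseFieldId(),
        baseColumn.getBasePhysicalColumnName(),
        baseColumn.getBasePhysicalType(),
        REGULAR,
        Optional.of(new DeltaLakeColumnProjectionInfo(
                VARCHAR,                    // type of the projected add.path field
                ImmutableList.of(-1),       // dereference index placeholder, as above
                ImmutableList.of("path")))) // dereference name
        .toHiveColumnHandle();

TupleDomain<HiveColumnHandle> tupleDomain = TupleDomain.withColumnDomains(
        ImmutableMap.of(addPathColumn, Domain.notNull(addPathColumn.getType())));
```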

We use the fields below:

| entry type | field with not-null filter |
|------------|----------------------------|
| txn        | version                    |
| add        | path                       |
| remove     | path                       |
| metadata   | id                         |
| protocol   | minReaderVersion           |
| commit     | version                    |

I'd be happy to contribute a PR, but may need some guidance on how to properly test this.
