When reading checkpoints in delta logs with a large number of remove entries, the `CheckpointEntryIterator` in the trino-delta-lake plugin spends a lot of time scanning through mostly null entries.
Remove entries are retained in delta log checkpoints as tombstones to aid the VACUUM operation. For delta tables with streaming writes, the remove entries can make up the bulk of large checkpoint files (> 300 MB).
The current `CheckpointEntryIterator` implementation adds a filter on the entry type column being not null. Unfortunately, this filter is unused by the Parquet reader, because the entry type column is a complex type, for which the Parquet reader is unable to use column statistics. By using predicate pushdown on nested fields, we could instead filter on a specific nested field of the entry type for which the Parquet reader is able to utilize statistics. For instance, for the `add` entry type, we could filter on the `add.path` column being not null.

On a test checkpoint with 84 parts and a total size of 300 MB, filtering on `add.path` reduces the time to collect the active file set from ~40 seconds to ~1 second (running locally, with the checkpoints on disk).

Created from this Slack conversation.
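To make the statistics argument concrete, here is a minimal standalone sketch (using parquet-mr directly, not Trino's Parquet reader) that inspects the per-row-group null counts of the `add.path` leaf column in a checkpoint part file; the file path is an assumption. Row groups where `add.path` is entirely null (only remove and other entries) can be skipped outright once the not-null filter targets that leaf column, whereas the top-level complex `add` column carries no usable statistics.

```java
// Standalone illustration, not Trino code. Requires parquet-hadoop and hadoop-common.
// The checkpoint path below is an assumption; point it at a real checkpoint part file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.hadoop.metadata.ColumnChunkMetaData;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class CheckpointNullCounts
{
    public static void main(String[] args)
            throws Exception
    {
        Path checkpoint = new Path("/tmp/00000000000000000010.checkpoint.parquet"); // assumed path
        try (ParquetFileReader reader = ParquetFileReader.open(
                HadoopInputFile.fromPath(checkpoint, new Configuration()))) {
            for (BlockMetaData rowGroup : reader.getFooter().getBlocks()) {
                for (ColumnChunkMetaData column : rowGroup.getColumns()) {
                    if (column.getPath().toDotString().equals("add.path")) {
                        long nulls = column.getStatistics().getNumNulls();
                        // A row group whose add.path values are all null contains no add entries,
                        // so a not-null filter on add.path lets the reader skip it entirely.
                        boolean skippable = nulls == rowGroup.getRowCount();
                        System.out.printf("row group with %d rows: add.path nulls=%d, skippable=%b%n",
                                rowGroup.getRowCount(), nulls, skippable);
                    }
                }
            }
        }
    }
}
```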
Suggested implementation
Replace the
`TupleDomain.withColumnDomains(ImmutableMap.of(getOnlyElement(columns), Domain.notNull(getOnlyElement(columns).getType())));`
with
`TupleDomain.withColumnDomains(ImmutableMap.of(column, Domain.notNull(column.getType())));`,
where `column` is a handle for a nested field of the entry type that is expected to be non-null (for example, `add.path` for the `add` entry type). We use the fields below:
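For illustration, a minimal sketch of the suggested constraint, assuming the iterator's column handles are the hive connector's `HiveColumnHandle` (as the `getType()` call above suggests) and that `addPathColumn` is a hypothetical handle projecting the nested `add.path` field; how such a projected handle is built is plugin-specific and not shown here.

```java
import com.google.common.collect.ImmutableMap;
import io.trino.plugin.hive.HiveColumnHandle; // assumption: the handle type used by CheckpointEntryIterator
import io.trino.spi.predicate.Domain;
import io.trino.spi.predicate.TupleDomain;

class NotNullConstraintSketch
{
    // addPathColumn: hypothetical handle projecting the nested add.path field;
    // building such a projected handle is plugin-specific and omitted here.
    static TupleDomain<HiveColumnHandle> notNullOnAddPath(HiveColumnHandle addPathColumn)
    {
        // A not-null domain on a leaf column can be evaluated against Parquet
        // column statistics, unlike a not-null domain on the top-level add row column.
        return TupleDomain.withColumnDomains(
                ImmutableMap.of(addPathColumn, Domain.notNull(addPathColumn.getType())));
    }
}
```

The same pattern would presumably apply per entry type (a different non-null leaf field for remove, metaData, and so on), but which leaf to pick for each type is left open here.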
I'd be happy to contribute a PR, but may need some guidance on how to properly test this.