Use dynamic filter to prune Iceberg splits based on partition values #9193
Conversation
session.getTimeZoneKey(),
maxSplitsPerSecond,
maxOutstandingSplits,
new BoundedExecutor(executorService, splitLoaderConcurrency),
I'm not sure this needs to be configurable, might just want to hardcode the thread count.
Maybe Iceberg uses multiple threads internally, but isn't split loading still single-threaded per split source on the Trino side?
Can we hard-code splitLoaderConcurrency to 1, given that we're not parallelising split loading in IcebergSplitSource?
That, or we could just pass the ExecutorService straight through without wrapping in BoundedExecutor
It's not completely single threaded because the AsyncQueue uses the executor to do some of its operations.
Wouldn't setting splitLoaderConcurrency to 1 result in a deadlock? We would use the one thread for:
addExceptionCallback(ResumableTasks.submit(executor, loaderTask), this::onLoaderFailure);
and then there would be no threads left for splitQueue handling:
new ThrottledAsyncQueue<>(maxSplitsPerSecond, maxOutstandingSplits, executor);
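The starvation concern above can be illustrated outside Trino with a minimal, hypothetical sketch (none of these names are Trino code): a one-thread executor whose running task waits on work queued behind it on the same executor can never make progress.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo: a 1-thread pool deadlocks when the running task
// blocks on a second task submitted to the same pool.
class StarvationDemo
{
    static boolean deadlocks()
    {
        ExecutorService executor = Executors.newFixedThreadPool(1);
        try {
            Future<?> outer = executor.submit(() -> {
                // The inner task is queued behind us on the only thread...
                Future<?> inner = executor.submit(() -> {});
                try {
                    inner.get(); // ...so this wait can never complete
                }
                catch (InterruptedException | ExecutionException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                outer.get(500, TimeUnit.MILLISECONDS);
                return false; // completed: no starvation
            }
            catch (TimeoutException e) {
                return true; // still blocked: the single thread starved itself
            }
            catch (InterruptedException | ExecutionException e) {
                return false;
            }
        }
        finally {
            executor.shutdownNow();
        }
    }
}
```

The same shape applies to a BoundedExecutor with concurrency 1: the loader task would hold the only permit while the queue's callbacks wait for one.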
Force-pushed 2ba2e75 to 67934a5
return TaskStatus.finished();
}

public boolean partitionPassesDynamicFilter(Map<Integer, String> partitionKeys)
Can we have common code for this and TableStatisticsMaker#dataFileMatches?
Can we also look at file level Domain for the columns for which we have dynamic filter and prune splits based on that ?
We don't necessarily have to restrict ourselves to partitioned columns here.
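As a rough illustration of that idea (a hypothetical helper, not Trino's Domain API): a split can be skipped whenever the file's min/max statistics for a column cannot overlap the range the dynamic filter allows.

```java
// Hypothetical sketch: prune a split when the file's [min, max] range for a
// column is disjoint from the dynamic filter's [min, max] range. Real code
// would intersect Trino Domains built from Iceberg lower/upper bounds.
class FileRangePruner
{
    static boolean mayMatch(long fileMin, long fileMax, long filterMin, long filterMax)
    {
        // Two closed ranges overlap unless one ends before the other begins
        return fileMax >= filterMin && fileMin <= filterMax;
    }
}
```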
Force-pushed 497985f to 3d85856
Comments addressed, thanks @raunaqmorarka
Force-pushed 3d85856 to e34a1f8
Please share tpch/tpcds benchmark results after applying current round of comments
private void checkLoaderException()
{
    if (loaderException != null) {
        if (loaderException instanceof TrinoException) {
For IOException maybe we should throw TrinoException(ICEBERG_FILESYSTEM_ERROR, e)
Though I don't know what kind of exceptions can be produced here
Maybe. None of the Iceberg TableScan methods throw a checked IOException, but it seems reasonable that they could.
(skim)
@Provides
public ExecutorService createIcebergExecutor(CatalogName catalogName)
{
    return newCachedThreadPool(daemonThreadsNamed("iceberg-" + catalogName + "-%s"));
We should cleanly forbid catalog names containing %s (or sanitize catalogName in places like this).
Preexisting (for Hive); no change requested here.
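One possible sanitization, sketched with a hypothetical helper (not existing Trino code): escape every '%' in the catalog name before it is spliced into the thread-name format string, so a name like "my%scat" cannot corrupt the "iceberg-&lt;catalog&gt;-%s" pattern handed to the thread factory.

```java
// Hypothetical sketch: make a catalog name safe for use inside a
// java.util.Formatter pattern by doubling '%' (the escape for a literal '%').
class CatalogNameSanitizer
{
    static String sanitizeForFormat(String catalogName)
    {
        return catalogName.replace("%", "%%");
    }
}
```

With this, formatting "iceberg-" + sanitized + "-%s" leaves the catalog name intact and only substitutes the trailing thread-number placeholder.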
}
catch (IOException e) {
    throw new UncheckedIOException(e);
checkLoaderException();
any reason to try fail close?
The split queuing adds significant complexity. Why do we need this feature? It was added to Hive because someone wanted to reduce load on HDFS, but I haven’t heard anyone ask about this for Iceberg.
Force-pushed e34a1f8 to 3823f26
It's not only about load, but also performance. DF brings huge perf improvements. Therefore if Iceberg is much slower than the Hive connector, it won't be a feasible alternative.
@alexjo2144 would it be possible to get macro benchmark results?
Dynamic filtering is obviously something we want. But I don't see why it is tied to split throttling. If we want to block, we can simply return the DF future from getNextBatch():

ConnectorSplitBatch EMPTY_BATCH = new ConnectorSplitBatch(ImmutableList.of(), false);
long startTime; // time of first getNextBatch() call
long elapsed = NANOS.toMillis(System.nanoTime() - startTime);
long remainingWait = waitTime.toMillis() - elapsed;
if ((remainingWait > 0) && dynamicFilter.isAwaitable()) {
    return dynamicFilter.isBlocked()
            .thenApply(ignored -> EMPTY_BATCH)
            .completeOnTimeout(EMPTY_BATCH, remainingWait, MILLIS);
}
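A runnable stand-in for this pattern (hypothetical names, plain CompletableFuture instead of Trino's ConnectorSplitSource API): return an empty batch either when the dynamic-filter future completes or when the remaining wait budget expires, whichever comes first.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import static java.util.concurrent.TimeUnit.MILLISECONDS;

// Hypothetical sketch of the "block on the DF future with a timeout" pattern.
class BlockedBatchDemo
{
    static final List<String> EMPTY_BATCH = List.of();

    static CompletableFuture<List<String>> nextBatch(CompletableFuture<?> dfBlocked, long remainingWaitMillis)
    {
        if (remainingWaitMillis > 0 && !dfBlocked.isDone()) {
            // Unblocks when the DF completes, or after the remaining budget
            return dfBlocked
                    .thenApply(ignored -> EMPTY_BATCH)
                    .completeOnTimeout(EMPTY_BATCH, remainingWaitMillis, MILLISECONDS);
        }
        return CompletableFuture.completedFuture(EMPTY_BATCH);
    }
}
```

Note that completeOnTimeout (Java 9+) is what keeps a slow or useless dynamic filter from stalling the query past the configured wait.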
I can take the throttling part out. It only got included because I was trying to mimic some of the Hive pattern and used
@electrum In hive connector, although we've implemented the blocking for DF, we keep it turned off by default to avoid any regressions for small queries. If we keep it turned off by default here as well, then would that just result in all the splits getting generated and scheduled up front? If yes, then we would have to set some non-zero default for the DF blocking timeout and that can potentially slow down some short running queries or queries which waited on DF but the DF wasn't useful.
.map(IcebergSplit.class::cast)
.forEach(splits::add);
}
assertThat(splits.build().size()).isGreaterThan(0);
You can assert that you actually waited ~2s here. Btw, can you make the test without waiting? Maybe using an artificial Ticker instead of wall time. You may also verify that the future returned from splitSource.getNextBatch gets unblocked when you complete the future returned from DynamicFilter.isBlocked. Ignore if too painful.
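The suggested artificial Ticker could be as small as the following sketch (hypothetical class mirroring the shape of Guava's Ticker): the test advances it explicitly instead of sleeping on wall time.

```java
// Hypothetical fake ticker: a manually advanced nanosecond clock for tests.
class FakeTicker
{
    private long nanos;

    long read()
    {
        return nanos;
    }

    void advanceMillis(long millis)
    {
        nanos += millis * 1_000_000;
    }
}
```

Code under test reads elapsed time from the ticker, so a test can "wait 2s" by calling advanceMillis(2000) instantly.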
Force-pushed d2c9e08 to 103a99d
Force-pushed 103a99d to fcc241c
All set, thanks for the comments @losipiuk
CI #9620 (reopened)
column.getType(),
fromByteBuffer(type, lowerBounds.get(fieldId)),
fromByteBuffer(type, upperBounds.get(fieldId)),
mayContainNulls);
We can recognize an all-null column when scanTask.file().valueCounts() matches scanTask.file().nullValueCounts() at the given key; then we could create Domain.onlyNull for such a case.
let's maybe do this in a follow-up
ya, we do that in TupleDomainParquetPredicate as well
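The all-null check discussed above reduces to comparing two count maps from the file metadata; a hypothetical sketch (not Trino or Iceberg code):

```java
import java.util.Map;

// Hypothetical sketch: a column is all-null in a data file when its total
// value count equals its null value count for that field id.
class AllNullDetector
{
    static boolean isAllNull(Map<Integer, Long> valueCounts, Map<Integer, Long> nullValueCounts, int fieldId)
    {
        Long values = valueCounts.get(fieldId);
        Long nulls = nullValueCounts.get(fieldId);
        // Missing stats mean we cannot conclude anything
        return values != null && values.equals(nulls);
    }
}
```

When this returns true, the column's file-level domain could be modeled as only-null rather than a min/max range.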
Force-pushed fcc241c to 0924252
Discussed offline, but I removed the Executor and any async handling, it does not seem to be useful.
CI Flake: #7224
Changes LGTM
For #4115