Avoid recomputing partition dynamic filters unnecessarily #11048
private final List<HiveColumnHandle> partitionColumns;

@Nullable
private volatile Boolean finalResult; // value is null until the dynamic filter no longer needs to be evaluated
Can we use AtomicReference<Boolean> instead?
I am fine with the current state.
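For readers weighing the two options in this thread, here is a minimal sketch of both. The class names `VolatileResultSketch` and `AtomicResultSketch` are hypothetical and do not appear in the PR. Both forms give the same visibility guarantee for a published value; `AtomicReference` additionally offers compare-and-set, which only matters if racing writers could publish different values.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not code from this PR.
final class VolatileResultSketch
{
    // null until the final result is known
    private volatile Boolean finalResult;

    void publish(boolean value)
    {
        // A plain volatile write is enough when every writer publishes the same value
        finalResult = value;
    }

    Boolean get()
    {
        return finalResult;
    }
}

// Hypothetical illustration only; not code from this PR.
final class AtomicResultSketch
{
    private final AtomicReference<Boolean> finalResult = new AtomicReference<>();

    void publish(boolean value)
    {
        // compareAndSet keeps the first published value if writers race
        finalResult.compareAndSet(null, value);
    }

    Boolean get()
    {
        return finalResult.get();
    }
}
```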
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/PartitionMatchSupplier.java
}
if (dynamicFilter.isComplete()) {
    // Evaluate the dynamic filter once and store the result as a constant
    return new ConstantBooleanSupplier(partitionMatches(partitionColumns, dynamicFilter.getCurrentPredicate(), hivePartition));
matches = partitionMatches(partitionColumns, dynamicFilter.getCurrentPredicate(), hivePartition);
return () -> matches;
I avoided using lambdas for this and for the case where no partition columns are referenced by the dynamic filter, so that the returned BooleanSupplier interface will be at most bimorphic at usage sites, instead of accidentally megamorphic as a result of returning instances corresponding to two different lambdas in addition to PartitionMatchSupplier.
Is it worth a code comment, e.g. adorning ConstantBooleanSupplier with a comment?
Comment added
BTW, wouldn't it be pretty low-hanging fruit for the JVM to optimize () -> const lambdas? Maybe the JVM already creates such a class under the covers?
Since the generated class at the definition site must specifically implement whatever interface is required (in this case BooleanSupplier, but in other places it could be Supplier<Boolean>, Callable<Boolean>, etc.), you couldn't produce a single global () -> const lambda, but you could presumably produce one per target type. However, you would also need to dedupe on the constant value itself (i.e. () -> true and () -> false), which might be reasonable and beneficial for BooleanSupplier with boolean constants, but not for most other constant-returning lambda types in general. I doubt that the JVM does this for that reason, but I could be wrong.
That said, even if there were such an optimization in HotSpot, we would still have three potential implementations that we might produce:
- () -> const (potentially 2x, for () -> true and () -> false)
- () -> variable
- PartitionMatchSupplier
Since HotSpot will only devirtualize method dispatch at usage sites with 2 or fewer target methods, we don't really have a choice but to avoid () -> const in favor of the more general () -> variable form, and declaring the class manually makes that intent more explicit.
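The trade-off described in this comment can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `ConstantBooleanSupplierSketch` and `SupplierShapes` are hypothetical names, and the remark about lambdas reflects typical HotSpot behavior rather than a guarantee of the language specification.

```java
import java.util.function.BooleanSupplier;

// Hypothetical named class: one receiver type covers both constant values.
final class ConstantBooleanSupplierSketch
        implements BooleanSupplier
{
    private final boolean value;

    ConstantBooleanSupplierSketch(boolean value)
    {
        this.value = value;
    }

    @Override
    public boolean getAsBoolean()
    {
        return value;
    }
}

// Hypothetical factory methods contrasting the two styles.
final class SupplierShapes
{
    private SupplierShapes() {}

    // On HotSpot, each lambda expression is typically backed by its own
    // generated class, so these two methods add two receiver types.
    static BooleanSupplier constantTrueLambda()
    {
        return () -> true;
    }

    static BooleanSupplier constantFalseLambda()
    {
        return () -> false;
    }

    // The named class adds exactly one receiver type, whatever the value.
    static BooleanSupplier constant(boolean value)
    {
        return new ConstantBooleanSupplierSketch(value);
    }
}
```

With the named class, a call site that also sees PartitionMatchSupplier observes at most two receiver types, which is the bimorphic limit the comment refers to.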
cc @sopel39
Too low level for RNs imo.
Description
Avoids re-evaluating the dynamic filter for partition pruning on each split in a partition once the dynamic filter can no longer be narrowed, or after the partition has been filtered out.
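A minimal sketch of the caching behavior described here, assuming a simplified stand-in for the dynamic filter. `FilterSketch`, `matchesPartition`, and `CachingPartitionMatchSketch` are hypothetical names chosen for illustration; the actual PartitionMatchSupplier in the PR may differ.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the dynamic filter; not Trino's DynamicFilter API.
interface FilterSketch
{
    // true once the filter can no longer be narrowed
    boolean isComplete();

    // evaluates the current predicate against the partition
    boolean matchesPartition();
}

// Hypothetical illustration of the caching idea; not the PR's implementation.
final class CachingPartitionMatchSketch
        implements BooleanSupplier
{
    private final FilterSketch filter;

    // null until the result can no longer change
    private volatile Boolean finalResult;

    CachingPartitionMatchSketch(FilterSketch filter)
    {
        this.filter = filter;
    }

    @Override
    public boolean getAsBoolean()
    {
        Boolean cached = finalResult;
        if (cached != null) {
            return cached;
        }
        // Read completeness first, so a result is only treated as final
        // when the filter was already complete before it was evaluated
        boolean complete = filter.isComplete();
        boolean matches = filter.matchesPartition();
        if (complete || !matches) {
            // A complete filter cannot change, and a filter that only narrows
            // cannot start matching a partition it has already rejected
            finalResult = matches;
        }
        return matches;
    }
}
```

Each split in the partition can call `getAsBoolean()`; once the result is final, later calls return the stored value without evaluating the filter again.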
General information
Minor improvement to split production latency when dynamic filters are present.
This change only affects the Hive connector.
No description aimed at non-technical end users should be necessary.
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: