Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid recomputing partition dynamic filters unnecessarily #11048

Merged

Conversation

pettyjamesm
Copy link
Member

Description

Avoids re-evaluating dynamic filter for partition pruning on each split in a partition once the dynamic filter can no longer be narrowed or after the partition has been filtered.

General information

Is this change a fix, improvement, new feature, refactoring, or other?

Minor improvement to split production latency when dynamic filters are present

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

This change only affects the hive connector

How would you describe this change to a non-technical end user or system administrator?

No such description should be necessary to a non-technical end user.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Feb 15, 2022
@pettyjamesm pettyjamesm requested a review from findepi February 15, 2022 17:36
@pettyjamesm pettyjamesm force-pushed the memoize-partition-dynamic-filter-result branch from 8ed5bc1 to 98b76e8 Compare February 15, 2022 20:25
private final List<HiveColumnHandle> partitionColumns;

@Nullable
private volatile Boolean finalResult; // value is null until the dynamic filter no longer needs to be evaluated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use AtomicReference<Boolean> instead ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am fine with current state

}
if (dynamicFilter.isComplete()) {
// Evaluate the dynamic filter once and store the result as a constant
return new ConstantBooleanSupplier(partitionMatches(partitionColumns, dynamicFilter.getCurrentPredicate(), hivePartition));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches = partitionMatches(partitionColumns, dynamicFilter.getCurrentPredicate(), hivePartition);
return () -> matches;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoided using lambdas for this and the case where no partition columns are referenced by the dynamic filter so that the returned BooleanSupplier interface will be at most, bi-morphic at usage sites instead of accidentally megamorphic as a result returning instances corresponding to two different lambdas in addition to PartitionMatchSupplier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth code-commenting, eg adorn ConstantBooleanSupplier with a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW wouldn't it be a pretty low hanging fruit for the JVM to optimize () -> const lambdas?
Maybe the JVM already creates such a class under the covers?

cc @skrzypo987 @martint

Copy link
Member Author

@pettyjamesm pettyjamesm Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the generated class at the definition site must specifically implement whatever interface is required (in this case, BooleanSupplier but in other places, it could be Supplier<Boolean> or Callable<Boolean>, etc) you couldn't produce a single global () -> const lambda, but you could presumably produce one per target type. However, you also need to dedupe on the constant value itself (ie: () -> true and () -> false) which might be reasonable and beneficial for BooleanSupplier with boolean constants, but not for most other constant returning lambda types in general. I doubt that the JVM does this, for that reason- but I could be wrong.

That said, even if there were such an optimization in hotspot, we still have three potential implementations that we might produce:

  • () -> const (potentially 2x, for () -> true and () -> false)
  • () -> variable
  • PartitionMatchSupplier

Since hotspot will only devirtualize method dispatch at usage sites with 2 or fewer target methods, we don't really have a choice but to avoid () -> const in favor of the more general () -> variable form, and declaring the class manually makes that choice more explicit to the intent.

private final List<HiveColumnHandle> partitionColumns;

@Nullable
private volatile Boolean finalResult; // value is null until the dynamic filter no longer needs to be evaluated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am fine with current state

@findepi
Copy link
Member

findepi commented Feb 16, 2022

cc @sopel39

@pettyjamesm pettyjamesm force-pushed the memoize-partition-dynamic-filter-result branch from 98b76e8 to 0db28dc Compare February 16, 2022 14:18
@pettyjamesm pettyjamesm force-pushed the memoize-partition-dynamic-filter-result branch from 0db28dc to 3829e85 Compare February 16, 2022 17:28
@findepi findepi merged commit 7a60c35 into trinodb:master Feb 18, 2022
@findepi
Copy link
Member

findepi commented Feb 18, 2022

Too low level for RNs imo.

@github-actions github-actions bot added this to the 372 milestone Feb 18, 2022
@pettyjamesm pettyjamesm deleted the memoize-partition-dynamic-filter-result branch February 18, 2022 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants