-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use bloom filter for evaluating dynamic filters on strings #24528
base: master
Are you sure you want to change the base?
Conversation
core/trino-main/src/main/java/io/trino/sql/gen/columnar/BloomFilter.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/gen/columnar/BloomFilter.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/gen/columnar/BloomFilter.java
Outdated
Show resolved
Hide resolved
137c6e9
to
0edc826
Compare
core/trino-main/src/main/java/io/trino/sql/gen/columnar/BloomFilter.java
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/gen/columnar/BloomFilter.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/gen/columnar/BloomFilter.java
Outdated
Show resolved
Hide resolved
bloom = new long[bloomSize]; | ||
bloomSizeMask = bloomSize - 1; | ||
for (Slice value : values) { | ||
long hashCode = XxHash64.hash(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slice has a hashCode that is using XxHash64 already (and is memoized). Just value.hashCode()
} | ||
|
||
@VisibleForTesting | ||
public boolean contains(Slice data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is only used in test. Can we use contains(data, offset, length) in test instead?
|
||
private boolean contains(Slice data, int offset, int length) | ||
{ | ||
long hashCode = XxHash64.hash(data, offset, length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just data.hashCode(offset, length)
;
|
||
private static long bloomMask(long hashCode) | ||
{ | ||
// returned mask sets 3 bits based on portions of given hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// returned mask sets 3 bits based on portions of given hash
// Extract 38th to 43rd bits
return (1L << ((hashCode >> 21) & 63)) |
// Extract 32nd to 37th bits
(1L << ((hashCode >> 27) & 63)) |
// Extract 26th to 31st bits
(1L << ((hashCode >> 33) & 63));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the convention in existing code has been to keep the operator at start of each new line
e.g.
TrinoFileStatus that = (TrinoFileStatus) o;
return isDirectory == that.isDirectory
&& length == that.length
&& modificationTime == that.modificationTime
&& blockLocations.equals(that.blockLocations)
&& path.equals(that.path);
@@ -119,7 +119,7 @@ public Expression toPredicate(TupleDomain<Symbol> tupleDomain) | |||
return IrUtils.combineConjuncts(toPredicateConjuncts(tupleDomain)); | |||
} | |||
|
|||
public List<Expression> toPredicateConjuncts(TupleDomain<Symbol> tupleDomain) | |||
private List<Expression> toPredicateConjuncts(TupleDomain<Symbol> tupleDomain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had made it public for usage in DynamicPageFilter, now I don't need it so made it private again
@@ -132,7 +132,7 @@ public List<Expression> toPredicateConjuncts(TupleDomain<Symbol> tupleDomain) | |||
.collect(toImmutableList()); | |||
} | |||
|
|||
private Expression toPredicate(Domain domain, Reference reference) | |||
public Expression toPredicate(Domain domain, Reference reference) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using it in DynamicPageFilter
Could you please add a high-level description about where the oprimizations proposed in this PR would apply. |
27017ce
to
13b8ccd
Compare
BenchmarkDynamicPageFilter.filterPages (filterSize) (inputDataSet) (inputNullChance) (nonNullsSelectivity) (nullsAllowed) Mode Cnt Before Score After Score Units 100 VARCHAR_RANDOM 0.05 0.2 false thrpt 20 145.858 ± 4.541 590.506 ± 28.510 ops/s 1000 VARCHAR_RANDOM 0.05 0.2 false thrpt 20 136.995 ± 2.395 596.036 ± 22.694 ops/s 10000 VARCHAR_RANDOM 0.05 0.2 false thrpt 20 136.990 ± 5.284 594.118 ± 15.764 ops/s 100000 VARCHAR_RANDOM 0.05 0.2 false thrpt 20 114.591 ± 7.307 587.445 ± 9.818 ops/s 1000000 VARCHAR_RANDOM 0.05 0.2 false thrpt 20 43.234 ± 1.621 578.800 ± 15.694 ops/s 5000000 VARCHAR_RANDOM 0.05 0.2 false thrpt 20 40.018 ± 2.245 464.153 ± 20.914 ops/s
13b8ccd
to
d8b44ff
Compare
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: