Add streaming TopN rank() implementation #6333
@@ -103,7 +103,7 @@
 public static final String MAX_RECURSION_DEPTH = "max_recursion_depth";
 public static final String USE_MARK_DISTINCT = "use_mark_distinct";
 public static final String PREFER_PARTIAL_AGGREGATION = "prefer_partial_aggregation";
-public static final String OPTIMIZE_TOP_N_ROW_NUMBER = "optimize_top_n_row_number";
+public static final String OPTIMIZE_TOP_N_RANKING = "optimize_top_n_ranking";
We should have an equivalent of @LegacyConfig for session toggles.
@findepi, are you saying that we have an equivalent of @LegacyConfig, and that we should use it for this PR? Or that someone should implement an equivalent at some point?
I am not aware that we have one.
Yea, that's what I understood too, but wasn't sure if I missed something. It sounds like a reasonable feature request. Would you think that this blocks any of this PR? If not, we can file an issue for that.
I don't think it blocks here. But would be nice to have. Would you be able to implement this?
I might take a look at this if I get some time, but no guarantees
Some initial comments (up to Rename TopNRowNumber* => TopNRanking*)
Reviewed Add GroupedTopNRankAccumulator for streaming rank
Only Add optimizer capability to produce streaming topN rank() plans for review
great job! I've added comments, but overall it looks good!
88c87fa to fca1c1a
lgtm % comment about NaN peer groups
08f40f1 to d8d24ae
Extract Fix TopNRowNumberOperator incorrectly swapped types as separate PR
@@ -577,11 +577,11 @@ private IntegrityStats verifyHeapIntegrity(long groupId, long heapNodeIndex)
verify(actualPeerGroupCount == peerGroupCount, "Recorded peer group count does not match actual");
Let's remove HACK from the commit message. It's just maintaining current semantics. In fact, I would just squash it, but if it would make it easier to revert later on, we could keep it as a separate commit.
While it is maintaining current semantics, it is completely at odds with the established design invariants of this class's data structure and API, and it NEEDS to be rolled back asap for this class to become reasonable and cohesive. It happens to work today, but only accidentally. All comments and standard programming expectations on relationship between equals and compare are silently violated here in unexpected ways. The other classes don't have this problem because they use a single comparison method and data structure -- here we need two coordinated data structures that don't agree anymore, and hence my apprehension about this. It can be very error prone, which is why I added the integrity checks to the system to enforce these invariants.
"All comments and standard programming expectations on relationship between equals and compare are silently violated here in unexpected ways"
The hard relationship is only between equals and hashCode. compare and equals do not have a strict relationship. Consider nulls, for example:
- are nulls equal? no (equals(null, null)==false)
- are nulls placed in the same place in the global ordering? yes (compare(null, null)==0)
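The same point can be illustrated outside of SQL null handling. The following is a minimal plain-Java sketch, not Presto code: it uses the primitive `==` operator as the "equals" side and `Double.compare` as the "compare" side, and the class and method names are made up for illustration.

```java
class EqualsVersusCompare {
    /** Returns true when == and Double.compare agree on whether a and b are "the same" value. */
    static boolean agree(double a, double b) {
        boolean equalByOperator = (a == b);
        boolean equalByOrdering = (Double.compare(a, b) == 0);
        return equalByOperator == equalByOrdering;
    }

    public static void main(String[] args) {
        // Ordinary values: both notions agree.
        System.out.println(agree(1.0, 1.0));
        // NaN is never == to itself, yet Double.compare places it in one spot in the ordering.
        System.out.println(agree(Double.NaN, Double.NaN));
        // 0.0 == -0.0 holds, yet Double.compare orders -0.0 before 0.0.
        System.out.println(agree(0.0, -0.0));
    }
}
```

Any structure that keys one index by equality and another by ordering has to decide which of the two notions defines a group, which is the concern raised in this thread.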
Sorry, what I mean is that the design of this specific data structure optimization needs this requirement to make sense. In plain Java, you are correct, but in this case, the strategies need to be consistent for any input we provide to this optimization.
Anyways, I can change the commit message, but we need to get out of this state asap, because I don't even trust myself adding more code here until the invariants are re-established.
@@ -33,7 +33,6 @@
import static com.google.common.base.Preconditions.checkState;
Please create separate PR. This commit is unrelated to rank improvements
@sopel39, this is a prerequisite for this refactor. The rank changes are dependent on these changes to function correctly in the Builder refactor.
.build();

TopNRowNumberOperatorFactory operatorFactory = new TopNRowNumberOperatorFactory(
        0,
        new PlanNodeId("test"),
        ImmutableList.of(BIGINT, DOUBLE),
        ImmutableList.of(VARCHAR, DOUBLE),
Why would that fail with the previous TopNRowNumberOperator code? Was it because BIGINT and DOUBLE comparisons were compatible?
This did not fail before, which was the problem -- it should have. BIGINT and DOUBLE comparisons are entirely the same at the binary level, except when it comes to special double values like NaN. It was previously using BIGINT to process DOUBLE data, and silently succeeding. VARCHAR makes this much more apparent.
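A rough plain-Java sketch of why such a mix-up can pass silently. It assumes that reading a DOUBLE through a BIGINT type amounts to comparing the raw 64-bit patterns as signed longs, modelled here with `Double.doubleToRawLongBits`; the class and method names are hypothetical and this is not the operator's actual code path. For the non-negative values a test would typically use, the two orderings coincide; in plain Java they diverge for negative values, so the choice of test data decides whether the bug is visible.

```java
class RawBitsOrdering {
    /** Compares two doubles by their raw 64-bit patterns, read as signed longs. */
    static int compareAsLongBits(double a, double b) {
        return Long.compare(Double.doubleToRawLongBits(a), Double.doubleToRawLongBits(b));
    }

    /** True when the long-bits ordering matches the numeric ordering for this pair. */
    static boolean sameOrdering(double a, double b) {
        return Integer.signum(compareAsLongBits(a, b)) == Integer.signum(Double.compare(a, b));
    }

    public static void main(String[] args) {
        // Non-negative finite values: the bit patterns sort the same way as the numbers.
        System.out.println(sameOrdering(1.5, 2.5));
        // Negative values: the sign bit makes the bit patterns sort in the opposite direction.
        System.out.println(sameOrdering(-1.0, -2.0));
    }
}
```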
Benchmarks: comparison-rank.pdf
3f35fa6 to 50d4676
Does anyone know about the web-ui-checks failure? Did I forget to update some files?
I'm pretty sure these are intermittent
TopNRowNumberOperator was previously incorrectly using the output type order for the SimplePageWithPositionComparator strategy, when the channels were all defined in terms of the input types. This issue is not visible in production code because the LocalExecutionPlanner always puts the outputs in the same order as the inputs, but it means that the existing tests passed only by accident. The tests have been updated to fail if this occurs.
LongLong2LongOpenCustomBigHashMap originally used the value zero to represent keys that haven't been mapped yet (fastutil calls these null keys in its code). However, this means that the custom HashStrategy will sometimes be asked to check equality on zero-valued keys, even though a zero-valued key may not exist from the strategy's perspective. To help callers disambiguate this situation, callers can now configure the null key to be used at instance creation.
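The idea can be sketched with a much smaller structure. The class below is a hypothetical, simplified single-key long-to-long open-addressing map, not LongLong2LongOpenCustomBigHashMap and not fastutil's API; `SentinelLongMap`, `LongHashStrategy`, and `IDENTITY` are names invented for illustration. The empty-slot sentinel is chosen by the caller, and the strategy is never asked about the sentinel.

```java
import java.util.Arrays;

class SentinelLongMap {
    /** Hypothetical strategy interface; stands in for a custom hash strategy. */
    interface LongHashStrategy {
        int hashCode(long key);
        boolean equals(long a, long b);
    }

    /** Simple strategy for demonstration: keys are compared by value. */
    static final LongHashStrategy IDENTITY = new LongHashStrategy() {
        public int hashCode(long key) { return Long.hashCode(key); }
        public boolean equals(long a, long b) { return a == b; }
    };

    private final long emptyKey;   // caller-chosen marker for unused slots
    private final LongHashStrategy strategy;
    private final long[] keys;
    private final long[] values;
    private final int mask;
    private int size;

    SentinelLongMap(int capacityPowerOfTwo, long emptyKey, LongHashStrategy strategy) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.emptyKey = emptyKey;
        this.strategy = strategy;
        this.keys = new long[capacityPowerOfTwo];
        this.values = new long[capacityPowerOfTwo];
        this.mask = capacityPowerOfTwo - 1;
        Arrays.fill(keys, emptyKey);   // every slot starts out unused
    }

    void put(long key, long value) {
        if (key == emptyKey) {
            throw new IllegalArgumentException("key collides with the empty-slot sentinel");
        }
        int slot = strategy.hashCode(key) & mask;
        // Only occupied slots reach the strategy, so it never sees the sentinel.
        while (keys[slot] != emptyKey) {
            if (strategy.equals(keys[slot], key)) {
                values[slot] = value;
                return;
            }
            slot = (slot + 1) & mask;
        }
        // Keep at least one unused slot so that probing always terminates.
        if (size == keys.length - 1) {
            throw new IllegalStateException("map is full");
        }
        keys[slot] = key;
        values[slot] = value;
        size++;
    }

    long getOrDefault(long key, long defaultValue) {
        if (key == emptyKey) {
            return defaultValue;
        }
        int slot = strategy.hashCode(key) & mask;
        while (keys[slot] != emptyKey) {
            if (strategy.equals(keys[slot], key)) {
                return values[slot];
            }
            slot = (slot + 1) & mask;
        }
        return defaultValue;
    }
}
```

With a sentinel such as -1, zero becomes an ordinary key: `new SentinelLongMap(8, -1L, SentinelLongMap.IDENTITY)` accepts `put(0L, 42L)` and rejects `put(-1L, ...)`.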
Renames:
- GroupedTopNBuilder => GroupedTopNRowNumberBuilder
- BenchmarkGroupedTopNBuilder => BenchmarkGroupedTopNRowNumberBuilder
- TestGroupedTopNBuilder => TestGroupedTopNRowNumberBuilder
Generalize the TopNRowNumber components into a more generic top-N ranking system to allow the inclusion of rank and dense_rank.
Provides the template to quickly enable streaming topn RANK and DENSE_RANK, but does not enable them yet.
WindowFilterPushDown was previously too loose in checking for rank bounds between 0 and N when comparing with a TopN operator. All rank values start at 1.
The default Window implementation uses equalsNullSafe rather than the expected IS NOT DISTINCT FROM semantics to determine peer groups. This means values such as NaN, positive/negative zero, and nested null structure types will be incorrectly treated as separate peer groups. We are putting in this temporary hack to retain compatibility with the current window behavior, but will need to revert this after it gets fixed.
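A minimal plain-Java sketch of how the choice of peer-group equality changes rank() output for NaN. It is not the Window operator's code: `PeerGroupRank`, `DoubleEquality`, `STRICT`, and `NOT_DISTINCT` are invented names, `STRICT` is only an approximation of the current behaviour described above, and only the NaN case is modelled.

```java
import java.util.Arrays;

class PeerGroupRank {
    /** Hypothetical equality strategy used to decide whether two adjacent rows are peers. */
    interface DoubleEquality {
        boolean same(double a, double b);
    }

    /** Plain equality: NaN never equals NaN, so each NaN forms its own peer group. */
    static final DoubleEquality STRICT = (a, b) -> a == b;

    /** Not-distinct style equality: two NaN values count as the same value. */
    static final DoubleEquality NOT_DISTINCT =
            (a, b) -> a == b || (Double.isNaN(a) && Double.isNaN(b));

    /** Computes rank() over already-sorted values: peers share the rank of the first row in their group. */
    static long[] rank(double[] sorted, DoubleEquality peers) {
        long[] ranks = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i > 0 && peers.same(sorted[i - 1], sorted[i])) {
                ranks[i] = ranks[i - 1];
            }
            else {
                ranks[i] = i + 1;
            }
        }
        return ranks;
    }

    public static void main(String[] args) {
        double[] sorted = {1.0, Double.NaN, Double.NaN};
        System.out.println(Arrays.toString(rank(sorted, STRICT)));        // [1, 2, 3]
        System.out.println(Arrays.toString(rank(sorted, NOT_DISTINCT)));  // [1, 2, 2]
    }
}
```

A top-N rank filter such as rank <= 2 therefore keeps a different number of rows depending on which equality is used, which is why the streaming implementation has to match the existing Window behaviour for now.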
merged, thanks!
No description provided.