Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add streaming TopN rank() implementation #6333

Merged
merged 15 commits into from
Jan 5, 2021
Merged

Conversation

erichwang
Copy link
Contributor

No description provided.

@cla-bot cla-bot bot added the cla-signed label Dec 14, 2020
@erichwang erichwang requested review from sopel39 and dain December 14, 2020 10:55
@@ -103,7 +103,7 @@
public static final String MAX_RECURSION_DEPTH = "max_recursion_depth";
public static final String USE_MARK_DISTINCT = "use_mark_distinct";
public static final String PREFER_PARTIAL_AGGREGATION = "prefer_partial_aggregation";
public static final String OPTIMIZE_TOP_N_ROW_NUMBER = "optimize_top_n_row_number";
public static final String OPTIMIZE_TOP_N_RANKING = "optimize_top_n_ranking";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have equivalent of @LegacyConfig for session toggles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi, are you saying that we have an equivalent of @LegacyConfig, and that we should use it for this PR? Or that someone should implement an equivalent at some point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not aware that we have

Copy link
Contributor Author

@erichwang erichwang Dec 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's what I understood too, but wasn't sure if I missed something. It sounds like a reasonable feature request. Would you think that this blocks any of this PR? If not, we can file an issue for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it blocks here. But would be nice to have. Would you be able to implement this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might take a look at this if I get some time, but no guarantees

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments (up to Rename TopNRowNumber* => TopNRanking* )

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed Add GroupedTopNRankAccumulator for streaming rank

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Add optimizer capability to produce streaming topN rank() plans for review

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great job! I've added comments, but overall it looks good!

@erichwang erichwang force-pushed the topnrank branch 4 times, most recently from 88c87fa to fca1c1a Compare December 29, 2020 01:38
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comment about NaN peer groups

@erichwang erichwang force-pushed the topnrank branch 2 times, most recently from 08f40f1 to d8d24ae Compare December 30, 2020 00:16
@erichwang
Copy link
Contributor Author

erichwang commented Dec 30, 2020

Note: I dropped in a new commit:
5e4ff9e
to fix an existing bug in topn row number that was copied into the rank stuff.

And also a commit at the end:
d8d24ae
to bring back compatibility with window NaN peer groups. This last commit can be dropped if #6472 lands first.

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract Fix TopNRowNumberOperator incorrectly swapped types as separate PR

@@ -577,11 +577,11 @@ private IntegrityStats verifyHeapIntegrity(long groupId, long heapNodeIndex)
verify(actualPeerGroupCount == peerGroupCount, "Recorded peer group count does not match actual");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove HACK from commit message. It's just maintaining current semantics. In fact, I would just squash it, but if it would make it easier to revert later on, we could keep it as separate commit

Copy link
Contributor Author

@erichwang erichwang Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is maintaining current semantics, it is completely at odds with the established design invariants of this class's data structure and API, and it NEEDS to be rolled back asap for this class to become reasonable and cohesive. It happens to work today, but only accidentally. All comments and standard programming expectations on relationship between equals and compare are silently violated here in unexpected ways. The other classes don't have this problem because they use a single comparison method and data structure -- here we need two coordinated data structures that don't agree anymore, and hence my apprehension about this. It can be very error prone, which is why I added the integrity checks to the system to enforce these invariants.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments and standard programming expectations on relationship between equals and compare are silently violated here in unexpected ways

Hard relationship is only between equals and hashCode. compare and equals do not have strict relationship. Consider example, nulls:

  • are nulls equal? no (equals(null, null)==false)
  • are nulls placed in same place in global ordering? yes (compare(null, null)==0)

Copy link
Contributor Author

@erichwang erichwang Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what i mean is that the design of this specific data structure optimization has this requirement to make sense. In plain java, you are correct, but in this case, the strategies need to be consistent for any input we are providing to this optimization.

Anyways, I can change the commit message, but we need to get out of this state asap, because I don't even trust myself adding more code here until the invariants are re-established.

@@ -33,7 +33,6 @@
import static com.google.common.base.Preconditions.checkState;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create separate PR. This commit is unrelated to rank improvements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sopel39, this is a prerequisite for this refactor. The rank changes are dependent on these changes to function correctly in the Builder refactor.

.build();

TopNRowNumberOperatorFactory operatorFactory = new TopNRowNumberOperatorFactory(
0,
new PlanNodeId("test"),
ImmutableList.of(BIGINT, DOUBLE),
ImmutableList.of(VARCHAR, DOUBLE),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would that fail with previous TopNRowNumberOperatorCode code? Was it because BIGINT and DOUBLE comparisons were compatible?

Copy link
Contributor Author

@erichwang erichwang Dec 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This did not fail before, which was the problem -- when it should have. BIGINT and DOUBLE comparisons at the binary are entirely the same, except when it comes to the special Double values like NaN etc. It was previously using BIGINT to process DOUBLE data, and silently succeeding. VARCHAR makes this much more apparent.

@sopel39
Copy link
Member

sopel39 commented Dec 30, 2020

Benchmarks comparison-rank.pdf

q67 (the slowest query that takes 1/3 or tpcds 1tb walltime) is now fixed!

@erichwang erichwang force-pushed the topnrank branch 2 times, most recently from 3f35fa6 to 50d4676 Compare December 31, 2020 02:10
@erichwang
Copy link
Contributor Author

Does anyone know about the web-ui-checks failure? Did I forget to update some files?

@sopel39
Copy link
Member

sopel39 commented Jan 4, 2021

Does anyone know about the web-ui-checks failure? Did I forget to update some files?

I'm pretty sure these are intermittent

TopNRowNumberOperator was previously incorrectly using the output type
order for the SimplePageWithPositionComparator strategy, when the
channels were all defined in terms of the input types. This issue is not
visible in production code because the LocalExecutionPlanner always puts
the outputs in the same order as the inputs, but this means that the
current set of tests were accidentally correct. The tests have been
updated to fail if this occurs.
LongLong2LongOpenCustomBigHashMap originally uses the value zero to
represent keys that haven't been mapped yet (fastutil calls these null
keys in their code). However, this means that the custom HashStrategy
will sometimes be asked to check equality on zero valued keys, even
though a zero value key may not exist from the strategies perspective.
To help callers better disambiguate this situation, we now allow the
callers to configure the null keys to be used on instance creation.
Renames:
GroupedTopNBuilder => GroupedTopNRowNumberBuilder
BenchmarkGroupedTopNBuilder => BenchmarkGroupedTopNRowNumberBuilder
TestGroupedTopNBuilder => TestGroupedTopNRowNumberBuilder
Generalizing TopNRowNumber components as a more generic top N ranking
system to allow inclusions of rank and dense_rank
Provides the template to quickly enable streaming topn RANK and
DENSE_RANK, but does not enable them yet.
WindowFilterPushDown was previously too loose inchecking for rank bounds
between 0 to N when comparing with a TopN operator. All rank values
start at 1.
The default Window implementation uses equalsNullSafe rather than the
expected IS NOT DISTINCT FROM semantics to determine peer groups. This
means values such as NaN, positive/negative zero, and nested null
structure types  will be incorrectly treated as separate peer groups. We
are putting in this temporary hack to retain compatibility with the
current window behavior, but will need to revert this after it gets
fixed.
@sopel39
Copy link
Member

sopel39 commented Jan 5, 2021

@sopel39 sopel39 merged commit ade8af1 into trinodb:master Jan 5, 2021
@sopel39
Copy link
Member

sopel39 commented Jan 5, 2021

merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants