-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-44340][SQL] Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec #41899
Conversation
…tor API and use it in WindowGroupLimitExec
ping @cloud-fan cc @vinodkc |
cc @viirya |
} | ||
|
||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create the function in the match and then call `new WindowGroupLimitPartitionEvaluator(f) outside the match expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
The CI failure looks unrelated. |
thanks, merging to master! |
@cloud-fan @viirya @vinodkc Thank you! |
} else { | ||
child.execute().mapPartitionsInternal { iter => | ||
val evaluator = evaluatorFactory.createEvaluator() | ||
evaluator.eval(0, iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… correctly for WindowGroupLimitExec,WindowExec and WindowInPandasExec ### What changes were proposed in this pull request? This is a followup of #41899 and #41939, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup. ### Why are the changes needed? future-proof ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? existing tests Closes #42208 from beliefer/SPARK-44340_followup. Authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…indowGroupLimitExec ### What changes were proposed in this pull request? This is a followup of #41899, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup. ### Why are the changes needed? future-proof ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? existing tests Closes #42233 from beliefer/SPARK-44340_followup_3.5. Authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…tor API and use it in WindowGroupLimitExec ### What changes were proposed in this pull request? `WindowGroupLimitExec` is updated to use the PartitionEvaluator API to do execution. ### Why are the changes needed? To define the computing logic and requires the caller side to explicitly list what needs to be serialized and sent to executors ### Does this PR introduce _any_ user-facing change? 'No'. Just update the inner implementation. ### How was this patch tested? Test cases updated & running benchmark manually. ``` [info] Java HotSpot(TM) 64-Bit Server VM 1.8.0_311-b11 on Mac OS X 10.16 [info] Intel(R) Core(TM) i7-9750H CPU 2.60GHz [info] Benchmark Top-K: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------------------------------------------------------- [info] ROW_NUMBER (PARTITION: , WindowGroupLimit: false) 10622 11266 617 2.0 506.5 1.0X [info] ROW_NUMBER (PARTITION: , WindowGroupLimit: true) 1712 1744 19 12.2 81.6 6.2X [info] ROW_NUMBER (PARTITION: PARTITION BY b, WindowGroupLimit: false) 23679 25107 NaN 0.9 1129.1 0.4X [info] ROW_NUMBER (PARTITION: PARTITION BY b, WindowGroupLimit: true) 6381 6527 95 3.3 304.3 1.7X [info] RANK (PARTITION: , WindowGroupLimit: false) 11492 11631 106 1.8 548.0 0.9X [info] RANK (PARTITION: , WindowGroupLimit: true) 2675 2920 118 7.8 127.5 4.0X [info] RANK (PARTITION: PARTITION BY b, WindowGroupLimit: false) 24208 24299 95 0.9 1154.3 0.4X [info] RANK (PARTITION: PARTITION BY b, WindowGroupLimit: true) 6347 6478 85 3.3 302.6 1.7X [info] DENSE_RANK (PARTITION: , WindowGroupLimit: false) 11288 11959 458 1.9 538.2 0.9X [info] DENSE_RANK (PARTITION: , WindowGroupLimit: true) 2684 2945 144 7.8 128.0 4.0X [info] DENSE_RANK (PARTITION: PARTITION BY b, WindowGroupLimit: false) 24316 25130 711 0.9 1159.5 0.4X [info] DENSE_RANK (PARTITION: PARTITION BY b, WindowGroupLimit: true) 6589 6925 383 3.2 314.2 1.6X ``` Closes apache#41899 from beliefer/SPARK-44340. Authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… correctly for WindowGroupLimitExec,WindowExec and WindowInPandasExec ### What changes were proposed in this pull request? This is a followup of apache#41899 and apache#41939, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup. ### Why are the changes needed? future-proof ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? existing tests Closes apache#42208 from beliefer/SPARK-44340_followup. Authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
WindowGroupLimitExec
is updated to use the PartitionEvaluator API to do execution.Why are the changes needed?
To define the computing logic and requires the caller side to explicitly list what needs to be serialized and sent to executors
Does this PR introduce any user-facing change?
'No'.
Just update the inner implementation.
How was this patch tested?
Test cases updated & running benchmark manually.