Allow GpuWindowExec to partition on structs #4673
Conversation
…moved non_gpu mark for sliding and tumbling window integration tests Signed-off-by: Navin Kumar <navink@nvidia.com>
The name of this PR is way too generic. Ideally it would be something closer to the specific change being made, like: Allow GpuWindowExec to partition on structs.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
…reflect new fallback logic given that single level struct is now supported Signed-off-by: Navin Kumar <navink@nvidia.com>
I've left a minor nitpick regarding indentation. Plus, copyright dates need to be updated on all source files. Barring that, LGTM!
I have been mulling over limiting the support for STRUCT to "row-based" window specs. I have convinced myself that that is not required (or can at least be delayed until a subsequent PR). The rationale is recorded below, for future reference:

First, this PR applies to the PARTITION BY clause, not the ORDER BY clause. So it is orthogonal to whether STRUCT rows are range-comparable.

Second, the feature that this change enables is opted into by the user. Specifically, the following construct described in #4626:

w = (Window.partitionBy(F.window("timestampGMT", "7 days")))

This construct converts a query that is ordinarily expressed with a RANGE (BETWEEN INTERVAL 7 DAYS) window to a ROWS window, at the expense of increasing the group count. The choice of inefficiency is between the user and Spark. The change in this PR simply enables it.
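To make the RANGE-to-ROWS trade-off concrete, here is a minimal pure-Python sketch (not Spark code; function name and epoch choice are illustrative) of the bucketing that F.window("timestampGMT", "7 days") performs per row. Once every row carries a fixed tumbling-window bucket, the grouping becomes a plain partition on that bucket struct rather than a range comparison over timestamps:

```python
from datetime import datetime, timedelta

def window_bucket(ts: datetime, duration: timedelta) -> datetime:
    """Assign a timestamp to the start of its fixed (tumbling) window.

    Illustrative only: approximates the bucket that Spark's
    F.window(col, "7 days") would compute for each row, so a range-style
    grouping becomes a group-by on the bucket (a ROWS-style partition key).
    """
    epoch = datetime(1970, 1, 1)            # assumed window origin
    offset = (ts - epoch) // duration       # integer number of whole windows
    return epoch + offset * duration        # start of this row's window

# Rows whose timestamps fall in the same 7-day window share one key.
a = window_bucket(datetime(2022, 1, 3), timedelta(days=7))
b = window_bucket(datetime(2022, 1, 5), timedelta(days=7))
```

Each distinct bucket becomes its own partition group, which is why the group count grows relative to a single RANGE window over the same data.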
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Ship it!
Fixes #4626.
This updates the type check for WindowExec's partitionBy parameter to ensure that the code will run on the GPU. The underlying requirements for these queries to run on the GPU have already been implemented (see #2877), so this change just unblocks the check. Also, the allow_non_gpu mark has been removed from the relevant time window tests, since these are no longer blocked from running on the GPU.
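The shape of the relaxed check can be sketched as follows. This is a hypothetical illustration in Python (the plugin's actual check lives in Scala, in GpuOverrides.scala, and its names differ): a partition key is accepted if it is a supported primitive type, or a single-level struct whose fields are all supported primitives, while nested structs still fall back:

```python
# Hypothetical model of the partition-by type check described above.
# Type names and the encoding of struct types are illustrative only.
SUPPORTED_PRIMITIVES = {"int", "long", "float", "double", "string", "timestamp"}

def partition_key_supported(dtype) -> bool:
    """Return True if dtype may be used as a GPU partition-by key.

    A dtype is either a primitive type name (str) or a tuple
    ("struct", [field dtypes]) representing a struct column.
    """
    if isinstance(dtype, str):
        return dtype in SUPPORTED_PRIMITIVES
    kind, fields = dtype
    if kind != "struct":
        return False
    # Single-level structs only: every field must itself be a primitive,
    # so a struct nested inside a struct is rejected (CPU fallback).
    return all(isinstance(f, str) and f in SUPPORTED_PRIMITIVES for f in fields)
```

Under this model, the struct produced by F.window (two timestamp fields, start and end) passes the check, which is what lets the time-window tests run on the GPU.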