Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow GpuWindowExec to partition on structs #4673

Merged
merged 6 commits into from
Feb 2, 2022

Conversation

NVnavkumar
Copy link
Collaborator

Fixes #4626.

This updates the type check for WindowExec's partitionBy parameter to ensure that the code will run on the GPU. The underlying requirements for these queries to run on the GPU has already been implemented previously (See #2877), so this code just unblocks the check. Also, the allow_non_gpu mark has been removed from the relevant time window tests, since these are no longer blocked from running on the GPU.

…moved non_gpu mark for sliding and tumbling window integration tests

Signed-off-by: Navin Kumar <navink@nvidia.com>
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar NVnavkumar requested a review from revans2 February 1, 2022 00:13
@NVnavkumar
Copy link
Collaborator Author

build

Signed-off-by: Navin Kumar <navink@nvidia.com>
@sameerz sameerz added the bug Something isn't working label Feb 1, 2022
@sameerz sameerz added this to the Jan 31 - Feb 11 milestone Feb 1, 2022
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this PR is way too generic. Ideally it would be something closer to the specific change being made like.

Allow GpuWindowExec to partition on structs.

@revans2 revans2 added task Work required that improves the product but is not user facing and removed bug Something isn't working labels Feb 1, 2022
@NVnavkumar NVnavkumar changed the title [WIP] Fix type check for WindowExec that prevents it from running on the GPU [WIP] Allow GpuWindowExec to partition on structs Feb 1, 2022
@NVnavkumar
Copy link
Collaborator Author

build

…reflect new fallback logic given that single level struct is now supported

Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar
Copy link
Collaborator Author

build

Signed-off-by: Navin Kumar <navink@nvidia.com>
revans2
revans2 previously approved these changes Feb 2, 2022
@NVnavkumar NVnavkumar marked this pull request as ready for review February 2, 2022 17:22
@NVnavkumar NVnavkumar changed the title [WIP] Allow GpuWindowExec to partition on structs Allow GpuWindowExec to partition on structs Feb 2, 2022
@NVnavkumar NVnavkumar requested a review from mythrocks February 2, 2022 17:23
mythrocks
mythrocks previously approved these changes Feb 2, 2022
Copy link
Collaborator

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a minor nitpick regarding indentation. Plus, copyright dates need to be updated on all source files. Barring that, LGTM!

I have been mulling over limiting the support for STRUCT to "row-based" window specs. I have convinced myself that that is not required (or can at least be delayed until a subsequent PR). The rationale is recorded below, for future reference:

First, this ​PR applies to the PARTITION BY clause, not the ORDER BY. So it is orthogonal to whether STRUCT rows are range-comparable.

Second, the feature that this change enables is opted into by the user. Specifically, the following construct described in #4626:

w = (Window.partitionBy(F.window("timestampGMT", "7 days")))

This construct converts a query that's ordinarily expressed with a RANGE (BETWEEN INTERVAL 7 DAYS) window to a ROWS window, at the expense of increasing the group count.
The choice of inefficiency is between the user and Spark. The change in this PR simply enables it.

@mythrocks mythrocks self-requested a review February 2, 2022 18:30
Signed-off-by: Navin Kumar <navink@nvidia.com>
@NVnavkumar NVnavkumar dismissed stale reviews from mythrocks and revans2 via e467001 February 2, 2022 18:58
@NVnavkumar
Copy link
Collaborator Author

build

Copy link
Collaborator

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] <WindowExec> cannot run on GPU because unsupported data types in 'partitionSpec'
4 participants