Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] GpuBroadcastToCpuExec redundantly executes portion of plan being broadcast #3709

Closed
jlowe opened this issue Sep 29, 2021 · 2 comments
Closed
Labels
bug Something isn't working performance A performance related task/issue

Comments

@jlowe
Copy link
Contributor

jlowe commented Sep 29, 2021

GpuBroadcastToCpuExec is used to fixup a plan where a GPU broadcast is being reused for a CPU plan. However today it is re-executing the portion of the plan corresponding to the original broadcast, which not only wastes time but also artificially increases the reported metrics for that portion of the plan (e.g.: row counts from file reads will be a multiple of the actual number of rows in the file).

Ideally GpuBroadcastToCpuExec should reuse the existing GPU broadcast, but we also do not want to require the driver to have a GPU. Today deserializing a GPU broadcast automatically places the data in GPU memory as part of deserialization, but this is a use-case where we need the GPU broadcast data to remain in host memory after deserialization so the driver can work with it safely. One possible solution is to treat GPU broadcasts as we do GPU shuffles when using the legacy shuffle, i.e.: leave the data being transferred in host memory and update the plan with something similar to GpuShuffleCoalesceExec that expects its columnar input to be in host memory and it's sole job is to coalesce the data and put it on the GPU. For the GpuBroadcastToCpuExec use-case, we would deal with the host data representation directly in the driver.

@jlowe jlowe added bug Something isn't working ? - Needs Triage Need team to review and classify labels Sep 29, 2021
@sameerz sameerz added performance A performance related task/issue and removed ? - Needs Triage Need team to review and classify labels Oct 5, 2021
@sperlingxx
Copy link
Collaborator

I believe this PR can be closed, since we have already replaced GpuBroadcastToCpuExec with GpuSubqueryBroadcastExec.

@jlowe
Copy link
Contributor Author

jlowe commented Jan 25, 2022

@sperlingxx have you actually verified this (e.g.: running a query that uses DPP on both the CPU and GPU)? I tried running NDS query 5 on partitioned data to verify myself, but that threw an exception, see #4625

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working performance A performance related task/issue
Projects
None yet
Development

No branches or pull requests

3 participants