GpuBroadcastToCpuExec is used to fix up a plan where a GPU broadcast is being reused for a CPU plan. However, today it re-executes the portion of the plan corresponding to the original broadcast, which not only wastes time but also artificially inflates the reported metrics for that portion of the plan (e.g.: row counts from file reads will be a multiple of the actual number of rows in the file).
Ideally GpuBroadcastToCpuExec should reuse the existing GPU broadcast, but we also do not want to require the driver to have a GPU. Today, deserializing a GPU broadcast automatically places the data in GPU memory as part of deserialization, but this is a use-case where we need the GPU broadcast data to remain in host memory after deserialization so the driver can work with it safely. One possible solution is to treat GPU broadcasts the way we treat GPU shuffles when using the legacy shuffle, i.e.: leave the data being transferred in host memory and update the plan with something similar to GpuShuffleCoalesceExec that expects its columnar input to be in host memory and whose sole job is to coalesce the data and put it on the GPU. For the GpuBroadcastToCpuExec use-case, we would deal with the host data representation directly in the driver.
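The coalesce-then-transfer pattern described above can be sketched as a toy model. This is not plugin code: the object and method names below (HostCoalesce, coalesceHostBatches, copyToDevice) are hypothetical, and plain Int arrays stand in for serialized host-memory columnar batches; the real operator would resemble GpuShuffleCoalesceExec.

```scala
// Toy sketch of the proposed approach: keep broadcast data in host memory,
// coalesce it there, and only then (on executors with a GPU) do a single
// host-to-device transfer. All names here are hypothetical.
object HostCoalesce {
  // Each Array[Int] stands in for one host-memory batch received from the
  // broadcast; the real code would hold serialized columnar data instead.
  def coalesceHostBatches(batches: Seq[Array[Int]]): Array[Int] =
    // Concatenate everything while still on the host, so only one
    // host-to-device copy is needed afterwards.
    batches.foldLeft(Array.empty[Int])(_ ++ _)

  // Placeholder for the single host-to-device transfer. The driver-side
  // GpuBroadcastToCpuExec path would skip this step entirely and read the
  // coalesced host representation directly.
  def copyToDevice(hostData: Array[Int]): Array[Int] = hostData
}
```

The point of coalescing before transfer is that the driver (no GPU required) and the executors (one bulk copy to the device) can share the same host-side representation.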
@sperlingxx have you actually verified this (e.g.: running a query that uses DPP on both the CPU and GPU)? I tried running NDS query 5 on partitioned data to verify myself, but that threw an exception, see #4625