Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] GpuShuffleExchangeExec in v2 shims has inconsistent packaging #3609

Closed
jlowe opened this issue Sep 22, 2021 · 2 comments · Fixed by #4203
Closed

[BUG] GpuShuffleExchangeExec in v2 shims has inconsistent packaging #3609

jlowe opened this issue Sep 22, 2021 · 2 comments · Fixed by #4203
Assignees
Labels
bug Something isn't working build Related to CI / CD or cleanly building

Comments

@jlowe
Copy link
Contributor

jlowe commented Sep 22, 2021

GpuShuffleExchangeExec was moved into the v2 shim style in #3559, but this class appears in different packages depending upon which shim is being used:

$ find . -name GpuShuffleExchangeExec.scala | xargs grep ^package
./sql-plugin/src/main/311+-nondb/scala/com/nvidia/spark/rapids/shims/v2/GpuShuffleExchangeExec.scala:package com.nvidia.spark.rapids.shims.v2
./sql-plugin/src/main/311db/scala/org/apache/spark/rapids/shims/v2/GpuShuffleExchangeExec.scala:package org.apache.spark.sql.rapids.shims.v2
./sql-plugin/src/main/301db/scala/com/nvidia/spark/rapids/shims/v2/GpuShuffleExchangeExec.scala:package com.nvidia.spark.rapids.shims.v2
./sql-plugin/src/main/301until310-nondb/scala/com/nvidia/spark/rapids/shims/v2/GpuShuffleExchangeExec.scala:package com.nvidia.spark.rapids.shims.v2

This is working today because this class isn't referenced only in code within the specific shim which uses the appropriate package path. If we ever attempt to commonize code that references this class it will break the build since common code can't directly reference the shimmed class via a single package path (which is the main advantage of the v2 style shim).

@jlowe jlowe added bug Something isn't working ? - Needs Triage Need team to review and classify build Related to CI / CD or cleanly building labels Sep 22, 2021
@Salonijain27 Salonijain27 removed the ? - Needs Triage Need team to review and classify label Sep 28, 2021
@res-life res-life self-assigned this Nov 22, 2021
@res-life
Copy link
Collaborator

@jlowe

One issue is the inconsistent of package and path of GpuShuffleExchangeExec in 312db directory

./sql-plugin/src/main/312db/scala/org/apache/spark/rapids/shims/v2/GpuShuffleExchangeExec.scala

package: org.apache.spark.sql.rapids.shims.v2
path:     org/apache/spark/rapids/shims/v2

Another issue:
org.apache.spark.MapOutputStatistics can't be accessed from outside of [spark] package

private[spark] class MapOutputStatistics

Is the following solution ok?
create one class named "GpuShuffleExchangeExecBase" in org.apache.spark package and create "GpuShuffleExchangeExec" which extends the base class in the expected package org.apache.spark.sql.rapids.shims.v2

@jlowe
Copy link
Contributor Author

jlowe commented Nov 23, 2021

create one class named "GpuShuffleExchangeExecBase" in org.apache.spark package

There's already an org.apache.spark.sql.rapids.execution.GpuShuffleExchangeExecBase which isn't shimmed, but I don't think that is going to help here. The Databricks version, and only the Databricks version, needs to override a method that has MapOutputStatistics in the method signature. That means there's no choice but to have that class be somewhere in the org.apache.spark.... hierarchy to access that class.

I think the proper solution is to move all the GpuShuffleExchangeExec shim v2 classes to org.apache.spark.rapids.shims.v2 so they're consistent. Yes, most implementations won't need to be in that package, but I think it's more important for the class paths to be consistent across shims in the v2 shim approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working build Related to CI / CD or cleanly building
Projects
None yet
3 participants