[SPARK-26159] Codegen for LocalTableScanExec and RDDScanExec #23127
Test build #99218 has finished for PR 23127 at commit
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
Looks good. One more higher level question that can also be addressed in a follow-up.
/**
 * Helper default should stop check code.
 */
def shouldStopCheckCode: String = if (needStopCheck) {
We can use it in more places. This can be done in a follow-up.
LGTM
retest this please
Test build #99258 has finished for PR 23127 at commit
Test build #99259 has finished for PR 23127 at commit
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala
@cloud-fan @rednaxelafx WDYT about how I patched it up?
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
Test build #99275 has finished for PR 23127 at commit
Test build #99286 has finished for PR 23127 at commit
there are still 2 golden file test failures because of the plan change...
@cloud-fan Thanks. Actually, I had to revert earlier updates because the plan no longer changes for LocalTableScanExec that is alone in a wholestagecodegen.
Talked with @hvanhovell offline and set
Test build #99320 has finished for PR 23127 at commit
Test build #99321 has finished for PR 23127 at commit
jenkins retest this please
LGTM - @cloud-fan can you take another look?
Test build #99329 has finished for PR 23127 at commit
@juliuszsompolski It won't help; it seems you need to fix the Python tests.
Test build #99339 has finished for PR 23127 at commit
thanks, merging to master!
## What changes were proposed in this pull request?

Implement codegen for `LocalTableScanExec` and `ExistingRDDExec`. Refactor to share code between `LocalTableScanExec`, `ExistingRDDExec`, `InputAdapter` and `RowDataSourceScanExec`.

The difference in `doProduce` between these four was that `ExistingRDDExec` and `RowDataSourceScanExec` triggered adding an `UnsafeProjection`, while `InputAdapter` and `LocalTableScanExec` did not. In the new trait `InputRDDCodegen` I added a flag `createUnsafeProjection` which the operators set accordingly.

Note: `LocalTableScanExec` explicitly creates its input as `UnsafeRows`, so it was obvious why it doesn't need an `UnsafeProjection`. But if an `InputAdapter` may take input that is `InternalRows` but not `UnsafeRows`, then I think it doesn't need an unsafe projection just because any other operator that is its parent would do that. That assumes that any parent operator would always result in some `UnsafeProjection` being eventually added, and hence the output of the `WholeStageCodegen` unit would be `UnsafeRows`. If these assumptions hold, I think `createUnsafeProjection` could be set to `(parent == null)`.

Note: Do not codegen `LocalTableScanExec` when it's the only operator. `LocalTableScanExec` has optimized driver-only `executeCollect` and `executeTake` code paths that are used to return `Command` results without starting Spark Jobs. They can no longer be used once the `LocalTableScanExec` is wrapped in whole-stage codegen.

## How was this patch tested?

Covered and used in existing tests.

Closes apache#23127 from juliuszsompolski/SPARK-26159.

Authored-by: Juliusz Sompolski <julek@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
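
To make the role of the `createUnsafeProjection` flag concrete, here is a minimal, Spark-free sketch. It is not the code from this PR: `InputRDDCodegenSketch`, `ScanSketch`, `rowHandlingCode` and the emitted strings are hypothetical stand-ins, and only the per-operator flag values come from the description above.

```scala
// Toy model only: none of these types are Spark's real classes.
trait InputRDDCodegenSketch {
  def operatorName: String

  // Mirrors the flag named in the PR description: should each input row be
  // passed through an UnsafeProjection before it is handed to the parent?
  def createUnsafeProjection: Boolean

  // Returns a (made-up) Java snippet as text, the way a codegen operator
  // returns source code rather than executing anything itself.
  def rowHandlingCode(rowVar: String): String =
    if (createUnsafeProjection) s"consume(project($rowVar));"
    else s"consume($rowVar);"
}

// Hypothetical stand-in for an operator that mixes in the shared trait.
final case class ScanSketch(operatorName: String, createUnsafeProjection: Boolean)
  extends InputRDDCodegenSketch

object InputRDDCodegenDemo {
  // Flag values as stated in the description: two operators project, two do not.
  val operators: Seq[ScanSketch] = Seq(
    ScanSketch("LocalTableScanExec", createUnsafeProjection = false),
    ScanSketch("InputAdapter", createUnsafeProjection = false),
    ScanSketch("ExistingRDDExec", createUnsafeProjection = true),
    ScanSketch("RowDataSourceScanExec", createUnsafeProjection = true)
  )

  def main(args: Array[String]): Unit =
    operators.foreach { op =>
      println(s"${op.operatorName}: ${op.rowHandlingCode("row")}")
    }
}
```

The point of the shape is that the four operators differ only in one boolean, so the row-handling logic can live once in the shared trait.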
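
The rule about not generating code for a lone `LocalTableScanExec` can be pictured with a small decision function. This is a sketch under assumptions, not the PR's implementation: `PlanSketch`, `LocalTableScanSketch`, `FilterSketch` and `wrapInCodegen` are hypothetical names, and the real planner rule considers more than the root node.

```scala
// Toy physical-plan model; these are NOT Spark's classes.
sealed trait PlanSketch
case object LocalTableScanSketch extends PlanSketch
final case class FilterSketch(child: PlanSketch) extends PlanSketch

object LoneScanRuleSketch {
  // Decide whether the plan rooted at `root` should be wrapped in a codegen stage.
  def wrapInCodegen(root: PlanSketch): Boolean = root match {
    // A scan that is the whole plan stays unwrapped, so its driver-only
    // collect/take fast path remains reachable.
    case LocalTableScanSketch => false
    // Any plan with an operator above the scan is wrapped as usual.
    case _: FilterSketch => true
  }

  def main(args: Array[String]): Unit = {
    println(wrapInCodegen(LocalTableScanSketch))
    println(wrapInCodegen(FilterSketch(LocalTableScanSketch)))
  }
}
```

Only the case where the scan is the entire plan is special; once another operator sits on top, the scan takes part in code generation like any other input.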