[SPARK-26277][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled #23224
Conversation
Test build #99699 has finished for PR 23224 at commit
retest this please
Can we file a JIRA? I think it's not minor.
@HyukjinKwon Thank you for your comments! I have filed a JIRA and updated the PR title accordingly.
Test build #99711 has finished for PR 23224 at commit
Is there a way to check that whole-stage codegen is enabled inside the test?
@felixcheung Yes, that makes sense. I have added a commit to check that.
val df = spark.range(10).filter('id < 5).toDF()
testSparkPlanMetrics(df, 1, Map.empty, true)
df.queryExecution.executedPlan.find(_.isInstanceOf[WholeStageCodegenExec])
  .getOrElse(assert(false))
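As a side note, the pre-order search used above (`SparkPlan` is a `TreeNode`, and `find` visits a node before its children) can be sketched with a plain-Scala analogue. The case classes below are hypothetical stand-ins for illustration, not Spark's real operators:

```scala
// Simplified stand-in for Spark's plan tree; these node classes are
// hypothetical, not the actual SparkPlan operators.
sealed trait PlanNode { def children: Seq[PlanNode] }
case class WholeStageCodegen(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}
case class FilterOp(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}
case object RangeOp extends PlanNode {
  def children: Seq[PlanNode] = Nil
}

// Pre-order search in the spirit of TreeNode.find: test the node itself,
// then recurse into its children, returning the first match found.
def find(node: PlanNode)(p: PlanNode => Boolean): Option[PlanNode] =
  if (p(node)) Some(node)
  else node.children.foldLeft(Option.empty[PlanNode]) { (acc, c) =>
    acc.orElse(find(c)(p))
  }
```

With this analogue, `find(WholeStageCodegen(FilterOp(RangeOp)))(_.isInstanceOf[WholeStageCodegen])` returns a match, while searching a plan without a codegen node returns `None` — which is what the `.getOrElse(assert(false))` check above relies on.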
Seems the test "Sort metrics" also has a similar issue:
test("Sort metrics") {
// Assume the execution plan is
// WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
val ds = spark.range(10).sort('id)
testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
}
Thank you @viirya. Very good suggestions.
After investigation, besides the whole-stage codegen issue, I found another one: #20560/SPARK-23375 introduced an optimizer rule to eliminate redundant Sorts. In the test case named "Sort metrics" in SQLMetricsSuite, because range is already sorted, the Sort is removed by RemoveRedundantSorts, which makes the test case meaningless. This seems to be a quite different issue, so I opened a new PR; see #23258 for details.
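To illustrate why the Sort disappears: range produces output already ordered by id, and a rule in the spirit of RemoveRedundantSorts drops a Sort whose child already satisfies the required ordering. A minimal plain-Scala sketch (hypothetical node classes and a simplified rule, not Spark's actual implementation):

```scala
// Toy plan nodes; hypothetical stand-ins for Spark operators.
sealed trait Plan
case class SortOp(key: String, child: Plan) extends Plan
case class OrderedRange(key: String) extends Plan // output already sorted by `key`
case class TableScan(name: String) extends Plan   // no ordering guarantee

// The ordering each node's output is known to satisfy, if any.
def outputOrdering(p: Plan): Option[String] = p match {
  case SortOp(k, _)    => Some(k)
  case OrderedRange(k) => Some(k)
  case TableScan(_)    => None
}

// Remove a Sort whose child already produces the required ordering,
// in the spirit of the RemoveRedundantSorts optimizer rule.
def removeRedundantSorts(p: Plan): Plan = p match {
  case SortOp(k, child) if outputOrdering(child).contains(k) =>
    removeRedundantSorts(child)
  case SortOp(k, child) => SortOp(k, removeRedundantSorts(child))
  case other => other
}
```

In this sketch, a Sort over an already-ordered range is eliminated while a Sort over an unordered scan is kept — which is why `spark.range(10).sort('id)` ends up exercising no Sort node at all.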
Test build #99829 has finished for PR 23224 at commit
Test build #99858 has finished for PR 23224 at commit
retest this please
Test build #99869 has finished for PR 23224 at commit
LGTM
Force-pushed from 71e569b to fd7c63b
Test build #100596 has finished for PR 23224 at commit
LGTM
One question, otherwise LGTM
*/
protected def testSparkPlanMetrics(
    df: DataFrame,
    expectedNumOfJobs: Int,
    expectedMetrics: Map[Long, (String, Map[String, Any])],
    enableWholeStage: Boolean = false): Unit = {
Is this needed? IIUC it is never set to true...
It's set to true in the new test that was introduced.
No, testSparkPlanMetricsWithPredicates is used there, not testSparkPlanMetrics. So for testSparkPlanMetrics it is never used.
Oh yes, good point. Nothing calls this with a different value here. Yeah, this should just pass false, not take a new arg?
I think this should not take any new value here and should pass nothing to testSparkPlanMetricsWithPredicates, as false is the default value there. So basically, no change to the testSparkPlanMetrics method is needed.
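The point about defaults can be sketched with simplified signatures (the string bodies below are purely illustrative stand-ins, not the real SQLMetricsTestUtils helpers): when the narrower helper passes nothing, the delegate's own default of false applies, so the narrower helper needs no flag of its own.

```scala
// Hypothetical simplification of the two helpers discussed; the returned
// string just makes the effective arguments visible.
def testSparkPlanMetricsWithPredicates(
    expectedNumOfJobs: Int,
    enableWholeStage: Boolean = false): String =
  s"jobs=$expectedNumOfJobs, codegen=$enableWholeStage"

// Passing nothing for enableWholeStage lets the delegate's default apply,
// so this helper does not need a flag parameter at all.
def testSparkPlanMetrics(expectedNumOfJobs: Int): String =
  testSparkPlanMetricsWithPredicates(expectedNumOfJobs)
```

Callers that do need whole-stage codegen can invoke the wider helper directly with `enableWholeStage = true`.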
Yes, currently testSparkPlanMetrics is always used with whole-stage codegen disabled, but it's possible we'll want to use it with whole-stage codegen enabled, just like testSparkPlanMetricsWithPredicates. Do you mean we should limit the scope of the change as much as possible, or should we do something for the future?
Yes, we should make only the changes that are strictly needed. If we need this in the future, we can add the flag then. Until then, we shouldn't add something that isn't needed. Thanks.
I see. I have removed the unused flag from testSparkPlanMetrics in the new commit.
Test build #100642 has finished for PR 23224 at commit
LGTM, thanks
Merged to master
…ith whole-stage codegen enabled

## What changes were proposed in this pull request?
In `org.apache.spark.sql.execution.metric.SQLMetricsSuite`, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes that by enabling whole-stage codegen for this test case.

## How was this patch tested?
Tested locally using existing test cases.

Closes apache#23224 from seancxmao/codegen-metrics.
Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
In org.apache.spark.sql.execution.metric.SQLMetricsSuite, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes that by enabling whole-stage codegen for this test case.
How was this patch tested?
Tested locally using existing test cases.