[SPARK-26277][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled #23224
Conversation
Test build #99699 has finished for PR 23224 at commit
retest this please
Can we file a JIRA? I think it's not minor.
@HyukjinKwon Thank you for your comments! I have filed a JIRA and updated the PR title accordingly.
Test build #99711 has finished for PR 23224 at commit
Is there a way to check that whole-stage codegen is enabled inside the test?
@felixcheung Yes, that makes sense. I have added a commit to check that.
val df = spark.range(10).filter('id < 5).toDF()
testSparkPlanMetrics(df, 1, Map.empty, true)
df.queryExecution.executedPlan.find(_.isInstanceOf[WholeStageCodegenExec])
  .getOrElse(assert(false))
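As a side note, the pre-order search used above (`SparkPlan` is a `TreeNode`, and `find` visits a node before its children) can be sketched with a plain-Scala analogue. The case classes below are hypothetical stand-ins for illustration, not Spark's real operators:

```scala
// Simplified stand-in for Spark's plan tree; these node classes are
// hypothetical, not the actual SparkPlan operators.
sealed trait PlanNode { def children: Seq[PlanNode] }
case class WholeStageCodegen(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}
case class FilterOp(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}
case object RangeOp extends PlanNode {
  def children: Seq[PlanNode] = Nil
}

// Pre-order search in the spirit of TreeNode.find: test the node itself,
// then recurse into its children, returning the first match found.
def find(node: PlanNode)(p: PlanNode => Boolean): Option[PlanNode] =
  if (p(node)) Some(node)
  else node.children.foldLeft(Option.empty[PlanNode]) { (acc, c) =>
    acc.orElse(find(c)(p))
  }
```

With this analogue, `find(WholeStageCodegen(FilterOp(RangeOp)))(_.isInstanceOf[WholeStageCodegen])` returns a match, while searching a plan without a codegen node returns `None` — which is what the `.getOrElse(assert(false))` check above relies on.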
Seems the test "Sort metrics" also has a similar issue:
test("Sort metrics") {
// Assume the execution plan is
// WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
val ds = spark.range(10).sort('id)
testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
}
Thank you @viirya. Very good suggestions.
After investigation, besides the whole-stage codegen issue, I found another one: #20560/SPARK-23375 introduced an optimizer rule to eliminate redundant Sorts. In the test case named "Sort metrics" in SQLMetricsSuite, because range is already sorted, the Sort is removed by RemoveRedundantSorts, which makes the test case meaningless. This seems to be a quite different issue, so I opened a new PR; see #23258 for details.
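To illustrate why the Sort disappears: range produces output already ordered by id, and a rule in the spirit of RemoveRedundantSorts drops a Sort whose child already satisfies the required ordering. A minimal plain-Scala sketch (hypothetical node classes and a simplified rule, not Spark's actual implementation):

```scala
// Toy plan nodes; hypothetical stand-ins for Spark operators.
sealed trait Plan
case class SortOp(key: String, child: Plan) extends Plan
case class OrderedRange(key: String) extends Plan // output already sorted by `key`
case class TableScan(name: String) extends Plan   // no ordering guarantee

// The ordering each node's output is known to satisfy, if any.
def outputOrdering(p: Plan): Option[String] = p match {
  case SortOp(k, _)    => Some(k)
  case OrderedRange(k) => Some(k)
  case TableScan(_)    => None
}

// Remove a Sort whose child already produces the required ordering,
// in the spirit of the RemoveRedundantSorts optimizer rule.
def removeRedundantSorts(p: Plan): Plan = p match {
  case SortOp(k, child) if outputOrdering(child).contains(k) =>
    removeRedundantSorts(child)
  case SortOp(k, child) => SortOp(k, removeRedundantSorts(child))
  case other => other
}
```

In this sketch, a Sort over an already-ordered range is eliminated while a Sort over an unordered scan is kept — which is why `spark.range(10).sort('id)` ends up exercising no Sort node at all.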
Test build #99829 has finished for PR 23224 at commit
Test build #99858 has finished for PR 23224 at commit
retest this please
Test build #99869 has finished for PR 23224 at commit
LGTM
Force-pushed from 71e569b to fd7c63b
Test build #100596 has finished for PR 23224 at commit
LGTM
One question, otherwise LGTM
*/
protected def testSparkPlanMetrics(
    df: DataFrame,
    expectedNumOfJobs: Int,
    expectedMetrics: Map[Long, (String, Map[String, Any])],
    enableWholeStage: Boolean = false): Unit = {
Is this needed? IIUC it is never set to true...
It's set to true in the new test that was introduced.
No, testSparkPlanMetricsWithPredicates is used there, not testSparkPlanMetrics. So for testSparkPlanMetrics it is never used.
Oh yes, good point. Nothing calls this with a different value here. Yeah, this should just pass false, not take a new arg?
I think this should not take any new value here and should pass nothing to testSparkPlanMetricsWithPredicates, as false is the default value there. So basically, no change to the testSparkPlanMetrics method is needed.
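The point about defaults can be sketched with simplified signatures (the string bodies below are purely illustrative stand-ins, not the real SQLMetricsTestUtils helpers): when the narrower helper passes nothing, the delegate's own default of false applies, so the narrower helper needs no flag of its own.

```scala
// Hypothetical simplification of the two helpers discussed; the returned
// string just makes the effective arguments visible.
def testSparkPlanMetricsWithPredicates(
    expectedNumOfJobs: Int,
    enableWholeStage: Boolean = false): String =
  s"jobs=$expectedNumOfJobs, codegen=$enableWholeStage"

// Passing nothing for enableWholeStage lets the delegate's default apply,
// so this helper does not need a flag parameter at all.
def testSparkPlanMetrics(expectedNumOfJobs: Int): String =
  testSparkPlanMetricsWithPredicates(expectedNumOfJobs)
```

Callers that do need whole-stage codegen can invoke the wider helper directly with `enableWholeStage = true`.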
Yes, currently testSparkPlanMetrics is always used with whole-stage codegen disabled, but it's possible we'll want to use it with whole-stage codegen enabled, just like testSparkPlanMetricsWithPredicates. Do you mean we should limit the scope of the change as much as possible, or should we do something for the future?
Yes, we should make only the changes that are strictly needed. If we need this in the future, we can add the flag then. Until then, we shouldn't add something that isn't needed. Thanks.
I see. I have removed the unused flag from testSparkPlanMetrics in the new commit.
Test build #100642 has finished for PR 23224 at commit
LGTM, thanks
Merged to master
…ith whole-stage codegen enabled

## What changes were proposed in this pull request?
In `org.apache.spark.sql.execution.metric.SQLMetricsSuite`, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes that by enabling whole-stage codegen for this test case.

## How was this patch tested?
Tested locally using existing test cases.

Closes apache#23224 from seancxmao/codegen-metrics.
Authored-by: seancxmao <seancxmao@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
In org.apache.spark.sql.execution.metric.SQLMetricsSuite, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes that by enabling whole-stage codegen for this test case.
How was this patch tested?
Tested locally using existing test cases.