
[SPARK-14796][SQL] Add spark.sql.optimizer.inSetConversionThreshold config option. #12562

Closed
wants to merge 2 commits into from

Conversation

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Apr 21, 2016

What changes were proposed in this pull request?

Currently, the OptimizeIn optimizer rule replaces an In expression with an InSet expression if the size of the set is greater than a constant, 10.
This issue aims to make that threshold configurable via a new option, spark.sql.optimizer.inSetConversionThreshold.

After this PR, OptimizeIn is configurable:

scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#7 IN (1,2,3) AS (a IN (1, 2, 3))#8]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#7]
   +- Scan OneRowRelation[]

scala> sqlContext.setConf("spark.sql.optimizer.inSetConversionThreshold", "2")

scala> sql("select a in (1,2,3) from (select explode(array(1,2)) a) T").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [a#16 INSET (1,2,3) AS (a IN (1, 2, 3))#17]
:     +- INPUT
+- Generate explode([1,2]), false, false, [a#16]
   +- Scan OneRowRelation[]
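The decision shown in the two `explain()` outputs can be modeled without Spark. This is a minimal sketch, not Spark's actual `OptimizeIn` code: `Expr`, `Attr`, `Lit`, `In`, `InSet` and `OptimizeInModel` are hypothetical simplified stand-ins, and the sketch assumes the comparison is a strict `>` against the list size, as the PR description states.

```scala
// Hypothetical, Spark-free stand-ins for Catalyst expressions.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Int) extends Expr
// IN over a list of literals: evaluated by scanning the list.
final case class In(value: Expr, list: Seq[Lit]) extends Expr
// INSET over a pre-built hash set: evaluated by a hash lookup.
final case class InSet(value: Expr, hset: Set[Int]) extends Expr

object OptimizeInModel {
  // Mirrors the previously hard-coded constant.
  val DefaultThreshold: Int = 10

  // Rewrites In to InSet only when the list is strictly larger than the threshold.
  // Only the top-level expression is inspected here; the real rule traverses the whole plan.
  def apply(e: Expr, threshold: Int = DefaultThreshold): Expr = e match {
    case In(v, list) if list.size > threshold => InSet(v, list.map(_.value).toSet)
    case other                                => other
  }
}
```

With the default threshold, `OptimizeInModel(In(Attr("a"), Seq(Lit(1), Lit(2), Lit(3))))` returns the `In` unchanged because 3 is not greater than 10; passing `threshold = 2` yields an `InSet` over `Set(1, 2, 3)`, matching the second plan above.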

How was this patch tested?

Pass the Jenkins tests (with a new test case).

@rxin
Contributor

rxin commented Apr 21, 2016

Can we add a unit test in the appropriate optimizer suite?

We also need to come up with a better name. optimizer.minSetSize is not very intuitive.

@dongjoon-hyun
Member Author

Thank you for the review, @rxin.
I'll add an OptimizeInSuite.scala for this.
For the name, could you give some advice?

@SparkQA

SparkQA commented Apr 21, 2016

Test build #56492 has finished for PR 12562 at commit 1ac5857.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class OptimizeIn(conf: CatalystConf) extends Rule[LogicalPlan]

@dongjoon-hyun
Copy link
Member Author

Oh, sorry. OptimizeInSuite.scala already exists. I'll fix it soon.

@SparkQA

SparkQA commented Apr 21, 2016

Test build #56511 has finished for PR 12562 at commit 69306cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 21, 2016

Test build #56514 has finished for PR 12562 at commit 75e1577.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 21, 2016

Test build #56550 has finished for PR 12562 at commit 9df35bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class OptimizeIn(conf: CatalystConf) extends Rule[LogicalPlan]

@@ -128,4 +131,21 @@ class OptimizeInSuite extends PlanTest {
comparePlans(optimized, correctAnswer)
}

test("OptimizedIn test: Use configuration.") {
Contributor

I'd give this a more descriptive name, and explicitly say that it sets the threshold for converting In into InSet.

@rxin
Contributor

rxin commented Apr 21, 2016

maybe

inSetConversionThreshold?

@@ -17,11 +17,14 @@

package org.apache.spark.sql.catalyst.optimizer

import scala.collection.immutable.HashSet
Contributor

Do you use this import?

@dongjoon-hyun
Member Author

Thank you so much, @rxin and @marmbrus!
I will update the PR soon as follows, according to the comments.

  • spark.sql.optimizer.minSetSize -> spark.sql.optimizer.inSetConversionThreshold
  • SQLConf.optimizerMinSetSize, CatalystConf.optimizerMinSetSize -> optimizerInSetConversionThreshold
  • SQLConf.OPTIMIZER_MIN_SET_SIZE -> SQLConf.OPTIMIZER_INSET_CONVERSION_THRESHOLD (also the description)
  • and other comments.

By the way, @marmbrus, do you mean the duplication of the value 10 in SQLConf.scala and SimpleCatalystConf?
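The rule receives its configuration through its constructor (`case class OptimizeIn(conf: CatalystConf)` in the build output above), which is what lets both `SQLConf` and the test-oriented `SimpleCatalystConf` drive it. The thread does not spell out what @marmbrus meant; assuming the concern is the default `10` being written in two places, one option is a single shared constant. This is a sketch only: `ConfDefaults`, `MiniCatalystConf`, `MiniSimpleConf`, `MiniSQLConf` and `MiniOptimizeIn` are hypothetical stand-ins, not Spark classes, and a plain `Map` replaces `SQLConf`'s real settings storage.

```scala
// Hypothetical single source of truth for the default threshold.
object ConfDefaults {
  val InSetConversionThreshold: Int = 10
}

// Hypothetical minimal stand-in for CatalystConf.
trait MiniCatalystConf {
  def optimizerInSetConversionThreshold: Int
}

// Test-oriented conf (like SimpleCatalystConf): reuses the shared default.
final case class MiniSimpleConf(
    optimizerInSetConversionThreshold: Int = ConfDefaults.InSetConversionThreshold
) extends MiniCatalystConf

// Runtime-oriented conf (like SQLConf): reads a string setting, falling back to the shared default.
final class MiniSQLConf(settings: Map[String, String]) extends MiniCatalystConf {
  def optimizerInSetConversionThreshold: Int =
    settings
      .get("spark.sql.optimizer.inSetConversionThreshold")
      .map(_.toInt)
      .getOrElse(ConfDefaults.InSetConversionThreshold)
}

// The rule takes its conf as a constructor argument, as OptimizeIn(conf: CatalystConf) does.
final case class MiniOptimizeIn(conf: MiniCatalystConf) {
  def convertsToInSet(listSize: Int): Boolean =
    listSize > conf.optimizerInSetConversionThreshold
}
```

Because both conf implementations refer to `ConfDefaults.InSetConversionThreshold`, changing the default in one place cannot leave the test conf and the runtime conf out of sync.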

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-14796][SQL] Add spark.sql.optimizer.minSetSize config option. [SPARK-14796][SQL] Add spark.sql.optimizer.inSetConversionThreshold config option. Apr 21, 2016
@SparkQA

SparkQA commented Apr 21, 2016

Test build #56573 has finished for PR 12562 at commit c7a6d9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Apr 22, 2016

Hi, @rxin and @marmbrus.
What do you think about the updated PR? It's just the second update.
If there is something to do more, please let me know.
Thank you.

@rxin
Contributor

rxin commented Apr 22, 2016

Merging in master. Thanks.

@asfgit asfgit closed this in 3647120 Apr 22, 2016
@dongjoon-hyun
Member Author

Thank you, @rxin and @marmbrus.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-14796 branch May 12, 2016 00:59
4 participants