[SPARK-3349] [SQL] Output partitioning of limit should not be inherited from child #2262

ericl · 2014-09-04T01:52:26Z

This resolves https://issues.apache.org/jira/browse/SPARK-3349

AmplabJenkins · 2014-09-04T01:57:09Z

Can one of the admins verify this patch?

marmbrus · 2014-09-04T02:09:36Z

add to whitelist

SparkQA · 2014-09-04T02:14:43Z

QA tests have started for PR 2262 at commit ac32723.

This patch merges cleanly.

SparkQA · 2014-09-04T04:01:36Z

QA tests have finished for PR 2262 at commit ac32723.

This patch passes unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class BlockManagerMaster(
- class AttributeMap[A](baseMap: Map[ExprId, (Attribute, A)])

chenghao-intel · 2014-09-04T06:00:25Z

sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala

@@ -97,6 +97,7 @@ case class Limit(limit: Int, child: SparkPlan)
  // partition local limit -> exchange into one partition -> partition local limit again

  override def output = child.output
+  override def outputPartitioning = SinglePartition


Using SinglePartition for LIMIT may cause a performance issue for a large number of records (in multiple partitions). Do we really need to change this?

This is not changing the implementation, just correcting a bug that
prevents exchange operators from being inserted when we need them.

OK, understood. Thanks for the explanation.
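A minimal sketch of why an inherited partitioning claim suppresses a needed exchange, under simplified assumptions. `Partitioning`, `Plan`, `Scan`, `BuggyLimit`, `FixedLimit`, and `needsExchange` are hypothetical stand-ins, not Spark's classes or its actual exchange-insertion rule. The assumed rule is that a join which zips partitions pairwise may skip the exchange only when both children claim hash partitioning on the join keys with the same partition count. A limit that inherits its child's claim passes that check even though it really produces one partition. Reporting `SinglePartition` makes the check fail, so the exchange is inserted.

```scala
object LimitPartitioningSketch {
  // Hypothetical, simplified partitioning claims (not Spark's classes).
  sealed trait Partitioning { def numPartitions: Int }
  case object SinglePartition extends Partitioning { val numPartitions = 1 }
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning

  sealed trait Plan {
    def outputPartitioning: Partitioning // what the operator tells the planner
    def actualPartitions: Int            // how many partitions its output really has
  }

  // A leaf that really is hash-partitioned on `keys` into `n` partitions.
  final case class Scan(keys: Seq[String], n: Int) extends Plan {
    def outputPartitioning: Partitioning = HashPartitioning(keys, n)
    def actualPartitions: Int = n
  }

  // Before the fix: inherits the child's claim although its output is one partition.
  final case class BuggyLimit(limit: Int, child: Plan) extends Plan {
    def outputPartitioning: Partitioning = child.outputPartitioning
    def actualPartitions: Int = 1
  }

  // After the fix: the claim matches the single partition it actually produces.
  final case class FixedLimit(limit: Int, child: Plan) extends Plan {
    def outputPartitioning: Partitioning = SinglePartition
    def actualPartitions: Int = 1
  }

  // Assumed planner rule for a pairwise-zipping join: the exchange can be skipped only
  // if both sides claim hash partitioning on exactly the join keys with equal counts.
  def needsExchange(left: Plan, right: Plan, joinKeys: Seq[String]): Boolean =
    (left.outputPartitioning, right.outputPartitioning) match {
      case (HashPartitioning(lk, ln), HashPartitioning(rk, rn)) =>
        !(lk == joinKeys && rk == joinKeys && ln == rn)
      case _ => true
    }
}
```

For example, joining `BuggyLimit(10, Scan(Seq("key"), 200))` with `Scan(Seq("key"), 200)` skips the exchange even though the two sides actually have 1 and 200 partitions, while the same join over `FixedLimit` makes `needsExchange` return `true`.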

ericl · 2014-09-04T18:53:43Z

added regression test

marmbrus · 2014-09-05T02:07:02Z

test this please

rxin · 2014-09-06T06:04:05Z

Jenkins, test this please.

SparkQA · 2014-09-06T06:43:34Z

QA tests have started for PR 2262 at commit 3e1b05c.

This patch merges cleanly.

SparkQA · 2014-09-06T08:25:02Z

QA tests have finished for PR 2262 at commit 3e1b05c.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2014-09-08T22:49:44Z

Jenkins, test this please

marmbrus · 2014-09-08T23:14:02Z

This passed tests before and I ran the new regression test by hand. I'm going to merge this into master.

Thanks Eric!

**This PR introduces a subtle change in semantics for HiveContext when using the results in Python or Scala. Specifically, while resolution remains case insensitive, it is now case preserving.**

_This PR is a follow-up to #2293 (and, to a lesser extent, #2262 and #2334)._

In #2293 the catalog was changed to store analyzed logical plans instead of unresolved ones. While this change fixed the reported bug (which was caused by yet another instance of us forgetting to put in a `LowerCaseSchema` operator), it had the consequence of breaking assumptions made by `MultiInstanceRelation`. Specifically, we can't swap out leaf operators in a tree without rewriting changed expression ids (which happens when you self join the same RDD that has been registered as a temp table).

In this PR, I instead remove the need to insert `LowerCaseSchema` operators at all, by moving the concern of matching up identifiers completely into analysis. Doing so allows the test cases from both #2293 and #2262 to pass at the same time (and likely fixes a slew of other "unknown unknown" bugs).

While it is rolled back in this PR, storing the analyzed plan might actually be a good idea. For instance, it is confusing if you register a temporary table, change the case sensitivity of resolution, and then can no longer query that table. This can be addressed in a follow-up PR.

Follow-ups:
- Configurable case sensitivity
- Consider storing analyzed plans for temp tables

Author: Michael Armbrust <michael@databricks.com>

Closes #2382 from marmbrus/lowercase and squashes the following commits:
- c21171e [Michael Armbrust] Ensure the resolver is used for field lookups and ensure that case insensitive resolution is still case preserving.
- d4320f1 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into lowercase
- 2de881e [Michael Armbrust] Address comments.
- 219805a [Michael Armbrust] style
- 5b93711 [Michael Armbrust] Replace LowerCaseSchema with Resolver.
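The "case insensitive but case preserving" behavior can be illustrated with a name-matching function. This is a sketch under assumptions: `Resolver` is modeled as a `(String, String) => Boolean` matcher, and `Attribute` and `resolve` are hypothetical simplifications, not the analyzer's real code. Matching goes through the resolver, while the returned attribute keeps its name exactly as the schema declared it.

```scala
object ResolverSketch {
  // Assumed shape: a resolver decides whether two identifiers refer to the same thing.
  type Resolver = (String, String) => Boolean
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitiveResolution: Resolver = (a, b) => a == b

  // Hypothetical attribute whose name keeps the case the schema declared.
  final case class Attribute(name: String, id: Long)

  // Look up the name used in the query; return the attribute as declared, so
  // matching follows the resolver but the output name is preserved unchanged.
  def resolve(output: Seq[Attribute], name: String, resolver: Resolver): Option[Attribute] =
    output.filter(a => resolver(a.name, name)) match {
      case Seq(single) => Some(single)
      case Seq()       => None
      case _           => throw new IllegalArgumentException(s"Ambiguous reference to $name")
    }
}
```

With a schema column declared as `userId`, a query referring to `USERID` resolves under `caseInsensitiveResolution` and still reports the name `userId`; under `caseSensitiveResolution` it does not resolve at all.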

make limit/takeOrdered output SinglePartition

ac32723
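The code comment in the reviewed hunk describes Limit's plan as a partition-local limit, then an exchange into one partition, then a partition-local limit again, and the commit title indicates TakeOrdered should report the same partitioning. A minimal sketch, assuming partitions are modeled as a plain `Seq[Seq[A]]` rather than an RDD (`SinglePartitionSketch`, `limit`, and `takeOrdered` are hypothetical names), shows that both strategies always end with exactly one partition, which is what reporting `SinglePartition` states.

```scala
object SinglePartitionSketch {
  // Partitions modeled as a sequence of in-memory sequences (an illustrative assumption).
  type Partitions[A] = Seq[Seq[A]]

  // Limit: partition-local limit -> gather into one partition -> local limit again.
  def limit[A](parts: Partitions[A], n: Int): Partitions[A] = {
    val locallyLimited = parts.map(_.take(n))
    val gathered = Seq(locallyLimited.flatten) // stands in for the exchange into one partition
    gathered.map(_.take(n))
  }

  // TakeOrdered: per-partition top n, then a final top n over the single gathered partition.
  def takeOrdered[A](parts: Partitions[A], n: Int)(implicit ord: Ordering[A]): Partitions[A] = {
    val locallyTopN = parts.map(_.sorted.take(n))
    Seq(locallyTopN.flatten.sorted.take(n))
  }
}
```

With `Seq(Seq(5, 1, 9), Seq(4, 8), Seq(7, 2, 6))` and `n = 3`, `takeOrdered` returns `Seq(Seq(1, 2, 4))`: one partition holding the three smallest values overall.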

ericl changed the title ~~[SPARK-3349] Output partitioning of limit should not be inherited from child~~ [SPARK-3349] [SQL] Output partitioning of limit should not be inherited from child Sep 4, 2014

chenghao-intel reviewed Sep 4, 2014

add regression test

3e1b05c

asfgit closed this in 7db5339 Sep 8, 2014

This was referenced Sep 10, 2014

[SPARK-3455] [SQL] **HOT FIX** Fix the unit test failure #2334

Closed

[SPARK-3414][SQL] Replace LowerCaseSchema with Resolver #2382

Closed
