Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17289][SQL] Fix a bug to satisfy sort requirements in partial aggregations #14865

Closed
wants to merge 1 commit into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Aug 29, 2016

What changes were proposed in this pull request?

Partial aggregations are generated in EnsureRequirements, but the planner fails to
check if partial aggregation satisfies sort requirements.
For the following query:

val df2 = (0 to 1000).map(x => (x % 2, x.toString)).toDF("a", "b").createOrReplaceTempView("t2")
spark.sql("select max(b) from t2 group by a").explain(true)

Now, the SortAggregator won't insert Sort operator before partial aggregation, this will break sort-based partial aggregation.

== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- LocalTableScan [a#5, b#6]

Actually, a correct plan is:

== Physical Plan ==
SortAggregate(key=[a#5], functions=[max(b#6)], output=[max(b)#17])
+- *Sort [a#5 ASC], false, 0
   +- Exchange hashpartitioning(a#5, 200)
      +- SortAggregate(key=[a#5], functions=[partial_max(b#6)], output=[a#5, max#19])
         +- *Sort [a#5 ASC], false, 0
            +- LocalTableScan [a#5, b#6]

How was this patch tested?

Added tests in PlannerSuite.

@hvanhovell
Copy link
Contributor

cc @clockfly

@SparkQA
Copy link

SparkQA commented Aug 29, 2016

Test build #64587 has finished for PR 14865 at commit a0cc986.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

the fix LGTM

@liancheng
Copy link
Contributor

@maropu Discussed with @clockfly and @cloud-fan offline. @cloud-fan proposed a simpler alternative of #10896. Please refer to this comment for details.

This PR still LGTM and I'm merging it since master is broken right now. Thanks for fixing it!

@asfgit asfgit closed this in 94922d7 Aug 30, 2016
@maropu
Copy link
Member Author

maropu commented Aug 30, 2016

@liancheng okay, I'll check the discussion. Thanks!

@maropu maropu deleted the SPARK-17289 branch July 5, 2017 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants