[SPARK-39915][SQL][3.3] Dataset.repartition(N) may not create N partitions Non-AQE part #37730

ulysses-you · 2022-08-30T14:07:30Z

What changes were proposed in this pull request?

Backport of #37706 to branch-3.3.

Skip optimizing the root user-specified repartition in PropagateEmptyRelation.

Why are the changes needed?

Spark should preserve the final repartition, because it is user-specified and can affect the number of output partitions.

For example:

spark.sql("select * from values(1) where 1 < rand()").repartition(1)

// before:
== Optimized Logical Plan ==
LocalTableScan <empty>, [col1#0]

// after:
== Optimized Logical Plan ==
Repartition 1, true
+- LocalRelation <empty>, [col1#0]
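
To make the intent of the rule change concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's Catalyst code: `Plan`, `EmptyRelation`, `Scan`, `Filter`, `Repartition`, and `PropagateEmptySketch` are hypothetical names introduced only for illustration, and the real rule handles many more operators. The sketch assumes the behavior described above: empty inputs are propagated upward as before, except that a repartition at the root of the plan is kept.

```scala
// Toy model only: these types are hypothetical and are not Spark classes.
sealed trait Plan
case object EmptyRelation extends Plan
final case class Scan(table: String) extends Plan
final case class Filter(child: Plan) extends Plan
final case class Repartition(numPartitions: Int, child: Plan) extends Plan

object PropagateEmptySketch {
  // Collapse any operator whose input is empty into the empty relation.
  private def propagate(plan: Plan): Plan = plan match {
    case Filter(child) =>
      propagate(child) match {
        case EmptyRelation => EmptyRelation
        case other         => Filter(other)
      }
    case Repartition(n, child) =>
      propagate(child) match {
        case EmptyRelation => EmptyRelation
        case other         => Repartition(n, other)
      }
    case leaf => leaf
  }

  // Keep a repartition at the root, but still simplify everything beneath it.
  def optimize(plan: Plan): Plan = plan match {
    case Repartition(n, child) => Repartition(n, propagate(child))
    case other                 => propagate(other)
  }
}
```

In this model, `PropagateEmptySketch.optimize(Repartition(1, Filter(EmptyRelation)))` keeps the root `Repartition(1, ...)` over an empty child, while a repartition nested below another operator is still collapsed.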

Does this PR introduce any user-facing change?

Yes. The optimized plan for an empty relation may change: a root user-specified repartition is now kept.

How was this patch tested?

Added a test.

… Non-AQE part

Closes apache#37706 from ulysses-you/empty.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

ulysses-you · 2022-08-30T14:08:04Z

cc @cloud-fan

dongjoon-hyun

+1, LGTM. Merged to branch-3.3.
Thank you, @ulysses-you and @cloud-fan .

…tions Non-AQE part

Closes #37730 from ulysses-you/empty-3.3.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

ulysses-you · 2022-09-13T05:51:30Z

thank you @dongjoon-hyun

ulysses-you mentioned this pull request Aug 30, 2022

[SPARK-39915][SQL] Dataset.repartition(N) may not create N partitions Non-AQE part #37706

Closed

github-actions bot added the SQL label Aug 30, 2022

cloud-fan approved these changes Aug 30, 2022


dongjoon-hyun approved these changes Sep 9, 2022


dongjoon-hyun closed this Sep 9, 2022

ulysses-you deleted the empty-3.3 branch September 13, 2022 05:51
