[SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode #45930
Conversation
```scala
/**
 * Mark a shuffle that should not be migrated.
 */
def addShuffleToSkip(shuffleId: Int): Unit
```
Let's add a default implementation.
Done.
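A default no-op body keeps existing resolver implementations source-compatible while letting one implementation opt in. A minimal, self-contained sketch of that pattern — the trait and class names here are stand-ins for illustration, not Spark's actual `ShuffleBlockResolver`:

```scala
// Stand-in sketch: a trait method with a default no-op body, so existing
// implementations compile unchanged and only interested ones override it.
trait ShuffleMigrationSupport {
  /** Mark a shuffle that should not be migrated. Default: do nothing. */
  def addShuffleToSkip(shuffleId: Int): Unit = {}
}

// An implementation that opts in overrides the default.
class TrackingResolver extends ShuffleMigrationSupport {
  val skipped = scala.collection.mutable.Set.empty[Int]
  override def addShuffleToSkip(shuffleId: Int): Unit = skipped += shuffleId
}
```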
```scala
val SHUFFLE_DEPENDENCY_FILE_CLEANUP_ENABLED =
  buildConf("spark.sql.shuffleDependency.fileCleanup.enabled")
    .doc("When enabled, shuffle dependency files will be cleaned up at the end of SQL " +
```
Suggested change:

```diff
-    .doc("When enabled, shuffle dependency files will be cleaned up at the end of SQL " +
+    .doc("When enabled, shuffle files will be cleaned up at the end of SQL " +
```
Updated.
```diff
@@ -108,7 +108,8 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
     } else {
       val sessionWithConfigsOff = getOrCloneSessionWithConfigsOff(spark)
       val inMemoryRelation = sessionWithConfigsOff.withActive {
-        val qe = sessionWithConfigsOff.sessionState.executePlan(planToCache)
+        val qe = sessionWithConfigsOff.sessionState.executePlan(
+          planToCache, shuffleCleanupMode = DoNotCleanup)
```
isn't this the default?
Tried to be explicit here. Removed the unnecessary argument.
```scala
    logicalPlan: LogicalPlan,
    shuffleCleanupMode: ShuffleCleanupMode): DataFrame =
  sparkSession.withActive {
    val qe = sparkSession.sessionState.executePlan(
```
Can we `new QueryExecution` here? Then we don't need to touch the session state builder.
Good idea. Done.
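Constructing the `QueryExecution` directly avoids threading a new parameter through the session state builder, and defaulting the new constructor parameter keeps every existing call site unchanged. A self-contained sketch of that shape with stand-in types (Spark's real `QueryExecution` takes a `SparkSession` and a `LogicalPlan`):

```scala
// Stand-in cleanup modes, mirroring the trait introduced by the PR.
sealed trait ShuffleCleanupMode
case object DoNotCleanup extends ShuffleCleanupMode
case object RemoveShuffleFiles extends ShuffleCleanupMode
case object SkipMigration extends ShuffleCleanupMode

// Defaulting the new parameter means existing `new QueryExecution(...)`
// call sites need no changes; only new callers pass a mode explicitly.
class QueryExecutionSketch(
    val plan: String,
    val shuffleCleanupMode: ShuffleCleanupMode = DoNotCleanup)

val qe = new QueryExecutionSketch("plan", shuffleCleanupMode = RemoveShuffleFiles)
```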
```scala
        Some(ShuffleBlockInfo(shuffleId, mapId))
      case _ =>
        None
    }
  }

  private val shuffleIdsToSkip = Collections.newSetFromMap[Int](new ConcurrentHashMap)
```
What's the life cycle of it?
Updated to remove from this Set when the shuffle is unregistered.
```diff
@@ -187,6 +187,7 @@ private[spark] class SortShuffleManager(conf: SparkConf) extends ShuffleManager
         shuffleBlockResolver.removeDataByMap(shuffleId, mapTaskId)
       }
     }
+    shuffleBlockResolver.removeShuffleToSkip(shuffleId)
```
This is a weird place to do cleanup. Shall we cover all shuffle manager implementations? Shall we do it in the caller of this `unregisterShuffle` function?
Yeah this is a bit weird... Changed to use a Guava cache with a fixed maximum size (1000) instead, so that we do not need to do cleanups for shufflesToSkip.
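The merged code uses Guava's `CacheBuilder.newBuilder().maximumSize(1000)`, so old shuffle IDs age out on their own and no explicit removal hook is needed. The same size-bounded idea can be illustrated with only the JDK via `LinkedHashMap.removeEldestEntry` — a sketch of the concept, not the code merged here:

```scala
import java.util.{Collections, LinkedHashMap => JLinkedHashMap}
import java.util.Map.{Entry => JEntry}

// A size-bounded set of shuffle IDs: once maxSize is exceeded, the
// oldest inserted ID is silently evicted, so nothing has to clean the
// set up when a shuffle is unregistered.
def boundedIdSet(maxSize: Int): java.util.Set[Integer] =
  Collections.newSetFromMap(new JLinkedHashMap[Integer, java.lang.Boolean]() {
    override def removeEldestEntry(e: JEntry[Integer, java.lang.Boolean]): Boolean =
      size() > maxSize
  })
```

(As noted below, Guava's cache rejects `null` values, which is why the merged version stores a dummy `Boolean` value.)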
```scala
        Some(ShuffleBlockInfo(shuffleId, mapId))
      case _ =>
        None
    }
  }

  private val shuffleIdsToSkip =
    CacheBuilder.newBuilder().maximumSize(1000).build[java.lang.Integer, java.lang.Boolean]()
```
If the value does not matter, shall we just use the `Object` type and always pass null?
Unfortunately Guava cache won't accept null values...
```diff
@@ -869,6 +874,8 @@ case class AdaptiveExecutionContext(session: SparkSession, qe: QueryExecution) {
    */
   val stageCache: TrieMap[SparkPlan, ExchangeQueryStageExec] =
     new TrieMap[SparkPlan, ExchangeQueryStageExec]()
+
+  val shuffleIds: TrieMap[Int, Boolean] = new TrieMap[Int, Boolean]()
```
What does the value mean? BTW, `stageCache` uses `TrieMap` because the key is `SparkPlan`. For an int key, I think a normal hash map works fine.
I think a concurrent hash map is still required, since the context is shared between the main query and all subqueries?
yea, concurrent hash map with int key should be good here.
Updated
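Following the suggestion above, a concurrent set keyed by shuffle ID can be built directly from `java.util.concurrent` — a sketch of the idea; the actual field in `AdaptiveExecutionContext` may be shaped differently:

```scala
import java.util.concurrent.ConcurrentHashMap

// A thread-safe set of shuffle IDs. It is safe to mutate concurrently
// from the main query and its subqueries, which share the same
// AdaptiveExecutionContext.
val shuffleIds: java.util.Set[Integer] = ConcurrentHashMap.newKeySet[Integer]()
```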
thanks, merging to master!
```scala
shuffleIds.foreach { shuffleId =>
  queryExecution.shuffleCleanupMode match {
    case RemoveShuffleFiles =>
      SparkEnv.get.shuffleManager.unregisterShuffle(shuffleId)
```
Shall we call `shuffleDriverComponents.removeShuffle`? We are at the driver side; `shuffleManager.unregisterShuffle` would do nothing in non-local mode.
Thanks for catching this! Will fix this in a follow-up asap.
Created #46302.
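The follow-up swaps the driver-side call so cleanup actually reaches executors in non-local mode. A stand-in sketch of the dispatch — the trait here mimics, but is not, Spark's `ShuffleDriverComponents`:

```scala
import scala.collection.mutable.ArrayBuffer

sealed trait ShuffleCleanupMode
case object DoNotCleanup extends ShuffleCleanupMode
case object RemoveShuffleFiles extends ShuffleCleanupMode

// Stand-in for ShuffleDriverComponents: the driver-side API that can
// remove shuffle data cluster-wide, unlike ShuffleManager.unregisterShuffle,
// which on the driver is a no-op in non-local mode.
trait DriverComponentsSketch {
  def removeShuffle(shuffleId: Int, blocking: Boolean): Unit
}

def cleanupShuffles(
    mode: ShuffleCleanupMode,
    ids: Seq[Int],
    driver: DriverComponentsSketch): Unit =
  mode match {
    case RemoveShuffleFiles =>
      // The buggy version called shuffleManager.unregisterShuffle(id) here.
      ids.foreach(id => driver.removeShuffle(id, blocking = true))
    case DoNotCleanup => ()
  }
```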
```diff
@@ -161,6 +165,24 @@ object SQLExecution extends Logging {
       case e =>
         Utils.exceptionString(e)
     }
+    if (queryExecution.shuffleCleanupMode != DoNotCleanup
+        && isExecutedPlanAvailable) {
+      val shuffleIds = queryExecution.executedPlan match {
```
It seems the root node can be a command. Shall we collect all the `AdaptiveSparkPlanExec` nodes inside the plan?
Oh this is a good catch! I think we should. cc @bozhang2820
I could be wrong, but I thought `DataFrame`s for commands are created in `SparkConnectPlanner`, and the ones for queries are only created in `SparkConnectPlanExecution`?
Ideally we should clean up shuffles for CTAS and INSERT as well, as they also run queries.
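Spark plans are tree nodes, so gathering every `AdaptiveSparkPlanExec` (rather than pattern-matching only the root, which may be a command wrapping the query) is one `plan.collect` away. A toy tree illustrating the pattern — node names and the recursive `collect` here are stand-ins for Spark's `TreeNode` API:

```scala
// A minimal plan tree with a depth-first collect, like TreeNode.collect.
sealed trait PlanNode {
  def children: Seq[PlanNode]
  def collect[T](pf: PartialFunction[PlanNode, T]): Seq[T] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
}
case class CommandNode(children: Seq[PlanNode]) extends PlanNode
case class AdaptiveNode(shuffleIds: Seq[Int], children: Seq[PlanNode] = Nil)
  extends PlanNode

// The root is a command, but the adaptive plans (and their shuffle IDs)
// sit inside it; collecting over the whole tree still finds them.
val plan: PlanNode = CommandNode(Seq(AdaptiveNode(Seq(1, 2)), AdaptiveNode(Seq(3))))
val collectedIds = plan.collect { case a: AdaptiveNode => a.shuffleIds }.flatten
```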
…eCleanupMode

### What changes were proposed in this pull request?
This change adds a new trait, `ShuffleCleanupMode`, under `QueryExecution`, and two new configs, `spark.sql.shuffleDependency.skipMigration.enabled` and `spark.sql.shuffleDependency.fileCleanup.enabled`.

For Spark Connect query executions, `ShuffleCleanupMode` is controlled by the two new configs, and shuffle dependency cleanup is performed accordingly.

When `spark.sql.shuffleDependency.fileCleanup.enabled` is `true`, shuffle dependency files will be cleaned up at the end of query executions.

When `spark.sql.shuffleDependency.skipMigration.enabled` is `true`, shuffle dependencies will be skipped during shuffle data migration for node decommissions.

### Why are the changes needed?
This is to: 1. speed up shuffle data migration at decommissions and 2. possibly (when file cleanup mode is enabled) release disk space occupied by unused shuffle files.

### Does this PR introduce _any_ user-facing change?
Yes. This change adds two new configs, `spark.sql.shuffleDependency.skipMigration.enabled` and `spark.sql.shuffleDependency.fileCleanup.enabled`, to control the cleanup behaviors.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#45930 from bozhang2820/spark-47764.

Authored-by: Bo Zhang <bo.zhang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…Shuffle to remove shuffle properly

### What changes were proposed in this pull request?
This is a follow-up for #45930, where we introduced `ShuffleCleanupMode` and implemented cleaning up of shuffle dependencies.

There was a bug where `ShuffleManager.unregisterShuffle` was used on the driver, and in non-local mode it is not effective at all. This change fixed the bug by changing to use `ShuffleDriverComponents.removeShuffle` instead.

### Why are the changes needed?
This is to address the comments in #45930 (comment).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Updated unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #46302 from bozhang2820/spark-47764-1.

Authored-by: Bo Zhang <bo.zhang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>