-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46957][CORE] Decommission migrated shuffle files should be able to cleanup from executor #47037
Conversation
@dongjoon-hyun @holdenk @mridulm @jiangxb1987 @bozhang2820 Could you help take a look? Thanks! |
@dongjoon-hyun Do you think it is the same issue for |
@attilapiros @LuciferYang @warrenzhu25 Anyone else want to take a look at this fix? |
I'm going to merge this PR by the end of day if there is no objection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
Thanks @Ngone51
…e to cleanup from executor ### What changes were proposed in this pull request? This PR uses `SortShuffleManager#taskIdMapsForShuffle` to track the migrated shuffle files on the destination executor. ### Why are the changes needed? This is a long-standing bug in decommission where the migrated shuffle files can't be cleaned up from the executor. Normally, the shuffle files are tracked by `taskIdMapsForShuffle` during the map task execution. Upon receiving the `RemoveShuffle(shuffleId)` request from driver, executor can clean up those shuffle files by searching `taskIdMapsForShuffle`. However, for the migrated shuffle files by decommission, they lose the track in the destination executor's `taskIdMapsForShuffle` and can't be deleted as a result. Note this bug only affects shuffle removal on the executor. For shuffle removal on the external shuffle service (when `spark.shuffle.service.removeShuffle` enabled and the executor stores the shuffle files has gone), we don't rely on `taskIdMapsForShuffle` but using the specific shuffle block id to locate the shuffle file directly. So it won't be an issue there. ### Does this PR introduce _any_ user-facing change? No. (Common users won't see the difference underlying.) ### How was this patch tested? Add unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47037 from Ngone51/SPARK-46957. Authored-by: Yi Wu <yi.wu@databricks.com> Signed-off-by: Yi Wu <yi.wu@databricks.com> (cherry picked from commit b5a55e4) Signed-off-by: Yi Wu <yi.wu@databricks.com>
…e to cleanup from executor ### What changes were proposed in this pull request? This PR uses `SortShuffleManager#taskIdMapsForShuffle` to track the migrated shuffle files on the destination executor. ### Why are the changes needed? This is a long-standing bug in decommission where the migrated shuffle files can't be cleaned up from the executor. Normally, the shuffle files are tracked by `taskIdMapsForShuffle` during the map task execution. Upon receiving the `RemoveShuffle(shuffleId)` request from driver, executor can clean up those shuffle files by searching `taskIdMapsForShuffle`. However, for the migrated shuffle files by decommission, they lose the track in the destination executor's `taskIdMapsForShuffle` and can't be deleted as a result. Note this bug only affects shuffle removal on the executor. For shuffle removal on the external shuffle service (when `spark.shuffle.service.removeShuffle` enabled and the executor stores the shuffle files has gone), we don't rely on `taskIdMapsForShuffle` but using the specific shuffle block id to locate the shuffle file directly. So it won't be an issue there. ### Does this PR introduce _any_ user-facing change? No. (Common users won't see the difference underlying.) ### How was this patch tested? Add unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47037 from Ngone51/SPARK-46957. Authored-by: Yi Wu <yi.wu@databricks.com> Signed-off-by: Yi Wu <yi.wu@databricks.com> (cherry picked from commit b5a55e4) Signed-off-by: Yi Wu <yi.wu@databricks.com>
Thanks, merged to master/3.5/3.4! |
There are compliaton failures against Java 8 in branch-3.5/3.4. I'll send a followup fix soon. |
…e to cleanup from executor ### What changes were proposed in this pull request? This PR uses `SortShuffleManager#taskIdMapsForShuffle` to track the migrated shuffle files on the destination executor. ### Why are the changes needed? This is a long-standing bug in decommission where the migrated shuffle files can't be cleaned up from the executor. Normally, the shuffle files are tracked by `taskIdMapsForShuffle` during the map task execution. Upon receiving the `RemoveShuffle(shuffleId)` request from driver, executor can clean up those shuffle files by searching `taskIdMapsForShuffle`. However, for the migrated shuffle files by decommission, they lose the track in the destination executor's `taskIdMapsForShuffle` and can't be deleted as a result. Note this bug only affects shuffle removal on the executor. For shuffle removal on the external shuffle service (when `spark.shuffle.service.removeShuffle` enabled and the executor stores the shuffle files has gone), we don't rely on `taskIdMapsForShuffle` but using the specific shuffle block id to locate the shuffle file directly. So it won't be an issue there. ### Does this PR introduce _any_ user-facing change? No. (Common users won't see the difference underlying.) ### How was this patch tested? Add unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47037 from Ngone51/SPARK-46957. Authored-by: Yi Wu <yi.wu@databricks.com> Signed-off-by: Yi Wu <yi.wu@databricks.com>
FYI, created a separate backport PR along with the compliation error fix: #47122 |
…e to cleanup from executor ### What changes were proposed in this pull request? This PR uses `SortShuffleManager#taskIdMapsForShuffle` to track the migrated shuffle files on the destination executor. ### Why are the changes needed? This is a long-standing bug in decommission where the migrated shuffle files can't be cleaned up from the executor. Normally, the shuffle files are tracked by `taskIdMapsForShuffle` during the map task execution. Upon receiving the `RemoveShuffle(shuffleId)` request from driver, executor can clean up those shuffle files by searching `taskIdMapsForShuffle`. However, for the migrated shuffle files by decommission, they lose the track in the destination executor's `taskIdMapsForShuffle` and can't be deleted as a result. Note this bug only affects shuffle removal on the executor. For shuffle removal on the external shuffle service (when `spark.shuffle.service.removeShuffle` enabled and the executor stores the shuffle files has gone), we don't rely on `taskIdMapsForShuffle` but using the specific shuffle block id to locate the shuffle file directly. So it won't be an issue there. ### Does this PR introduce _any_ user-facing change? No. (Common users won't see the difference underlying.) ### How was this patch tested? Add unit test. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47037 from Ngone51/SPARK-46957. Authored-by: Yi Wu <yi.wu@databricks.com> Signed-off-by: Yi Wu <yi.wu@databricks.com>
.get | ||
|
||
val newShuffleFiles = shuffleFiles.diff(existingShuffleFiles) | ||
assert(newShuffleFiles.size >= shuffleBlockUpdates.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this seems to introduce a flakiness.
[info] - SPARK-46957: Migrated shuffle files should be able to cleanup from executor *** FAILED *** (36 seconds, 137 milliseconds)
[info] 0 was not greater than or equal to 6 (BlockManagerDecommissionIntegrationSuite.scala:423)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be a race condition on JDK21 but I did not managed to reproduce it locally. Neither on arm64 nor on a x86_64 (using docker).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see a jira already exists for this: https://issues.apache.org/jira/browse/SPARK-49297
I am working on a possible fix.
What changes were proposed in this pull request?
This PR uses
SortShuffleManager#taskIdMapsForShuffle
to track the migrated shuffle files on the destination executor.Why are the changes needed?
This is a long-standing bug in decommission where the migrated shuffle files can't be cleaned up from the executor. Normally, the shuffle files are tracked by
taskIdMapsForShuffle
during the map task execution. Upon receiving theRemoveShuffle(shuffleId)
request from driver, executor can clean up those shuffle files by searchingtaskIdMapsForShuffle
. However, for the migrated shuffle files by decommission, they lose the track in the destination executor'staskIdMapsForShuffle
and can't be deleted as a result.Note this bug only affects shuffle removal on the executor. For shuffle removal on the external shuffle service (when
spark.shuffle.service.removeShuffle
enabled and the executor stores the shuffle files has gone), we don't rely ontaskIdMapsForShuffle
but using the specific shuffle block id to locate the shuffle file directly. So it won't be an issue there.Does this PR introduce any user-facing change?
No. (Common users won't see the difference underlying.)
How was this patch tested?
Add unit test.
Was this patch authored or co-authored using generative AI tooling?
No.