[SPARK-27065][CORE] avoid more than one active task set managers for a stage #23927
@@ -212,14 +212,20 @@ private[spark] class TaskSchedulerImpl( 
    val stage = taskSet.stageId
    val stageTaskSets =
      taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
-    stageTaskSets(taskSet.stageAttemptId) = manager
-    val conflictingTaskSet = stageTaskSets.exists { case (_, ts) =>
-      ts.taskSet != taskSet && !ts.isZombie
-    }
-    if (conflictingTaskSet) {
-      throw new IllegalStateException(s"more than one active taskSet for stage $stage:" +
-        s" ${stageTaskSets.toSeq.map{_._2.taskSet.id}.mkString(",")}")
-    }
+    // Mark all the existing TaskSetManagers of this stage as zombie, as we are adding a new one.
+    // This is necessary to handle a corner case. Let's say a stage has 10 partitions and has 2
+    // TaskSetManagers: TSM1(zombie) and TSM2(active). TSM1 has a running task for partition 10
+    // and it completes. TSM2 finishes tasks for partitions 1-9, and thinks it is still active
+    // because partition 10 is not completed yet. However, DAGScheduler gets task completion
+    // events for all the 10 partitions and thinks the stage is finished. If it's a shuffle stage
+    // and somehow it has missing map outputs, then DAGScheduler will resubmit it and create a
+    // TSM3 for it. As a stage can't have more than one active task set manager, we must mark
+    // TSM2 as zombie (it actually is).
Comment: If TSM3 is created just after TSM2 finished partition 10, how does TSM3 know about the finished partition 10?

Comment: This PR focuses on fixing SPARK-23433; the potential occurrence of https://issues.apache.org/jira/browse/SPARK-25250 remains unfixed and will be addressed in #22806 or #23871. Note that SPARK-23433 can crash the cluster, and even though #22806 or #23871 could fix it as well, we need a simple fix that can be backported to 2.3/2.4. SPARK-25250 is just a matter of wasted resources, so we can keep that fix in master only.

Comment: That makes sense, and the …

Comment: Yep, it makes sense to fix the issue that this PR addresses along with the other PRs for SPARK-25250.
+    stageTaskSets.foreach { case (_, ts) =>
+      ts.isZombie = true
+    }
+    stageTaskSets(taskSet.stageAttemptId) = manager
    schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)

    if (!isLocal && !hasReceivedTask) {
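The core idea of the change above can be modeled in a few lines. The sketch below is a hypothetical stand-alone illustration (the names `FakeTaskSetManager` and `SubmitDemo` are invented; this is not Spark code): before a new attempt's manager is registered, every existing manager for the stage is marked zombie, so at most one attempt is ever active.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's TaskSetManager, for illustration only.
final class FakeTaskSetManager(val stageId: Int, val stageAttemptId: Int) {
  var isZombie: Boolean = false
}

object SubmitDemo {
  // stageId -> (stageAttemptId -> manager), mirroring taskSetsByStageIdAndAttempt.
  val taskSetsByStageIdAndAttempt =
    mutable.HashMap.empty[Int, mutable.HashMap[Int, FakeTaskSetManager]]

  def submit(manager: FakeTaskSetManager): Unit = {
    val stageTaskSets = taskSetsByStageIdAndAttempt
      .getOrElseUpdate(manager.stageId, mutable.HashMap.empty)
    // The fix: zombie every existing attempt before adding the new one,
    // instead of throwing an IllegalStateException on a conflict.
    stageTaskSets.foreach { case (_, ts) => ts.isZombie = true }
    stageTaskSets(manager.stageAttemptId) = manager
  }
}
```

With this ordering, resubmitting a stage (TSM3 arriving while TSM2 still thinks it is active) can never leave two active managers for the same stage.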
@@ -123,7 +123,7 @@ private[spark] class TaskSetManager( 
  // state until all tasks have finished running; we keep TaskSetManagers that are in the zombie
  // state in order to continue to track and account for the running tasks.
  // TODO: We should kill any running task attempts when the task set manager becomes a zombie.
-  private[scheduler] var isZombie = false
+  @volatile private[scheduler] var isZombie = false
Comment: I don't think this is necessary. You still only touch …

Comment: Ah, good catch!
  // Whether the taskSet run tasks from a barrier stage. Spark must launch all the tasks at the
  // same time for a barrier stage.
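For context on the `@volatile` question above: the annotation guarantees that a write to the field by one thread is visible to subsequent reads from other threads, which only matters when the field is read outside the scheduler's event loop without other synchronization. A toy sketch (the `Flag` and `VolatileDemo` names are invented for illustration, unrelated to Spark's real classes):

```scala
// Toy illustration of @volatile visibility, not Spark code.
final class Flag {
  @volatile var isZombie: Boolean = false
}

object VolatileDemo {
  def markFromAnotherThread(flag: Flag): Unit = {
    val t = new Thread(() => { flag.isZombie = true })
    t.start()
    // join() itself establishes a happens-before edge here; @volatile is what
    // you need when a reader polls the field with no such synchronization.
    t.join()
  }
}
```

As the review notes, if the field is only ever mutated and read from a single event-loop thread, the annotation adds nothing.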
@@ -201,30 +201,10 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with B
    // Even if one of the task sets has not-serializable tasks, the other task set should
    // still be processed without error
    taskScheduler.submitTasks(FakeTask.createTaskSet(1))
    taskScheduler.submitTasks(taskSet)
Comment: We can't have 2 active task set managers at the same time.

Comment: Maybe we should just give it another stageId?
    taskDescriptions = taskScheduler.resourceOffers(multiCoreWorkerOffers).flatten
    assert(taskDescriptions.map(_.executorId) === Seq("executor0"))
  }

-  test("refuse to schedule concurrent attempts for the same stage (SPARK-8103)") {
Comment: This part of the code is reverted in this PR, so the test is removed as well.

Comment: This is fine, but do we also want to add a test case to ensure the new behavior will not break?
-    val taskScheduler = setupScheduler()
-    val attempt1 = FakeTask.createTaskSet(1, 0)
-    val attempt2 = FakeTask.createTaskSet(1, 1)
-    taskScheduler.submitTasks(attempt1)
-    intercept[IllegalStateException] { taskScheduler.submitTasks(attempt2) }
-
-    // OK to submit multiple if previous attempts are all zombie
-    taskScheduler.taskSetManagerForAttempt(attempt1.stageId, attempt1.stageAttemptId)
-      .get.isZombie = true
-    taskScheduler.submitTasks(attempt2)
-    val attempt3 = FakeTask.createTaskSet(1, 2)
-    intercept[IllegalStateException] { taskScheduler.submitTasks(attempt3) }
-    taskScheduler.taskSetManagerForAttempt(attempt2.stageId, attempt2.stageAttemptId)
-      .get.isZombie = true
-    taskScheduler.submitTasks(attempt3)
-    assert(!failedTaskSet)
-  }

  test("don't schedule more tasks after a taskset is zombie") {
    val taskScheduler = setupScheduler()
Comment: typo: 1-9.