
[CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure #5494

Closed

Conversation

ncounterspecialist (Contributor)

Fixed null check when all the dependent stages are cancelled due to a previous stage failure. This happens when one of the executor nodes goes down and all the dependent stages are cancelled.

} else {
  // this stage will be assigned to "default" pool
  null
}
val activeJob = jobIdToActiveJob.get(stage.jobId).getOrElse(null)
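
For context, the crash this PR addresses comes from looking up a cancelled job with a plain map apply. A minimal, self-contained sketch of that failure mode (the map and class names mirror DAGScheduler, but the snippet is illustrative only, not the actual Spark code):

```scala
import scala.collection.mutable.HashMap
import java.util.Properties

object MissingJobCrashSketch {
  // Stand-ins for DAGScheduler's ActiveJob and jobIdToActiveJob map (illustrative only).
  case class ActiveJob(jobId: Int, properties: Properties)
  val jobIdToActiveJob = new HashMap[Int, ActiveJob]

  def main(args: Array[String]): Unit = {
    // The stage's job was cancelled (e.g. after an executor was lost), so it is
    // no longer in the map; a plain apply then throws NoSuchElementException,
    // which is the crash reported in SPARK-6880.
    val stageJobId = 7
    try {
      val properties = jobIdToActiveJob(stageJobId).properties
      println(properties)
    } catch {
      case e: NoSuchElementException => println(s"lookup failed: $e")
    }
  }
}
```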
Member

Are you suggesting that in some cases `jobId != stage.jobId` and that's the error?
Then just `val properties = jobIdToActiveJob.get(stage.jobId).map(_.properties).getOrElse(null)`?
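
For reference, a minimal sketch of how the suggested Option-based lookup behaves when the job has already been removed, returning null instead of throwing (illustrative names, not the actual DAGScheduler code):

```scala
import scala.collection.mutable.HashMap
import java.util.Properties

object NullCheckSketch {
  case class ActiveJob(jobId: Int, properties: Properties)
  val jobIdToActiveJob = new HashMap[Int, ActiveJob]

  def main(args: Array[String]): Unit = {
    // get returns an Option, so a cancelled (missing) job yields None rather than throwing;
    // getOrElse(null) then falls back to null, which the downstream code already handles.
    val properties: Properties =
      jobIdToActiveJob.get(7).map(_.properties).getOrElse(null)
    println(properties) // prints "null" when the job no longer exists
  }
}
```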

Contributor Author

Yeah, this one-liner should work. I can create a new pull request with this change if the fix seems fine to you.

Member

Just push a change to this same branch that overwrites with the new change. Then it can be tested.

Contributor Author

Committed the new change.

srowen (Member) commented Apr 14, 2015

ok to test


SparkQA commented Apr 14, 2015

Test build #30246 has finished for PR 5494 at commit 55ba5e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

  // this stage will be assigned to "default" pool
  null
}
val properties = jobIdToActiveJob.get(stage.jobId).map(_.properties).getOrElse(null)
Contributor

Could be `orNull`; not a big deal at all.
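
For reference, `Option.orNull` from the Scala standard library is an equivalent spelling of the `getOrElse(null)` fallback for nullable reference types; a minimal sketch (illustrative only):

```scala
import java.util.Properties

object OrNullSketch {
  def main(args: Array[String]): Unit = {
    val missing: Option[Properties] = None
    // For reference element types, orNull and getOrElse(null) produce the same result.
    println(missing.orNull == null)          // true
    println(missing.getOrElse(null) == null) // true
  }
}
```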

@andrewor14 (Contributor)

Thanks, I'm merging this into master.

asfgit closed this in dcf8a9f Apr 14, 2015
mbautin pushed a commit to mbautin/spark that referenced this pull request May 21, 2015
… cancelled due to previous stage failure

Fixed null check when all the dependent stages are cancelled due to a previous stage failure. This happens when one of the executor nodes goes down and all the dependent stages are cancelled.

Author: pankaj arora <pankaj.arora@guavus.com>

Closes apache#5494 from pankajarora12/NEWBRANCH and squashes the following commits:

55ba5e3 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure
4575720 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure
mbautin pushed a commit to mbautin/spark that referenced this pull request May 21, 2015
Author: Mark Hamstra
Apache Spark master PR: apache#6291

This issue was addressed in apache#5494, but the fix in that PR, while safe in the
sense that it will prevent the SparkContext from shutting down, misses the
actual bug. The intent of submitMissingTasks should be understood as "submit
the Tasks that are missing for the Stage, and run them as part of the ActiveJob
identified by jobId". Because of a long-standing bug, the jobId parameter was
never being used. Instead, we were trying to use the jobId with which the Stage
was created -- which may no longer exist as an ActiveJob, hence the crash
reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId
parameter, which is guaranteed to exist at the call sites of
submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed
since 1.0.
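
A minimal sketch of the distinction described in that commit message: keying the ActiveJob lookup off the jobId passed into submitMissingTasks rather than the jobId the stage was originally created with (names are illustrative stand-ins, not the actual DAGScheduler code):

```scala
import scala.collection.mutable.HashMap
import java.util.Properties

object SubmitMissingTasksSketch {
  case class Stage(id: Int, jobId: Int)               // jobId = job the stage was created for
  case class ActiveJob(jobId: Int, properties: Properties)
  val jobIdToActiveJob = new HashMap[Int, ActiveJob]

  // Per the commit message, the jobId supplied by the caller corresponds to an
  // ActiveJob that still exists, whereas the job the stage was created for may be gone.
  def submitMissingTasks(stage: Stage, jobId: Int): Properties =
    jobIdToActiveJob(jobId).properties                 // keyed by the supplied jobId

  def main(args: Array[String]): Unit = {
    val stage = Stage(id = 1, jobId = 0)               // created for job 0, since finished
    jobIdToActiveJob(2) = ActiveJob(2, new Properties) // job 2 now reuses this stage
    println(submitMissingTasks(stage, jobId = 2))      // succeeds via the supplied jobId
  }
}
```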
asfgit pushed a commit that referenced this pull request Nov 25, 2015
…ted with a Stage

This issue was addressed in #5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug.  The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId".  Because of a long-standing bug, the `jobId` parameter was never being used.  Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed since 1.0.

kayousterhout pankajarora12

Author: Mark Hamstra <markhamstra@gmail.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #6291 from markhamstra/SPARK-6880.
asfgit pushed a commit that referenced this pull request Nov 25, 2015
…ted with a Stage

This issue was addressed in #5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug.  The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId".  Because of a long-standing bug, the `jobId` parameter was never being used.  Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed since 1.0.

kayousterhout pankajarora12

Author: Mark Hamstra <markhamstra@gmail.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #6291 from markhamstra/SPARK-6880.

(cherry picked from commit 0a5aef7)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
…ted with a Stage

This issue was addressed in apache/spark#5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug.  The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId".  Because of a long-standing bug, the `jobId` parameter was never being used.  Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed since 1.0.

kayousterhout pankajarora12

Author: Mark Hamstra <markhamstra@gmail.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #6291 from markhamstra/SPARK-6880.