
[CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure #5494

Closed

Conversation

ncounterspecialist (Contributor)

Fixed null check when all the dependent stages are cancelled due to a previous stage failure. This happens when one of the executor nodes goes down and all the dependent stages are cancelled.

} else {
  // this stage will be assigned to "default" pool
  null
}
val activeJob = jobIdToActiveJob.get(stage.jobId).getOrElse(null)
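
For context, the crash this PR addresses comes from looking up a cancelled job with a plain map apply. A minimal, self-contained sketch of that failure mode (the map and class names mirror DAGScheduler, but the snippet is illustrative only, not the actual Spark code):

```scala
import scala.collection.mutable.HashMap
import java.util.Properties

object MissingJobCrashSketch {
  // Stand-ins for DAGScheduler's ActiveJob and jobIdToActiveJob map (illustrative only).
  case class ActiveJob(jobId: Int, properties: Properties)
  val jobIdToActiveJob = new HashMap[Int, ActiveJob]

  def main(args: Array[String]): Unit = {
    // The stage's job was cancelled (e.g. after an executor was lost), so it is
    // no longer in the map; a plain apply then throws NoSuchElementException,
    // which is the crash reported in SPARK-6880.
    val stageJobId = 7
    try {
      val properties = jobIdToActiveJob(stageJobId).properties
      println(properties)
    } catch {
      case e: NoSuchElementException => println(s"lookup failed: $e")
    }
  }
}
```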
Member

Are you suggesting that in some cases `jobId != stage.jobId` and that's the error?
Then just `val properties = jobIdToActiveJob.get(stage.jobId).map(_.properties).getOrElse(null)`?
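
For reference, a minimal sketch of how the suggested Option-based lookup behaves when the job has already been removed, returning null instead of throwing (illustrative names, not the actual DAGScheduler code):

```scala
import scala.collection.mutable.HashMap
import java.util.Properties

object NullCheckSketch {
  case class ActiveJob(jobId: Int, properties: Properties)
  val jobIdToActiveJob = new HashMap[Int, ActiveJob]

  def main(args: Array[String]): Unit = {
    // get returns an Option, so a cancelled (missing) job yields None rather than throwing;
    // getOrElse(null) then falls back to null, which the downstream code already handles.
    val properties: Properties =
      jobIdToActiveJob.get(7).map(_.properties).getOrElse(null)
    println(properties) // prints "null" when the job no longer exists
  }
}
```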

Contributor Author

Yeah, this one-liner should work. I can create a new pull request with this change if the fix seems fine to you.

Member

Just push a change to this same branch that overwrites with the new change. Then it can be tested.

Contributor Author

Committed the new change.

srowen (Member) commented Apr 14, 2015

ok to test


SparkQA commented Apr 14, 2015

Test build #30246 has finished for PR 5494 at commit 55ba5e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

  // this stage will be assigned to "default" pool
  null
}
val properties = jobIdToActiveJob.get(stage.jobId).map(_.properties).getOrElse(null)
Contributor

Could be `orNull`; not a big deal at all.
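
For reference, `Option.orNull` from the Scala standard library is an equivalent spelling of the `getOrElse(null)` fallback for nullable reference types; a minimal sketch (illustrative only):

```scala
import java.util.Properties

object OrNullSketch {
  def main(args: Array[String]): Unit = {
    val missing: Option[Properties] = None
    // For reference element types, orNull and getOrElse(null) produce the same result.
    println(missing.orNull == null)          // true
    println(missing.getOrElse(null) == null) // true
  }
}
```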

@andrewor14 (Contributor)

Thanks, I'm merging this into master.

asfgit closed this in dcf8a9f Apr 14, 2015
mbautin pushed a commit to mbautin/spark that referenced this pull request May 21, 2015
… cancelled due to previous stage failure

Fixed null check when all the dependent stages are cancelled due to a previous stage failure. This happens when one of the executor nodes goes down and all the dependent stages are cancelled.

Author: pankaj arora <pankaj.arora@guavus.com>

Closes apache#5494 from pankajarora12/NEWBRANCH and squashes the following commits:

55ba5e3 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure
4575720 [pankaj arora] [CORE] SPARK-6880: Fixed null check when all the dependent stages are cancelled due to previous stage failure
mbautin pushed a commit to mbautin/spark that referenced this pull request May 21, 2015
Author: Mark Hamstra
Apache Spark master PR: apache#6291

This issue was addressed in apache#5494, but the fix in that PR, while safe in the
sense that it will prevent the SparkContext from shutting down, misses the
actual bug. The intent of submitMissingTasks should be understood as "submit
the Tasks that are missing for the Stage, and run them as part of the ActiveJob
identified by jobId". Because of a long-standing bug, the jobId parameter was
never being used. Instead, we were trying to use the jobId with which the Stage
was created -- which may no longer exist as an ActiveJob, hence the crash
reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId
parameter, which is guaranteed to exist at the call sites of
submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed
since 1.0.
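
A minimal sketch of the distinction described in that commit message: keying the ActiveJob lookup off the jobId passed into submitMissingTasks rather than the jobId the stage was originally created with (names are illustrative stand-ins, not the actual DAGScheduler code):

```scala
import scala.collection.mutable.HashMap
import java.util.Properties

object SubmitMissingTasksSketch {
  case class Stage(id: Int, jobId: Int)               // jobId = job the stage was created for
  case class ActiveJob(jobId: Int, properties: Properties)
  val jobIdToActiveJob = new HashMap[Int, ActiveJob]

  // Per the commit message, the jobId supplied by the caller corresponds to an
  // ActiveJob that still exists, whereas the job the stage was created for may be gone.
  def submitMissingTasks(stage: Stage, jobId: Int): Properties =
    jobIdToActiveJob(jobId).properties                 // keyed by the supplied jobId

  def main(args: Array[String]): Unit = {
    val stage = Stage(id = 1, jobId = 0)               // created for job 0, since finished
    jobIdToActiveJob(2) = ActiveJob(2, new Properties) // job 2 now reuses this stage
    println(submitMissingTasks(stage, jobId = 2))      // succeeds via the supplied jobId
  }
}
```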
asfgit pushed a commit that referenced this pull request Nov 25, 2015
…ted with a Stage

This issue was addressed in #5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug.  The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId".  Because of a long-standing bug, the `jobId` parameter was never being used.  Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed since 1.0.

kayousterhout pankajarora12

Author: Mark Hamstra <markhamstra@gmail.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #6291 from markhamstra/SPARK-6880.
asfgit pushed a commit that referenced this pull request Nov 25, 2015
…ted with a Stage

This issue was addressed in #5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug.  The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId".  Because of a long-standing bug, the `jobId` parameter was never being used.  Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed since 1.0.

kayousterhout pankajarora12

Author: Mark Hamstra <markhamstra@gmail.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #6291 from markhamstra/SPARK-6880.

(cherry picked from commit 0a5aef7)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
…ted with a Stage

This issue was addressed in apache/spark#5494, but the fix in that PR, while safe in the sense that it will prevent the SparkContext from shutting down, misses the actual bug.  The intent of `submitMissingTasks` should be understood as "submit the Tasks that are missing for the Stage, and run them as part of the ActiveJob identified by jobId".  Because of a long-standing bug, the `jobId` parameter was never being used.  Instead, we were trying to use the jobId with which the Stage was created -- which may no longer exist as an ActiveJob, hence the crash reported in SPARK-6880.

The correct fix is to use the ActiveJob specified by the supplied jobId parameter, which is guaranteed to exist at the call sites of submitMissingTasks.

This fix should be applied to all maintenance branches, since it has existed since 1.0.

kayousterhout pankajarora12

Author: Mark Hamstra <markhamstra@gmail.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes #6291 from markhamstra/SPARK-6880.