Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3293] yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client #2311

Closed
wants to merge 2 commits into from

Conversation

witgo
Copy link
Contributor

@witgo witgo commented Sep 7, 2014

No description provided.

@SparkQA
Copy link

SparkQA commented Sep 7, 2014

QA tests have started for PR 2311 at commit 3828707.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 7, 2014

QA tests have finished for PR 2311 at commit 3828707.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class NoExitsException(exitCode: Int) extends SecurityException

@ScrapCodes
Copy link
Member

You will have to rebase your patch with master for tests to pass.

@SparkQA
Copy link

SparkQA commented Sep 8, 2014

QA tests have started for PR 2311 at commit 2ebe62e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 8, 2014

QA tests have finished for PR 2311 at commit 2ebe62e.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class NoExitsException(exitCode: Int) extends SecurityException

@witgo
Copy link
Contributor Author

witgo commented Sep 8, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 8, 2014

QA tests have started for PR 2311 at commit 2ebe62e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 8, 2014

QA tests have finished for PR 2311 at commit 2ebe62e.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class NoExitsException(exitCode: Int) extends SecurityException

System.setSecurityManager(new java.lang.SecurityManager() {
override def checkExit(paramInt: Int) {
if (!stopped) {
throw new NoExitsException(paramInt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of throwing the exception, why not just set exitStatus directly and avoid having a custom exception? You can also use that as a trigger to call "finish()" if it hasn't been called yet, with a nice error message about the app having called System.exit.

That avoids messing with the app's exception handling, since very few people realize System.exit can throw and thus don't really plan for it.

@SparkQA
Copy link

SparkQA commented Sep 9, 2014

QA tests have started for PR 2311 at commit 7efcca0.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 9, 2014

QA tests have finished for PR 2311 at commit 7efcca0.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 21, 2014

QA tests have started for PR 2311 at commit 7efcca0.

  • This patch does not merge cleanly!

@SparkQA
Copy link

SparkQA commented Sep 21, 2014

QA tests have finished for PR 2311 at commit 7efcca0.

  • This patch fails unit tests.
  • This patch does not merge cleanly!

@SparkQA
Copy link

SparkQA commented Sep 22, 2014

QA tests have started for PR 2311 at commit 5f0883d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 22, 2014

QA tests have finished for PR 2311 at commit 5f0883d.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -119,13 +126,7 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
} else {
runExecutorLauncher(securityMgr)
}

if (finalStatus != FinalApplicationStatus.UNDEFINED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're changing this logic, you need to modify runExecutorLauncher so that it correctly reports an error on failure. Right now it will always exit with 0 and will not report the final status to the RM.

@tgravescs
Copy link
Contributor

I've been having a multiple different issues with the success/failure reporting (SPARK-3627). Does this handle all of those?

@vanzin
Copy link
Contributor

vanzin commented Sep 22, 2014

@tgravescs this change should handle uncaught exceptions and explicit System.exit in the driver code. Not sure if that covers all the issues you're seeing.

@@ -29,6 +29,7 @@ import org.apache.hadoop.yarn.api._
import org.apache.hadoop.yarn.api.records._
import org.apache.hadoop.yarn.conf.YarnConfiguration

import org.apache.spark.executor.{ExecutorUncaughtExceptionHandler, ExecutorExitCode}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see these used anywhere?

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20706/

@witgo
Copy link
Contributor Author

witgo commented Sep 23, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

QA tests have started for PR 2311 at commit 617efef.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

QA tests have finished for PR 2311 at commit 617efef.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20707/

@SparkQA
Copy link

SparkQA commented Sep 25, 2014

QA tests have started for PR 2311 at commit 9655aa8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 25, 2014

QA tests have finished for PR 2311 at commit 9655aa8.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20808/

@tgravescs
Copy link
Contributor

@witgo There are cases not covered by this pr that I would like to fix up dealing with exit codes and final status. Would you mind if I just pull these changes into mine? I want to try to fix it up across the board. I hope to have something up tomorrow if all the testing goes well.

@witgo
Copy link
Contributor Author

witgo commented Sep 26, 2014

Ok, no problem.

@tgravescs
Copy link
Contributor

also maybe I'm missing something but I don't see how the changes help in yarn-client mode? The changes you made were on the application master running on a separate host then the driver itself, startUserClass is only used in cluster mode, and when the disconnect happens in client mode we always report success.

@witgo
Copy link
Contributor Author

witgo commented Sep 26, 2014

Yes, this does not involve org.apache.spark.deploy.yarn.Client class, which run outside the cluster.
We should call YarnClient.killApplication when an uncaught exception is thrown.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants