Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YARN][SPARK-3293]Fix yarn's web show "SUCCEEDED" when the driver throw a exception in yarn-client #3508

Closed
wants to merge 19 commits into from

Conversation

SaintBacchus
Copy link
Contributor

spark has five exit situations:
1.system.exit(0) - succeeds
2.clean exit main -succeeds
3.sc.stop - succeeds
4.system.exit(1) - failed
5.exception - failed
But now yarn-client's ExecutorLauncher is not care about this and just report Ok when driver is down.
So I catch this five situations in driver and send the relevant status to ExecutorLauncher to reply the AM status.
In addition, ExecutorLauncher does not need retry if it report the failed status because this driver has been ended before the sencond ExecutorLauncher launched.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@SaintBacchus
Copy link
Contributor Author

The relevant PR is #2311 and #2577

@oza
Copy link
Contributor

oza commented Nov 29, 2014

Please note that YARN-2091 changed the error code: https://issues.apache.org/jira/browse/YARN-2091

@SaintBacchus
Copy link
Contributor Author

@oza Thx for your notification, but this PR is just report the AM status to RM and let web show right status. And Implementation of the status report is done by AM nowadays.

@oza
Copy link
Contributor

oza commented Nov 29, 2014

@SaintBacchus I see, I got it.

* This allows us to catch that and properly set the YARN application status and
* cleanup if needed.
*/
private def setupSystemSecurityManager(amActor: ActorRef): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The securityManager in the AM was causing a performance impact and we just removed it. I expect the same issue to happen here. #3484

@andrewor14
Copy link
Contributor

Hey @SaintBacchus as @tgravescs pointed out we can't use Java's security manager here until we understand the exact cause of the performance regression that entails. I think for the 1.2 release we should just have broken exit codes. For this patch it would be good to just fix the exception case if it's possible to do so without Java's security manager.

By the way, I wouldn't block the 1.2 release on this, so if you don't have spare cycles then don't worry about rushing this in.

@SaintBacchus
Copy link
Contributor Author

Hey @andrewor14 , it seems that using security manager may cause a performance impact as https://issues.apache.org/jira/browse/SPARK-4584 said.
So I tested it(using javaWordCount). They were almost the same and then using a bigger dataset to test. And same again: with security use 167 second by average and without it use 166seconds by average. So I doubt security manager will actually affect the performance of jvm?
20140712224804968

@SaintBacchus
Copy link
Contributor Author

By your suggestion @andrewor14 , do I just fix the two situations: user use sc.stop then report success and when exception occurs report failed? Is it right?

@andrewor14
Copy link
Contributor

Yeah the performance regression was difficult to reproduce according to @vanzin. I haven't been able to reproduce it myself but others have. Yes, this patch should only deal with the exception case since we can't safely and efficiently deal with exit codes.

@vanzin
Copy link
Contributor

vanzin commented Dec 2, 2014

Just to reinforce, we shouldn't add a security manager without better understanding what impact it has. Feel free to file a bug if you'd like this investigated, but I like the policy of "don't use System.exit()" better.

In any case, it seems your patch will fail apps that do a System.exit(0) without the security manager, so that's something to think about. Cluster mode takes the opposite approach (all apps that use System.exit() succeed), but probably for a reason that does not apply here (AM retries).

@andrewor14
Copy link
Contributor

Yeah I think what we want from this patch is the following behavior:

System.exit(X) = success
Uncaught exception = fail
sc.stop = success

I believe this is possible to do without the security manager. In the long run, the ideal behavior we want is:

System.exit(0) = success
System.exit(X) = fail, where X != 0
Uncaught exception = fail
sc.stop = success

But there is no easy way to do this without the security manager.

@SaintBacchus
Copy link
Contributor Author

Ok, I close this PR for so many meanless commits and create a new PR to reslove this prolem.

@SaintBacchus SaintBacchus deleted the yarn-client-exception branch November 23, 2015 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants