-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-2778] [yarn] Add yarn integration tests. #2257
Conversation
This patch adds a couple of, currently, very simple integration tests to make sure both client and cluster modes are working. The tests don't do much yet other than run a simple job, but the plan is to enhance them after we get the framework in. The cluster tests are noisy, so redirect all log output to a file like other tests do. Copying the conf around sucks but it's less work than messing with maven/sbt and having to clean up other projects. Note the test is only added for yarn-stable. The code compiles against yarn-alpha but there are two issues I ran into that I could not overcome: - and old netty dependency kept creeping into the classpath and causing akka to not work, when using sbt; the old netty was correctly suppressed under maven. - MiniYARNCluster kept failing to execute containers because it did not create the NM's local dir itself; this is apparently a known behavior, but I'm not sure how to work around it. None of those issues are present with the stable Yarn. Also, these tests are a little slow to run. Apparently Spark doesn't yet tag tests (so that these could be isolated in a "slow" batch), so this is something to keep in mind.
Suggestions about making this work in yarn-alpha are welcome; I was hoping this would allow me to test yarn-alpha changes, but I failed. :-/ |
Jenkins, test this please. |
1 similar comment
Jenkins, test this please. |
QA tests have started for PR 2257 at commit
|
QA tests have finished for PR 2257 at commit
|
Jenkins, test this please. |
QA tests have started for PR 2257 at commit
|
QA tests have finished for PR 2257 at commit
|
Jenkins, test this please. |
QA tests have started for PR 2257 at commit
|
QA tests have finished for PR 2257 at commit
|
I think its ok to not have them for yarn-alpha. how slow are they? You couldn't find a way to have them run separately then and not with normal unit tests? |
They take about 1min to run on my machine. Most of the time is setting up and tearing down the MiniYARNCluster instance, which is only done once, so I hope adding more tests won't add so much on top of that. scalatest has tags that can be used to choose which tests to run, but Spark doesn't use them anywhere. I can look at adding them to these (and disabling them in the default run) if people think that's too long. |
Thanks @vanzin, this is really cool. If it takes a long time to run this test, we can always enable it only if yarn files are modified (SQL already does this). Then we don't have to worry about further inflating the test time if we want to test more features. |
@vanzin this mostly looks good. I've been trying to run it locally but I have been having lots of issues with the unit tests lately. Hopefully have this done and check in later today. Thanks for working on this, we definitely need more tests on the yarn side. We may need to be careful about adding more in the future if they take to longer, perhaps we can look at supporting the different types as you mention. |
Jenkins, test this please. |
QA tests have started for PR 2257 at commit
|
QA tests have finished for PR 2257 at commit
|
yarnCluster.start() | ||
|
||
val sysProps = sys.props.map { case (k, v) => (k, v) } | ||
sysProps.foreach { case (k, v) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just making a copy of sys.props
? Why not just do toMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, why not just take a snapshot of sys.props
here and restore it later? Might be simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was trying to be paranoid and clean up any "spark.*" options from sys.props before running these tests. That may not be the best idea (since I'd be cleaning up "spark.test" properties set by the build scripts), so I went with the copy instead.
It's great to see us adding tests here. @vanzin how long do these tests take, roughly? We might have to only run these in certain situations if they take a long time. |
@pwendell see the comments above: They take about 1min to run on my machine. Most of the time is setting up and tearing down the MiniYARNCluster instance, which is only done once, so I hope adding more tests won't add so much on top of that. scalatest has tags that can be used to choose which tests to run, but Spark doesn't use them anywhere. I can look at adding them to these (and disabling them in the default run) if people think that's too long. |
I reduced the number of executors from the cluster test (from 4 to 1, something I meant to do before but forgot) and shaved ~15s on my machine. Tests still run sort of slow, but it's a little better now. |
QA tests have started for PR 2257 at commit
|
QA tests have finished for PR 2257 at commit
|
val status = new File(args(1)) | ||
var result = "failure" | ||
try { | ||
val data = sc.parallelize(1 to 4).map(i => i).collect().toSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the map to identity here do anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just a way to trigger an actual job. I guess I could do parallelize(1 to 4, 4).collect()
to achieve the same thing.
Hey @vanzin I left a few minor comments, but this LGTM overall. I haven't verified the dependency logic, however. |
QA tests have started for PR 2257 at commit
|
QA tests have finished for PR 2257 at commit
|
Test FAILed. |
Test FAILed. |
Jenkins, retest this please. Is there any way to access |
I think you have to SSH into the machines to get them, but even if you have them they're kinda jumbled because we log everything to the same file. Actually here we might want to do something like what we did in #2108 so we know what's wrong when the test fails. |
Having everything in the same file is ok. I think the trick in this case is to convince the child processes launched by Yarn to not use the common log4j configuration, and instead log to stderr (so that Yarn will take care of the logs). Let me look at doing that. |
QA tests have started for PR 2257 at commit
|
QA tests have finished for PR 2257 at commit
|
Test FAILed. |
I'll merge with master and see if I can reproduce the failure... |
Yep, fails locally too after the merge. Let me look. |
This was added by the fix to SPARK-2668: a stray equal sign was creating a bad system property, and the Jetty initialization code was tripping on it. Also fixed a "MatchError" that could be hit in ApplicationMaster.
I found the problem - it was caused by a recent PR that basically broke yarn-cluster mode... |
QA tests have started for PR 2257 at commit
|
Ah good catch. The latest changes LGTM if you get the tests to pass. |
QA tests have finished for PR 2257 at commit
|
Test PASSed. |
Yay, and the test already found a bug before even being checked in. |
Thanks Marcelo and @andrewor14 for review - I'll merge this. |
This patch adds a couple of, currently, very simple integration tests
to make sure both client and cluster modes are working. The tests don't
do much yet other than run a simple job, but the plan is to enhance
them after we get the framework in.
The cluster tests are noisy, so redirect all log output to a file
like other tests do. Copying the conf around sucks but it's less
work than messing with maven/sbt and having to clean up other
projects.
Note the test is only added for yarn-stable. The code compiles
against yarn-alpha but there are two issues I ran into that I
could not overcome:
causing akka to not work, when using sbt; the old netty was
correctly suppressed under maven.
did not create the NM's local dir itself; this is apparently
a known behavior, but I'm not sure how to work around it.
None of those issues are present with the stable Yarn.
Also, these tests are a little slow to run. Apparently Spark doesn't
yet tag tests (so that these could be isolated in a "slow" batch),
so this is something to keep in mind.