-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5345][DEPLOY] Fix unstable test case in FsHistoryProviderSuite #4133
Conversation
Test build #25880 has finished for PR 4133 at commit
|
Spark.testing should already be set for all tests. Is that needed ? What is the problem briefly and what does this fix? |
Can you explain how this fixes the problem? You're changing the order of the checks, but that doesn't match what the code does. For example, run this in a Scala shell:
That emulates what |
In fact, your test change uncovered a bug. In FsHistoryProvider.scala, L214:
This will not merge the lists correctly for apps that have not finished yet. It should do a check similar to the |
I haven't investigated the test in detail, but could the flakiness have been caused by the check interval? IIRC the history server doesn't actually start checking for logs until one full check interval has elapsed. If we assert before that happens, then obviously we're not gonna find the logs we expect. Could that be the problem? |
The flakiness might be because the polling timer is running in test mode, while the tests expect it not to run. @sarutak's patch seems to solve that, but that exposes a different bug that I mentioned above. I'll take a quick look at the code to see if that theory holds. |
@@ -43,7 +43,7 @@ class FsHistoryProviderSuite extends FunSuite with BeforeAndAfter with Matchers | |||
testDir = Utils.createTempDir() | |||
provider = new FsHistoryProvider(new SparkConf() | |||
.set("spark.history.fs.logDirectory", testDir.getAbsolutePath()) | |||
.set("spark.history.fs.updateInterval", "0")) | |||
.set("spark.testing", "true")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... as Sean mentioned, this should already be defined here. Can you double check that it's really not set, and if not, what's causing it?
Hi @sarutak , Check out https://github.com/vanzin/spark/tree/SPARK-5345. That fixes the sort problem, and cleans up the code a bit. It doesn't explain how the thread might be started during testing, though. But it doesn't seem to be happening with this patch (I even added an exception - not in the patch - to check that's the case, which wasn't triggered). |
I hit the exception I added for testing when running all tests, but never when running the test in isolation. It looks like some test is clearing the system properties (or that particular one), which is bad for other reasons, and I don't think should be worked around here. |
BTW it's very likely this is caused by the issue described in #4220 (comment). |
Jenkins, retest this please. |
I think we've observed this test's flakiness even after fixing #4220, so we should continue investigating it. |
Test build #26640 has finished for PR 4133 at commit
|
Even if the flakiness hasn't been fixed yet, this patch is still not correct. The test itself is correct, the |
Shall we close this PR in favor of #4370 ? |
This PR cannot be pushed as is, since it makes the test test the buggy sort behavior instead of correcting it. Still, if after #4220 these tests still fail, there's something still to be done for the bug. |
I think that we should close this for the time being, since we can always re-open if we see the bug again. |
O.K, I close this PR for now. |
In FsHistoryProviderSuite, a test "Parse new and old application logs" sometimes fail and sometimes succeed. It's unstable.