[SPARK-26275][PYTHON][ML] Increases timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction test #23236
Conversation
cc @BryanCutler and @viirya

retest this please
Test build #99721 has finished for PR 23236 at commit
Test build #99724 has finished for PR 23236 at commit
Test build #99726 has finished for PR 23236 at commit
Test build #99727 has finished for PR 23236 at commit
Test build #99729 has finished for PR 23236 at commit
lgtm
Seems OK to me, but there are a few silly things with this test that fixing might also help.
If you don't feel comfortable changing any of these, just increasing the timeout is probably fine.
Thanks for the close look, @BryanCutler. I just increased the timeout because I was trying to keep the test as-is while fixing it. The test itself doesn't look like it takes that long, but it can apparently be flaky when resource usage is heavy. If you feel this can be resolved in other ways, would you mind taking a look? I'm happy to close this one. Actually, I am not that familiar with the ML tests.
True, the test is not that long under light resource usage. Locally, I saw a couple of seconds' difference with the changes I mentioned. The weird thing is that the unmodified test completes after the 11th batch with errors.
Compared to the error values from the test failures above, they match up until the 10th batch, but then those runs continue until the 16th batch, where the test hits the timeout.
I would expect the seed to produce the same values (or all different values if the random function differs), which makes me think something else is going on. Anyway, I think it's fine to increase the timeout, and if it's still flaky, we can look at making changes.
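For context, the 30-second limit under discussion comes from the test's polling helper, visible in the traceback quoted in the description below. Here is a minimal sketch of such a poll-until-timeout helper, assuming only the shape implied by that traceback; the real `_eventually` in `pyspark/mllib/tests/test_streaming_algorithms.py` may differ in its details:

```python
import time

def _eventually(condition, timeout=30.0):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Sketch only: if the deadline passes first, fail with the condition's
    last return value, matching the AssertionError text in the traceback.
    """
    start = time.time()
    last_value = None
    while time.time() - start < timeout:
        last_value = condition()
        if last_value is True:
            return
        time.sleep(0.01)  # brief pause between polls
    raise AssertionError(
        "Test failed due to timeout after %g sec, with last condition "
        "returning: %s" % (timeout, last_value))
```

With a helper of this shape, a fixed timeout is a bet on how fast the streaming batches complete; under heavy load the batches arrive more slowly, the condition never flips to True in time, and the test fails even though the model is still improving.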
I agree with @BryanCutler's analysis; a few things about this test do look a bit weird. I also think it is fine to increase the timeout for now.
Thanks, guys. I will try to take a look at it as well (although it will take a relatively long while). Let me get this in for now anyway.
Merged to master. |
I suspect that might be because, as the resource usage is heavy, …
Yeah, that sounds likely, and the timeout increase should help with that.
What changes were proposed in this pull request?

This test looks flaky:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99704/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99569/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99644/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99548/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99454/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99609/console

```
======================================================================
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
    % (timeout, lastValue))
AssertionError: Test failed due to timeout after 30 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74
----------------------------------------------------------------------
Ran 13 tests in 185.051s

FAILED (failures=1, skipped=1)
```

This looks to have started after the parallelism in Jenkins was increased to speed up builds in #23111. I am able to reproduce the failure manually when resource usage is heavy (with a manual decrease of the timeout).

How was this patch tested?

Manually tested by:

```
cd python
./run-tests --testnames 'pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction' --python-executables=python
```

Closes #23236 from HyukjinKwon/SPARK-26275.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
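For illustration, here is a tiny self-contained demo of the shape of the fix, using the same poll-until-timeout pattern sketched earlier (repeated inside the class so this example runs standalone). The class name, the sleep interval, the 2-second "convergence" stand-in, and the `timeout=60.0` value are all hypothetical; this is not the merged patch:

```python
import time
import unittest

class StreamingTestSketch(unittest.TestCase):
    # Hypothetical mini-version of the pattern in the real test suite.

    def _eventually(self, condition, timeout=30.0):
        # Same poll-until-timeout sketch as above, inlined for completeness.
        start = time.time()
        last = None
        while time.time() - start < timeout:
            last = condition()
            if last is True:
                return
            time.sleep(0.05)
        raise AssertionError(
            "Test failed due to timeout after %g sec, with last condition "
            "returning: %s" % (timeout, last))

    def test_training_and_prediction(self):
        start = time.time()
        # Stand-in for "the streaming model's error has dropped enough";
        # it takes ~2 seconds of wall time to become True.
        condition = lambda: time.time() - start > 2.0 or "not converged"
        # The fix is essentially this: pass an explicitly larger timeout
        # than the default so heavy resource usage does not fail the test
        # spuriously while the model is still improving.
        self._eventually(condition, timeout=60.0)

if __name__ == "__main__":
    unittest.main()
```

The trade-off is the usual one for flaky-test timeouts: a larger value makes genuine regressions take longer to surface, but it removes spurious failures on loaded CI workers, which is what the Jenkins logs above show.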