Test: TestIssue3699 times out #5452

Closed
xiang90 opened this issue May 25, 2016 · 6 comments
@xiang90
Contributor

xiang90 commented May 25, 2016

goroutine 12943 [sleep]:
time.Sleep(0x989680)
    /usr/local/golang/1.6.2/go/src/runtime/time.go:59 +0xf9
github.com/coreos/etcd/integration.(*cluster).waitMembersMatch(0xc8231b7200, 0xc82000ea20, 0xc820187e00, 0x4, 0x4)
    /home/runner/workspace/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/integration/cluster.go:319 +0x405
github.com/coreos/etcd/integration.(*cluster).addMember(0xc8231b7200, 0xc82000ea20)
    /home/runner/workspace/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/integration/cluster.go:251 +0x65b
github.com/coreos/etcd/integration.(*cluster).AddMember(0xc8231b7200, 0xc82000ea20)
    /home/runner/workspace/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/integration/cluster.go:270 +0x2b
github.com/coreos/etcd/integration.TestIssue3699(0xc82000ea20)
    /home/runner/workspace/src/github.com/coreos/etcd/gopath/src/github.com/coreos/etcd/integration/cluster_test.go:312 +0x146
testing.tRunner(0xc82000ea20, 0x1625238)
    /usr/local/golang/1.6.2/go/src/testing/testing.go:473 +0x98
created by testing.RunTests
    /usr/local/golang/1.6.2/go/src/testing/testing.go:582 +0x892
@xiang90 xiang90 added this to the v3.0.0 milestone May 27, 2016
@xiang90
Contributor Author

xiang90 commented May 28, 2016

@heyitsanthony @AkihiroSuda Can you try to reproduce this? I cannot... :(

@AkihiroSuda
Contributor

I can easily hit this failure without doing anything special

=== RUN   TestIssue3699
--- FAIL: TestIssue3699 (17.20s)
        cluster_test.go:331: waited too long for ready notification

20fc3e9

Is this identical to the timeout you mentioned here?
Or is that something different which should panic without the "waited too long for ready notification" error string?

@xiang90
Contributor Author

xiang90 commented May 28, 2016

@AkihiroSuda Can you try to reproduce this with logging enabled? (set https://github.com/coreos/etcd/blob/master/integration/v2_http_kv_test.go#L35 to debug)

@AkihiroSuda
Contributor

AkihiroSuda commented May 29, 2016

Attached the log: test1.txt

Interestingly, I could not reproduce the failure (tested 1k times) when I enabled logging, perhaps because logging (or something else) affects scheduling.

So I used the Namazu testing tool (@mitake recently talked about it at CoreOS Fest) to increase reproducibility by setting random scheduling attributes on Linux LWPs.

I think the result is still reliable because it just calls sched_setattr(2) and does no code instrumentation, which could lead to false alarms.
But please keep in mind that scheduling is slowed down, which can itself lead to timeouts.

@xiang90
Contributor Author

xiang90 commented May 29, 2016

@AkihiroSuda From the log you provided, I think the failure is just a false alarm. The test's 10-second timeout is too tight. If even one election happens during the last part of the test, the test will fail: the last part uses a 5-second per-request timeout, so if the test misses one request, it might not have time to try again. I verified this from the log you provided. I do not think there is anything we can do except either make the election timeout longer (which we do when testing on slow CI) or make this test's timeout longer (from 10 seconds to 20 seconds at https://github.com/coreos/etcd/blob/master/integration/cluster_test.go#L330?).

Can you try either a longer election timeout or a longer test timeout in your environment?

However, this failure is different from the one in my original comment.

@xiang90
Contributor Author

xiang90 commented May 30, 2016

OK... After some debugging, this test failure only happens in #5468, and the new commits there are the root cause. Closing this one since the master branch is OK.

@xiang90 xiang90 closed this as completed May 30, 2016