Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flake in cmd: etcdserver: request timed out #16248

Closed
dmage opened this issue Sep 8, 2017 · 9 comments
Closed

flake in cmd: etcdserver: request timed out #16248

dmage opened this issue Sep 8, 2017 · 9 comments
Assignees
Labels
component/kubernetes kind/test-flake Categorizes issue or PR as related to test flakes. priority/P0

Comments

@dmage
Copy link
Contributor

dmage commented Sep 8, 2017

serviceaccount "registry" created
deploymentconfig "docker-registry" created
Error from server: etcdserver: request timed out
Error from server (InternalError): Internal error occurred: failed to allocate a serviceIP: etcdserver: request timed out
[INFO] No compiled `junitreport` binary was found. Attempting to build one using:
[ERROR] PID 20181: hack/lib/start.sh:636: `oc create --config="${ADMIN_KUBECONFIG}" -f -` exited with status 1.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/15796/test_pull_request_origin_cmd/2378/

/kind test-flake

@smarterclayton
Copy link
Contributor

smarterclayton commented Sep 16, 2017

Seeing this on a ton of builds, it looks like something is wrong with etcd:

https://storage.googleapis.com/origin-ci-test/logs/test_branch_origin_cmd/434/artifacts/scripts/test-cmd/logs/openshift.log

2017-09-16 07:20:00.546622 W | wal: sync duration of 2.538028386s, expected less than 1s
I0916 07:20:01.636214   20748 wrap.go:42] GET /api: (396.503µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.637069   20748 wrap.go:42] GET /apis: (392.446µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.637768   20748 wrap.go:42] GET /apis/apiregistration.k8s.io/v1beta1: (288.62µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.638559   20748 wrap.go:42] GET /apis/extensions/v1beta1: (369.954µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.639341   20748 wrap.go:42] GET /apis/apps/v1beta1: (339.175µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.640022   20748 wrap.go:42] GET /apis/authentication.k8s.io/v1: (332.214µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.640678   20748 wrap.go:42] GET /apis/authentication.k8s.io/v1beta1: (296.03µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.641342   20748 wrap.go:42] GET /apis/authorization.k8s.io/v1: (294.396µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.642068   20748 wrap.go:42] GET /apis/authorization.k8s.io/v1beta1: (305.441µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.642746   20748 wrap.go:42] GET /apis/autoscaling/v1: (307.923µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.643495   20748 wrap.go:42] GET /apis/batch/v1: (377.521µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.644160   20748 wrap.go:42] GET /apis/batch/v2alpha1: (311.811µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.644883   20748 wrap.go:42] GET /apis/certificates.k8s.io/v1beta1: (318.274µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.645535   20748 wrap.go:42] GET /apis/networking.k8s.io/v1: (333.642µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.646210   20748 wrap.go:42] GET /apis/policy/v1beta1: (316.67µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.646931   20748 wrap.go:42] GET /apis/authorization.openshift.io/v1: (393.465µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.647594   20748 wrap.go:42] GET /apis/rbac.authorization.k8s.io/v1beta1: (309.769µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.648228   20748 wrap.go:42] GET /apis/storage.k8s.io/v1: (292.366µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.648851   20748 wrap.go:42] GET /apis/storage.k8s.io/v1beta1: (302.77µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.649550   20748 wrap.go:42] GET /apis/apiextensions.k8s.io/v1beta1: (357.948µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.650296   20748 wrap.go:42] GET /apis/apps.openshift.io/v1: (382.538µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.651027   20748 wrap.go:42] GET /apis/build.openshift.io/v1: (380.961µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.651748   20748 wrap.go:42] GET /apis/image.openshift.io/v1: (372.758µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.652557   20748 wrap.go:42] GET /apis/network.openshift.io/v1: (379.005µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.653237   20748 wrap.go:42] GET /apis/oauth.openshift.io/v1: (329.81µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.653896   20748 wrap.go:42] GET /apis/project.openshift.io/v1: (323.313µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.654544   20748 wrap.go:42] GET /apis/quota.openshift.io/v1: (343.019µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.655243   20748 wrap.go:42] GET /apis/route.openshift.io/v1: (310.496µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.655868   20748 wrap.go:42] GET /apis/security.openshift.io/v1: (324.151µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.656541   20748 wrap.go:42] GET /apis/template.openshift.io/v1: (311.964µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.657187   20748 wrap.go:42] GET /apis/user.openshift.io/v1: (299.203µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
I0916 07:20:01.657964   20748 wrap.go:42] GET /api/v1: (462.609µs) 200 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4] 172.17.0.2:60630]
E0916 07:20:05.041983   20748 status.go:62] apiserver received an error that is not an metav1.Status: etcdserver: request timed out
I0916 07:20:05.042264   20748 trace.go:76] Trace[1750739734]: "Create /api/v1/namespaces/default/secrets" (started: 2017-09-16 07:19:58.0408633 +0000 UTC) (total time: 7.001367658s):
Trace[1750739734]: [19.041µs] [19.041µs] About to convert to expected version
Trace[1750739734]: [35.82µs] [16.779µs] Conversion done
Trace[1750739734]: [124.845µs] [89.025µs] About to store object in database
Trace[1750739734]: [7.001367658s] [7.001242813s] END
I0916 07:20:05.042342   20748 wrap.go:42] POST /api/v1/namespaces/default/secrets: (7.001721676s) 500
goroutine 39945 [running]:
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).recordStatus(0xc42bf16700, 0x1f4)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:207 +0xdd
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader(0xc42bf16700, 0x1f4)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:186 +0x35
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics.(*responseWriterDelegator).WriteHeader(0xc432f19fb0, 0x1f4)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:135 +0x45
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.SerializeObject(0x5a0bb54, 0x23, 0x7fdf0cf70928, 0xc42d93a750, 0xeeaa8e0, 0xc42e46d840, 0xc425a17800, 0x1f4, 0xee6eae0, 0xc427324100)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:91 +0x8d
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.WriteObjectNegotiated(0x7fdf08282d10, 0xc431acfe90, 0xeebe2a0, 0xc423cfdbc0, 0x0, 0x0, 0x594e823, 0x2, 0xeeaa8e0, 0xc42e46d840, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:116 +0x29e
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters.ErrorNegotiated(0x7fdf08282d10, 0xc431acfe90, 0xee82ee0, 0xc4266d4ce0, 0xeebe2a0, 0xc423cfdbc0, 0x0, 0x0, 0x594e823, 0x2, ...)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/responsewriters/writers.go:135 +0x165
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.(*RequestScope).err(0xc434cffe00, 0xee82ee0, 0xc4266d4ce0, 0xeeaa8e0, 0xc42e46d840, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:80 +0x10e
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers.createHandler.func1(0xeeaa8e0, 0xc42e46d840, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/handlers/rest.go:465 +0x1131
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints.restfulCreateResource.func1(0xc432f19f20, 0xc422b9e960)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/installer.go:1027 +0xd5
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics.InstrumentRouteFunc.func1(0xc432f19f20, 0xc422b9e960)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go:104 +0x1cf
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).dispatch(0xc422fd0c60, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:277 +0xb8d
github.com/openshift/origin/vendor/github.com/emicklei/go-restful.(*Container).Dispatch(0xc422fd0c60, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/emicklei/go-restful/container.go:199 +0x57
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x5974e63, 0xe, 0xc422fd0c60, 0xc4278f6310, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:153 +0x6e7
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*director).ServeHTTP(0xc425628a60, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	<autogenerated>:72 +0x86
github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver.(*proxyHandler).ServeHTTP(0xc433011300, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kube-aggregator/pkg/apiserver/handler_proxy.go:91 +0x122
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*pathHandler).ServeHTTP(0xc422dc8000, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:248 +0x3dd
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).ServeHTTP(0xc42e11fea0, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/mux/pathrecorder.go:234 +0x72
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.director.ServeHTTP(0x597ad20, 0xf, 0xc426adefc0, 0xc42e11fea0, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/handler.go:161 +0x301
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server.(*director).ServeHTTP(0xc4312d4880, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	<autogenerated>:72 +0x86
github.com/openshift/origin/pkg/cmd/server/origin.namespacingFilter.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/handlers.go:96 +0xd2
net/http.HandlerFunc.ServeHTTP(0xc431216e10, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authorization.go:50 +0x30c
net/http.HandlerFunc.ServeHTTP(0xc431282980, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/maxinflight.go:96 +0x311
net/http.HandlerFunc.ServeHTTP(0xc4312829c0, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/impersonation.go:47 +0x226a
net/http.HandlerFunc.ServeHTTP(0xc431282a00, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters.WithAuthentication.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/filters/authentication.go:73 +0x2b0
net/http.HandlerFunc.ServeHTTP(0xc4311291d0, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request.WithRequestContext.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/endpoints/request/requestcontext.go:110 +0xef
net/http.HandlerFunc.ServeHTTP(0xc4312d48a0, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithCORS.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/cors.go:75 +0x189
net/http.HandlerFunc.ServeHTTP(0xc4311e58c0, 0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.WithPanicRecovery.func1(0xeeaa960, 0xc42bf16700, 0xc425a17800)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/wrap.go:41 +0x11b
net/http.HandlerFunc.ServeHTTP(0xc4312d4a60, 0x7fdf08282d78, 0xc42e46d828, 0xc425a17800)
	/usr/lib/golang/src/net/http/server.go:1942 +0x44
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1(0xc4312d4aa0, 0xeec1ee0, 0xc42e46d828, 0xc425a17800, 0xc422b9e900)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:91 +0x8d
created by github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/filters/timeout.go:93 +0x1c0

logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x123\n\x04\n\x00\x12\x00\x12\aFailure\x1a\x1detcdserver: request timed out\"\x000\xf4\x03\x1a\x00\"\x00"
 [[openshift/v1.7.0+80709908fd (linux/amd64) kubernetes/930b5c4/system:serviceaccount:openshift-infra:serviceaccount-pull-secrets-controller] 172.17.0.2:60672]

Other calls later fail after exactly 7 seconds, implies some sort of timeout on either end. I think we have an etcd server or client bug. This may be related to the issues seen on us-west-1 (where certain calls time out after a while from the API server, but not directly to etcd).

It's possible that this has something to do with the client library getting stuck - perhaps the particular error here causes it to fail and stay stuck? We should check the upstream etcd versions after 3.2.1.

Raising priority.

@smarterclayton smarterclayton changed the title Internal error occurred: failed to allocate a serviceIP: etcdserver: request timed out flake in cmd: etcdserver: request timed out Sep 16, 2017
@smarterclayton
Copy link
Contributor

Also note:

E0916 07:20:29.530005   20748 leaderelection.go:263] Failed to update lock: etcdserver: request timed out

@deads2k in unified master mode leader election failures shouldn't be terminating the process, which is likely making this problem worse. The new "split controller" logic should not be on when leader election is not on.

@mfojtik
Copy link
Contributor

mfojtik commented Sep 19, 2017

There are just few places in the etcd code that returns "request timed out". Most of them seems to be related to leader election and the watch (but this is not watch problem).

In v3 server these are basically: LeaseRenew, LeaseTimeToLive and linearizableReadLoop (but that has a warning on top and that warning I don't see in our logs). Since we run just one etcd member, not sure if leases apply here.

@mfojtik
Copy link
Contributor

mfojtik commented Sep 19, 2017

@smarterclayton my candidates:
This one I believe we picked recently (@rrati), wonder if this is a side-effect or this issue pre-dates this:
etcd-io/etcd@15e9510
EDIT: I forgot that this was for v2 client (?)

Other candidate:
etcd-io/etcd@efd7800

Except these two I don't see anything related to this issue in release-3.2 branch...

@mfojtik
Copy link
Contributor

mfojtik commented Sep 19, 2017

@smarterclayton also for the 7s timeout:

https://github.com/mfojtik/origin/blob/b76b0a9d66ca1ab477ab381a3b9b5297e6333c35/vendor/github.com/coreos/etcd/etcdserver/config.go#L138

  // 5s for queue waiting, computation and disk IO delay
  // + 2 * election timeout for possible leader election

So this is really leader election broken? Can we tweak ElectionTicks ? (lets experiment with it here: #16429)

@mfojtik
Copy link
Contributor

mfojtik commented Sep 19, 2017

@deads2k said this could be the tmpfs issue as the /tmp we create here is not used when we run the test/cmd here

I wonder if we should add OS_BUILD_ENV_TMP_VOLUME into above to make sure we use the host /tmp and we don't create empty Docker volume.... @stevekuznetsov ?

I changed #16429 hardcode the /tmp mount from the host to prove this.

@mfojtik
Copy link
Contributor

mfojtik commented Sep 19, 2017

@stevekuznetsov closing this as the issue was the tmpfs and it was fixed in ci-cd repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/kubernetes kind/test-flake Categorizes issue or PR as related to test flakes. priority/P0
Projects
None yet
Development

No branches or pull requests

8 participants