
hack/test-cmd.sh:114: executing 'oc new-project 'cmd-admin'' timeout #15900

Closed · mfojtik opened this issue Aug 22, 2017 · 13 comments

Labels: component/kubernetes, dependency/etcd, kind/test-flake, lifecycle/rotten, priority/P1

mfojtik commented Aug 22, 2017

Seen here: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_cmd/1307/console

Console:

hack/test-cmd.sh:114: executing 'oc new-project 'cmd-admin'' expecting success
FAILURE after 30.287s: hack/test-cmd.sh:114: executing 'oc new-project 'cmd-admin'' expecting success: the command returned the wrong error code
There was no output from the command.
Standard error from the command:

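For reference, the assertion at hack/test-cmd.sh:114 just runs the command and expects a zero exit code within the suite's timeout; a rough standalone equivalent (paraphrasing the os::cmd helpers the suite actually uses, with a 30s limit mirroring the ~30s the run hit):

# rough standalone equivalent of the failing assertion; the real test goes
# through the os::cmd helpers rather than calling timeout directly
if ! timeout 30s oc new-project 'cmd-admin'; then
  echo "FAILURE: 'oc new-project cmd-admin' returned a non-zero exit code" >&2
fi
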
From the master server logs, it seems like the rolebinding creation failed:

I0822 11:26:47.002253   20501 wrap.go:42] POST /apis/rbac.authorization.k8s.io/v1beta1/namespaces/cmd-admin/rolebindings: (7.002020944s) 500
goroutine 32608 [running]:
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).recordStatus(0xc42628c310, 0x1f4)
	/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog/httplog.go:207 +0xdd
github.com/openshift/origin/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader(0xc42628c310, 0x1f4)

// tons of stack traces...

logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x123\n\x04\n\x00\x12\x00\x12\aFailure\x1a\x1detcdserver: request timed out\"\x000\xf4\x03\x1a\x00\"\x00"

And that seems to be due to an etcd timeout:

E0822 11:26:47.001865   20501 status.go:62] apiserver received an error that is not an metav1.Status: etcdserver: request timed out

Which seems to be related to:

etcdserver/api/v3rpc: Failed to dial 172.17.0.2:24001: connection error: desc = "transport: remote error: tls: bad certificate"; please retry

and

2017-08-22 11:26:26.957356 W | etcdserver: timed out waiting for read index response
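
For anyone triaging a similar run, it's worth confirming whether etcd itself was unhealthy during that window rather than just the apiserver; the endpoint, port, and cert paths below are placeholders (the log above only shows the 172.17.0.2:24001 peer address):

# check etcd health directly; endpoint and cert paths are placeholders
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:4001 \
  --cacert=/path/to/ca.crt --cert=/path/to/client.crt --key=/path/to/client.key \
  endpoint health

# and correlate with the timeouts in the master log
grep -E 'etcdserver: (request timed out|timed out waiting for read index response)' master.log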

mfojtik commented Aug 22, 2017

@ironcladlou you were looking for something to look at? ;-)

@ironcladlou

Looks like etcd writes were timing out in general for ~30-40s between ~11:26:40 and 11:27:28.

@stevekuznetsov

This might be an issue with EBS block allocation -- @deads2k recently reconfigured the job and the etcd data dir may not be in tmpfs anymore
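
If that's what changed, putting the etcd data dir back on tmpfs would take EBS latency out of the picture; a minimal sketch, with an arbitrary mount point and size (the job's actual wiring may differ):

# mount a tmpfs for the etcd data dir (placeholder path and size)
sudo mkdir -p /tmp/etcd-data
sudo mount -t tmpfs -o size=512m tmpfs /tmp/etcd-data
# then start etcd (or point the master's etcd config) at it
etcd --data-dir=/tmp/etcd-data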

@ironcladlou

This old friend? #6542 😬
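
If it is the same fsync-latency problem, the etcd /metrics endpoint should show it directly; the port and TLS setup below are placeholders for however the integration etcd is exposed:

# look at WAL fsync and backend commit latency histograms (placeholder endpoint)
curl -s http://127.0.0.1:2379/metrics | \
  grep -E 'etcd_disk_(wal_fsync|backend_commit)_duration_seconds'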

deads2k commented Aug 31, 2017

I0831 14:52:32.816638   20662 trace.go:76] Trace[73421993]: "GuaranteedUpdate etcd3: *api.ServiceAccount" (started: 2017-08-31 14:52:25.815545555 +0000 UTC) (total time: 7.001059231s):
Trace[73421993]: [31.869µs] [31.869µs] initial value restored
Trace[73421993]: [101.779µs] [69.91µs] Transaction prepared
Trace[73421993]: [7.001059231s] [7.000957452s] END
E0831 14:52:32.816669   20662 status.go:62] apiserver received an error that is not an metav1.Status: etcdserver: request timed out
I0831 14:52:32.816846   20662 trace.go:76] Trace[186387414]: "Update /api/v1/namespaces/kube-system/serviceaccounts/daemon-set-controller" (started: 2017-08-31 14:52:25.81547282 +0000 UTC) (total time: 7.001354058s):
Trace[186387414]: [15.846µs] [15.846µs] About to convert to expected version
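
The ~7s totals line up with etcd's default server-side request timeout, so these apiserver traces are just surfacing etcd giving up; grepping the slow traces out of the master log (file name is a placeholder) shows how long the stall lasted:

# pull slow apiserver traces out of the master log (placeholder file name)
grep 'total time:' openshift-master.log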

ironcladlou commented Sep 8, 2017

@php-coder I'm not sure why the error in #15558 (comment) was attributed to this issue; I see no evidence provided to support the claim. I only mention it because we have no fancy flake analytics like upstream and so I don't want the frequency of this flake to be misrepresented.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Feb 23, 2018
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Mar 25, 2018
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
