
Restart workspace fail (OpenShift Dedicated, minishift) #16512

Closed
vparfonov opened this issue Apr 1, 2020 · 25 comments
Assignees
Labels
area/che-server kind/bug Outline of a bug - must adhere to the bug report template. severity/P1 Has a major impact to usage or development of the system.
Milestone

Comments

@vparfonov
Contributor

vparfonov commented Apr 1, 2020

Describe the bug

Got an error message while restarting previously created workspaces. It does not reproduce every time; the restart may need to be repeated several times.

Che version

  • latest
  • nightly - 7.11.0-SNAPSHOT
  • other: please specify

Steps to reproduce

  1. Create workspace
  2. Stop workspace
  3. Try to restart it (probably need to repeat several times)

Got this error log:

Error: Failed to run the workspace: "Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/eclipse-che/secrets. Message: object is being deleted: secrets "workspacetim1cx2edjv0re97-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspacetim1cx2edjv0re97-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspacetim1cx2edjv0re97-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={})."
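The error above suggests a race between asynchronous secret deletion and the re-create on restart: the DELETE has already been accepted, but the secret still exists with a deletion timestamp, so the POST gets 409 AlreadyExists. A minimal, self-contained simulation of that race (all class and function names here are illustrative, not the actual Che server code):

```python
# Simulation of the race in the error above: the secret's DELETE has been
# accepted, but finalizers keep the object in a "being deleted" state, so
# an immediate re-create fails with 409 AlreadyExists.
# All names are illustrative; this is not the Eclipse Che server code.

class FakeSecretAPI:
    def __init__(self):
        self.live = set()         # fully created secrets
        self.terminating = set()  # secrets with a deletionTimestamp set

    def create(self, name):
        if name in self.live or name in self.terminating:
            raise RuntimeError(
                f'409 AlreadyExists: object is being deleted: secrets "{name}"')
        self.live.add(name)

    def delete(self, name):
        # Deletion is asynchronous: the server answers 200 OK immediately,
        # but the object lingers until its finalizers are removed.
        self.live.discard(name)
        self.terminating.add(name)

    def gc_finalize(self, name):
        # Stand-in for the garbage collector finishing the deletion.
        self.terminating.discard(name)

api = FakeSecretAPI()
api.create("workspacetim1cx2edjv0re97-sshprivatekeys")
api.delete("workspacetim1cx2edjv0re97-sshprivatekeys")  # stop workspace

# Restart immediately: reproduces the 409 from the report.
try:
    api.create("workspacetim1cx2edjv0re97-sshprivatekeys")
    restarted = True
except RuntimeError:
    restarted = False

# Once the deletion actually completes, the same create succeeds, which
# is why repeating the restart eventually works.
api.gc_finalize("workspacetim1cx2edjv0re97-sshprivatekeys")
api.create("workspacetim1cx2edjv0re97-sshprivatekeys")
```

This also matches the later observation in the thread that a subsequent restart usually passes once the resources are finally deleted.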

Expected behavior

Runtime

  • kubernetes (include output of kubectl version)
  • Openshift (include output of oc version)
  • minikube (include output of minikube version and kubectl version)
  • minishift (include output of minishift version and oc version)
  • docker-desktop + K8S (include output of docker version and kubectl version)
  • other: (please specify)
oc version

Client Version: version.Info{Major:"4", Minor:"1+", GitVersion:"v4.1.0+b4261e0", GitCommit:"b4261e07ed", GitTreeState:"clean", BuildDate:"2019-07-06T03:16:01Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.2", GitCommit:"4320e48", GitTreeState:"clean", BuildDate:"2020-01-21T19:50:59Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Screenshots

error-log

Installation method

  • chectl
  • che-operator
  • minishift-addon
  • I don't know
  • oc apply for che-server, plugin and devfile registry

Environment

  • my computer
    • Windows
    • Linux
    • macOS
  • Cloud
    • Amazon
    • Azure
    • GCE
    • other (OCD che-dev Cluster)
  • other: please specify

Eclipse Che Logs

che-2-5kqgc-che.log

Additional context

Could not reproduce before this commit: db46ad4

@vparfonov vparfonov added the kind/bug Outline of a bug - must adhere to the bug report template. label Apr 1, 2020
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Apr 1, 2020
@vparfonov vparfonov added area/infra/openshift area/che-server severity/P1 Has a major impact to usage or development of the system. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Apr 1, 2020
@ibuziuk
Member

ibuziuk commented Apr 1, 2020

@vparfonov could you clarify how you deployed eclipse che on OSD?

@vparfonov
Contributor Author

Here is the script I used for the deployment: https://gist.github.com/vparfonov/dcdd1c4ec30604d9b957737cebffb4bb

@vparfonov
Contributor Author

Reproduced on a multi-user installation with the original old-fashioned deploy_che.sh script.
Steps to reproduce:

  1. Check out the commit before the deploy_che.sh script was removed (PR):
    git checkout 59f9e41be62586c174d10da0485e3ba66588ef33
  2. cd deploy/openshift
  3. ./deploy_che.sh --multiuser
  4. Update the DeploymentConfig to allow starting more than one workspace:
- name: CHE_LIMITS_USER_WORKSPACES_RUN_COUNT
  value: '-1'
  5. Create and start two workspaces.
  6. Stop both.
  7. Start one of the workspaces and, while it is starting, start the other.

multiuser

che-3-pdh86-che.log

@ibuziuk
Member

ibuziuk commented Apr 2, 2020

@ericwill any ideas? Looks like an issue with the SSH plugin. cc: @vinokurig

@vinokurig
Contributor

Looks like it is related to #14950, @vzhukovskii any ideas?

@vzhukovs
Contributor

vzhukovs commented Apr 3, 2020

Will take a look asap

@vparfonov
Contributor Author

vparfonov commented Apr 3, 2020

FYI, I reproduced it on minishift

minishift v1.34.2+83ebaab

@vzhukovs
Contributor

vzhukovs commented Apr 3, 2020

As far as I understand, the changes related to storing private keys in secrets should not break the flow of starting two dedicated workspaces. What about the case with only one workspace — does it start successfully?

@vinokurig
Contributor

Looks like a single workspace starts successfully.

@vzhukovs
Contributor

vzhukovs commented Apr 3, 2020

I suppose, then, it shouldn't be related to #14950

@skabashnyuk
Contributor

More logs.

2020-04-03 09:51:09,540[ceSharedPool-26]  [ERROR] [.i.k.KubernetesInternalRuntime 259]  - Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/skabashn/secrets. Message: object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspaceyy5nesnxw954tsbz-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInfrastructureException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/skabashn/secrets. Message: object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspaceyy5nesnxw954tsbz-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
	at org.eclipse.che.workspace.infrastructure.kubernetes.namespace.KubernetesSecrets.create(KubernetesSecrets.java:51)
	at org.eclipse.che.workspace.infrastructure.openshift.OpenShiftInternalRuntime.createSecrets(OpenShiftInternalRuntime.java:127)
	at org.eclipse.che.workspace.infrastructure.openshift.OpenShiftInternalRuntime.startMachines(OpenShiftInternalRuntime.java:112)
	at org.eclipse.che.workspace.infrastructure.kubernetes.KubernetesInternalRuntime.internalStart(KubernetesInternalRuntime.java:222)
	at org.eclipse.che.api.workspace.server.spi.InternalRuntime.start(InternalRuntime.java:141)
	at org.eclipse.che.api.workspace.server.WorkspaceRuntimes$StartRuntimeTask.run(WorkspaceRuntimes.java:920)
	at org.eclipse.che.commons.lang.concurrent.CopyThreadLocalRunnable.run(CopyThreadLocalRunnable.java:38)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/skabashn/secrets. Message: object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=secrets, name=workspaceyy5nesnxw954tsbz-sshprivatekeys, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: secrets "workspaceyy5nesnxw954tsbz-sshprivatekeys" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:507)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:251)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:815)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:333)
	at org.eclipse.che.workspace.infrastructure.kubernetes.namespace.KubernetesSecrets.create(KubernetesSecrets.java:49)
	... 10 common frames omitted

Might be related to
fabric8io/kubernetes-client#1775
fabric8io/kubernetes-client#1840
See also
strimzi/strimzi-kafka-operator#2223
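The 200-on-DELETE followed by 409-on-POST in the trace above is consistent with finalizer-based deletion: the API server only sets metadata.deletionTimestamp on the object (and, with orphanDependents=true, adds the orphan finalizer), and the object is actually removed only once its finalizers list is empty. An illustrative check — the manifest below is hand-written for this sketch, not taken from the logs:

```python
# Illustrative shape of a Secret stuck in the "being deleted" state.
# The deletionTimestamp and finalizer values are hand-written examples.
terminating_secret = {
    "metadata": {
        "name": "workspaceyy5nesnxw954tsbz-sshprivatekeys",
        "deletionTimestamp": "2020-04-03T09:51:09Z",
        "finalizers": ["orphan"],  # added when orphanDependents=true
    }
}

def is_terminating(obj):
    """True if the object is marked for deletion but not yet removed.

    While this holds, a POST creating an object with the same name is
    rejected with 409 AlreadyExists, as seen in the stack trace above.
    """
    return obj["metadata"].get("deletionTimestamp") is not None
```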

@vinokurig
Contributor

vinokurig commented Apr 3, 2020

It is not related to the SSH plugin either, because the plugin had not yet started when the error occurred.

@ibuziuk
Member

ibuziuk commented Apr 3, 2020

@skabashnyuk so this is likely related to the k8s client lib update right?

@skabashnyuk
Contributor

@skabashnyuk so this is likely related to the k8s client lib update right?

I don't know yet.

@ibuziuk ibuziuk changed the title Restart workspace fail on OpenShift Dedicated Restart workspace fail (OpenShift Dedicated, minishift) Apr 3, 2020
@dmytro-ndp dmytro-ndp mentioned this issue Apr 3, 2020
@tolusha
Contributor

tolusha commented Apr 3, 2020

The deploy script is deprecated and has been removed.
Issues caused by that script are out of scope.

@ibuziuk
Member

ibuziuk commented Apr 3, 2020

@tolusha the custom script was used for the OSD deployment (at the moment there is no other way to deploy on OSD).
@vparfonov have you reproduced the issue on minishift via chectl?

For the record, it looks like I cannot reproduce it on Hosted Che against the 7.11 snapshot: redhat-developer/rh-che#1828

@vparfonov
Contributor Author

@tolusha @ibuziuk yes, I have reproduced it with chectl and minishift.

@sleshchenko
Member

I faced the same issue today on minikube + chectl + operator, but the error message was about the self-signed-cert secret.

@sleshchenko
Member

I can't reproduce this issue anymore when I use an image from #16540

@vparfonov
Contributor Author

Me too, it works well with #16540

@sparkoo
Member

sparkoo commented Apr 3, 2020

I can't reproduce on minishift with chectl and operator...

@skabashnyuk skabashnyuk self-assigned this Apr 6, 2020
@skabashnyuk skabashnyuk added this to the 7.12 milestone Apr 6, 2020
@skabashnyuk
Contributor

skabashnyuk commented Apr 6, 2020

k8s client 4.9.0

2020-04-06 08:21:14,766[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> DELETE https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json; charset=utf-8
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 66
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Authorization: Bearer XXXX
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - {"apiVersion":"v1","kind":"DeleteOptions","orphanDependents":true}
2020-04-06 08:21:14,767[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> END DELETE (66-byte body)
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - <-- 200 OK https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys (7ms)
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Audit-Id: 978dfc40-0d59-47c4-81ed-97694e28c2da
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Cache-Control: no-cache, private
2020-04-06 08:21:14,775[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json
2020-04-06 08:21:14,776[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Date: Mon, 06 Apr 2020 08:21:14 GMT
2020-04-06 08:21:14,776[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Transfer-Encoding: chunked
2020-04-06 08:21:14,776[aceSharedPool-1]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 

kubernetes client 4.1.0

2020-04-06 09:07:15,240[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> DELETE https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys
2020-04-06 09:07:15,240[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json; charset=utf-8
2020-04-06 09:07:15,240[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 67
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Authorization: XXXX
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - {"apiVersion":"v1","kind":"DeleteOptions","orphanDependents":false}
2020-04-06 09:07:15,241[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> END DELETE (67-byte body)
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - <-- 200 OK https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys (9ms)
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Audit-Id: fd8fcae6-e6e2-4934-8643-001e73fa35ee
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Cache-Control: no-cache, private
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Date: Mon, 06 Apr 2020 09:07:15 GMT
2020-04-06 09:07:15,250[aceSharedPool-3]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 193

See more fabric8io/kubernetes-client#1840

PR #16540

2020-04-06 09:24:52,420[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> DELETE https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys
2020-04-06 09:24:52,420[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json; charset=utf-8
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Length: 75
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Authorization: Bearer xx
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - 
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - {"apiVersion":"v1","kind":"DeleteOptions","propagationPolicy":"Foreground"}
2020-04-06 09:24:52,421[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - --> END DELETE (75-byte body)
2020-04-06 09:24:52,429[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - <-- 200 OK https://172.30.0.1/api/v1/namespaces/skabashn/secrets/workspace1pej05ozhspqhloc-sshprivatekeys (8ms)
2020-04-06 09:24:52,429[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Audit-Id: 1ec0c9fe-40d3-4e94-880a-8ffd3904e6d2
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Cache-Control: no-cache, private
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Content-Type: application/json
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Date: Mon, 06 Apr 2020 09:24:52 GMT
2020-04-06 09:24:52,430[aceSharedPool-0]  [INFO ] [.w.i.k.KubernetesClientFactory 203]  - Transfer-Encoding: chunked
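Putting the three traces above side by side, the only difference is the DeleteOptions body. Per the deprecation note on orphanDependents in the Kubernetes API (my reading, stated here as an assumption rather than a conclusion from the logs), orphanDependents=true corresponds to Orphan propagation — which adds the orphan finalizer and delays actual removal — while orphanDependents=false corresponds to Background. A small summary of the observed payloads:

```python
# DeleteOptions bodies observed in the three traces above, mapped to the
# propagationPolicy each one requests. The mapping follows the Kubernetes
# API deprecation note on orphanDependents; it is my interpretation, not
# something stated in the logs themselves.
observed = {
    "k8s client 4.9.0": {"orphanDependents": True},    # ~ Orphan
    "k8s client 4.1.0": {"orphanDependents": False},   # ~ Background
    "PR #16540":        {"propagationPolicy": "Foreground"},
}

def effective_policy(opts):
    """Map a DeleteOptions body to the propagation policy it requests."""
    if "propagationPolicy" in opts:
        return opts["propagationPolicy"]
    return "Orphan" if opts.get("orphanDependents") else "Background"
```

Under this reading, the 4.9.0 client's orphanDependents=true body is what left the secret lingering with the orphan finalizer, and the PR's explicit Foreground policy avoids it.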

@ibuziuk
Member

ibuziuk commented Apr 8, 2020

@skabashnyuk am I correct that this issue is not severe, i.e. even if the initial restart fails, one of the subsequent restarts should pass once the k8s resources are finally deleted? So far we have not been able to reproduce it on staging against 7.11.0 upstream and do not treat it as a blocker for the production update.

@skabashnyuk
Contributor

am I correct that this issue is not severe e.g. even if the initial restart failed one of the subsequent should pass once the k8s-resources are finally deleted

I would say it could. Without any guarantee of course.

@ibuziuk
Member

ibuziuk commented Apr 8, 2020

Thanks, so our plan is to promote 7.11.0 and request 7.11.1 if this issue becomes a problem. On staging, we have not been able to reproduce the issue so far: #16475 (comment)
