Terminal closes with error 'websocket: close 1006' #14271

Open
matthiasdeblock opened this issue Jun 29, 2023 · 13 comments
Labels
bug Something isn't working

Comments

@matthiasdeblock

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Opening a terminal in ArgoCD and letting it rest for about 50 seconds results in an unresponsive terminal.

To Reproduce

Open a Pod terminal in ArgoCD and wait for 50 seconds.

Expected behavior

The shell continues to work even after 50 seconds of inactivity.

Screenshots

Screenshot of the response to the websocket terminal call:

[screenshot not captured]

Version

argocd: v2.4.28+598f792
  BuildDate: 2023-03-23T14:58:46Z
  GitCommit: 598f79236ae4160325b37342434baef4ff95d61c
  GitTreeState: clean
  GoVersion: go1.18.10
  Compiler: gc
  Platform: linux/amd64

Logs

time="2023-06-29T14:12:55Z" level=error msg="read message err: websocket: close 1006 (abnormal closure): unexpected EOF"
E0629 14:12:55.558269       1 v2.go:105] websocket: close 1006 (abnormal closure): unexpected EOF
@matthiasdeblock matthiasdeblock added the bug Something isn't working label Jun 29, 2023
@ebuildy
Contributor

ebuildy commented Jul 7, 2023

Could be fixed by #14192.

@mateuszkozakiewicz

mateuszkozakiewicz commented Jul 17, 2023

I'm also running into a similar issue: my connection is closed immediately. I've tried port-forwarding to the argocd-server pod, so this is not the load balancer's fault.

[screenshot not captured]

time="2023-07-17T20:44:42Z" level=info msg="terminal session starting" appNamespace=argocd application=whoami container=whoami namespace=web podName=whoami-web-app-5574fc8558-l9xq5 project=web-portfolio userName=admin                                   
time="2023-07-17T20:44:42Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=Watch grpc.service=application.ApplicationService grpc.start_time="2023-07-17T20:44:39Z" grpc.time_ms=3185.257 span.kind=server system=grpc      
time="2023-07-17T20:44:42Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=WatchResourceTree grpc.service=application.ApplicationService grpc.start_time="2023-07-17T20:44:39Z" grpc.time_ms=3183.268 span.kind=server system=grpc
2023/07/17 20:44:43 http: response.WriteHeader on hijacked connection from github.com/argoproj/argo-cd/v2/server/application.(*terminalHandler).ServeHTTP (terminal.go:245)
2023/07/17 20:44:43 http: response.Write on hijacked connection from fmt.Fprintln (print.go:285)
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
time="2023-07-17T20:44:43Z" level=error msg="read message err: read tcp 127.0.0.1:8080->127.0.0.1:37252: use of closed network connection"
E0717 20:44:43.290930       7 v2.go:105] EOF
E0717 20:44:43.290932       7 v2.go:105] EOF
E0717 20:44:43.290931       7 v2.go:105] EOF
E0717 20:44:43.290935       7 v2.go:105] EOF

ArgoCD 2.7.7, helm-chart 5.38.1
values file:

configs:
    cm:
        exec.enabled: true
    params:
        server.insecure: true

@matthiasdeblock
Author

> could be fixed by #14192

Is it possible to backport this to 2.4 and later?

@crenshaw-dev
Member

crenshaw-dev commented Aug 11, 2023

@matthiasdeblock we no longer support anything earlier than 2.6.

@bradenwright-opunai

I'm running into the same symptoms, but I'm on v2.8.0+804d4b8. Is there any way to know whether the fix should be included or not?

@bradenwright-opunai

bradenwright-opunai commented Sep 12, 2023

FWIW, I tried upgrading to v2.8.3+77556d9 but it still seems to hang. Any idea which version got the fix, or whether there is a further problem? Also probably worth mentioning that ArgoCD is currently deployed in GKE using GCE Ingress.

@bradenwright-opunai

FWIW, I also tried testing while port-forwarding to the ArgoCD server, and it works without issues, so it feels like something with the load balancer. It doesn't feel like a timeout because it happens quickly. I can set up a BackendConfig for session affinity if needed, but I'd expect the docs to say if that were a requirement.

https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-configuration

@bradenwright-opunai

bradenwright-opunai commented Sep 13, 2023

I'm leaving some comments on #14192 as well, but from what I can tell the terminal is locking up in under 60 seconds (more like 15-45 seconds), so the keepalive being at 60 seconds isn't resolving my issue (best I can tell). I did just see that the timeout for the LB is 30 seconds, so let me try making that value longer.

@bradenwright-opunai

Alright, so in GCP the default timeout for a backend service is 30 seconds, and with the default settings the terminal was hanging. After increasing that timeout above 60 seconds (currently set to 3600 seconds, i.e. 1 hour), I've been able to wait 10+ minutes and return to a working terminal. Everything is now working as expected.

I would recommend that the docs (https://argo-cd.readthedocs.io/en/stable/operator-manual/web_based_terminal/) be updated to call out the 60-second keepalive that now exists, and to note that for load balancers the idle timeout needs to be > 60 seconds.
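On GKE, the backend timeout can be raised with a BackendConfig attached to the argocd-server Service. A minimal sketch, assuming default names (the resource name and namespace here are illustrative):

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: argocd-server
  namespace: argocd
spec:
  # GCE backend service default is 30s, which drops idle terminal
  # websockets; anything above the 60s keepalive interval works.
  timeoutSec: 3600
```

The BackendConfig is then referenced from the Service with the annotation `cloud.google.com/backend-config: '{"default": "argocd-server"}'`.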

@Arulaln-AR

@erhudy, is the fix available in ArgoCD v2.7.3? We are facing the same issue, where our terminal closes after around 60 seconds of inactivity.

@bravosierrasierra

bravosierrasierra commented Oct 19, 2023

Same problem here. Upgrading to 2.8.4 changed nothing.

The problem appeared after a Kubernetes upgrade from 1.22 with Cilium to 1.25 with Cilium on the same cloud provider. Disabling network policies changed nothing.

Exposing ArgoCD without the cloud NLB changed nothing: the terminal freezes within 30-60 seconds.

~/git/notes/org-mode $ kubectl port-forward service/argocd-server -n argocd 30000:80
Forwarding from 127.0.0.1:30000 -> 8080
Forwarding from [::1]:30000 -> 8080
Handling connection for 30000
Handling connection for 30000
Handling connection for 30000
Handling connection for 30000
Handling connection for 30000
error: lost connection to pod
~/git/notes/org-mode $ 

Ingress-nginx has extended timeout annotations:

annotations:
    ingress.kubernetes.io/proxy-body-size: 100M
    kubernetes.io/ingress.class: "nginx"
    ingress.kubernetes.io/app-root: "/"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"

In the logs:


level=error msg="read message err: websocket: close 1006 (abnormal closure): unexpected EOF"

level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = no session information" grpc.code=Unauthenticated grpc.method=List grpc.service=application.ApplicationService grpc.start_time="2023-10-19T13:12:29Z" grpc.time_ms=24.735 span.kind=server system=grpc

argocd-server has OAuth2 integration with Keycloak; other UI elements work as expected.

@bravosierrasierra

I found the merged PR #14192 that adds terminal keepalive. I enabled websocket binary frame inspection (https://developer.chrome.com/blog/new-in-devtools-74/#binary) and don't see any pings in the websocket messages.

[screenshot not captured]

@bravosierrasierra

> Same problem here. Upgrading to 2.8.4 changed nothing.
>
> The problem appeared after a Kubernetes upgrade from 1.22 with Cilium to 1.25 with Cilium on the same cloud provider. Disabling network policies changed nothing.

My problem was containerd being restarted every minute by a mistaken cron job. Sorry for my mistake.
