Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with Terminal and linux shell different from bash #9641

Closed
odedonato opened this issue Jun 13, 2022 · 29 comments · Fixed by #9895
Closed

Problem with Terminal and linux shell different from bash #9641

odedonato opened this issue Jun 13, 2022 · 29 comments · Fixed by #9895
Labels
bug Something isn't working component:web-terminal

Comments

@odedonato
Copy link
Contributor

Describe the bug

Tried the new most wanted feature, Terminal, it works with linux container with bash but got some problem with container Alpine based.

In screenshot, tried on busybox:stable container (it's in loop to keep it running) got # after OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "bash": executable file not found in $PATH: unknown but cannot use it (cannot write)

Tried with some other container based on node:lts-alpine and got only black screen with blinking cursor.

Shell is working with same container using Rancher and Lens.

To Reproduce

Click on Terminal on Alpine based container

Expected behavior

Get shell from container and possibility to use it

Screenshots
image

Version

2.4.0+91aefab
@odedonato odedonato added the bug Something isn't working label Jun 13, 2022
@mvpwar
Copy link

mvpwar commented Jun 13, 2022

use nginx image instead, it works

@odedonato
Copy link
Contributor Author

odedonato commented Jun 13, 2022

@mvpwar thanks for your reply

I tried with nginx:1.19.1 and it works but with nginx:1.19.1-alpine not work, black screen with blinking cursor.

@mihma
Copy link

mihma commented Jun 13, 2022

Same issue for me when using node:14.16.0-alpine base image

@crenshaw-dev
Copy link
Member

Can y'all check the browser's console and/or network dev tools for errors when connecting to the alpine pod?

@odedonato
Copy link
Contributor Author

Hi @crenshaw-dev

After a while I got a 504 Gateway timeout as you can see on the screenshot

image

With other containers, not alpine, on same cluster all is working like a charm

@mihma
Copy link

mihma commented Jun 14, 2022

Hi @crenshaw-dev ,
I get blinking cursor with never dropping connection to the terminal as well as some weird font download error.

PS. It works fine on non-alpine pods: working terminal and zero font errors.

Screenshot 2022-06-14 at 15 45 11

@arthurk
Copy link
Contributor

arthurk commented Jun 15, 2022

It fails for our alpine based images too. It works when bash is installed via apk add bash

The error I get in argocd-server is:

{"level":"error","msg":"read message err: websocket: unexpected reserved bits 0x60","time":"2022-06-15T04:15:27Z"}

@djfinnoy
Copy link
Contributor

Same issue here, works fine with containers that have bash, weird websocket issues with containers that only have sh. Black screen for containers with neither.

@crenshaw-dev
Copy link
Member

From #9425 :

Looks like ArgoCD is having issues spawning a terminal with the alpine ash shell.

It works after adding bash package:

apk add bash

Is it possible we just need to add ash as a shell to try here?

validShells := []string{"bash", "sh", "powershell", "cmd"}

@odedonato
Copy link
Contributor Author

@crenshaw-dev I was wondering if is useful and possible to put validShells in a ConfigMap, so we can add our shell, like pwsh, ash, etc.
In that way is not binded in the code.

@crenshaw-dev
Copy link
Member

@odedonato making it configurable is probably a good idea. But I'm not yet convinced that the (only) problem is not having enough shells in that array. If someone has time, a quick test to see if adding ash fixes Alpine would be helpful.

@odedonato
Copy link
Contributor Author

@crenshaw-dev I downloaded the code and added ash then builded the container and tried but didn't work, same black screen with blinking cursor.

@arthurk
Copy link
Contributor

arthurk commented Jun 28, 2022

I don't think adding more shells to the list is going to fix the problem. I often use k9s to launch shells in containers and it works well. Their code looks like this:

	shellCheck = `command -v bash >/dev/null && exec bash || exec sh`

https://github.com/derailed/k9s/blob/fc8ffe5d37d068f14e86aed7e8b847d016c99b19/internal/view/exec.go#L30

I can also run an alpine image and exec sh in the CLI which works:

kubectl run --rm -i -t --image=alpine mytest sh

However using the same image it doesn't work in the web ui.

I would be interested to see if putting sh before bash would show a different result in Argo CD

@odedonato
Copy link
Contributor Author

odedonato commented Jun 28, 2022

@crenshaw-dev turns out that the black screen it's only on local k8s cluster where is located argocd and it's because it was a lack of permission on k8s RBAC, in ClusterRole:argocd-server added:

- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create

After that I got the error OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "bash": executable file not found in $PATH: unknown same error in the screenshot in the first post, got # but not usable, so I tried what suggested by @arthurk.

I built argo with sh first and it works, but it opens only sh on all containers. It's just a workaroud imho.

I found this:

// FIXME: if the first shell fails then the first keyboard event is lost

It's exactly what happen!

@elmariofredo
Copy link

@yeya24 can we maybe switch to sh as a first shell until this is resolved? This issue make this feature pretty much unusable for most of containers.

@yeya24
Copy link
Contributor

yeya24 commented Jul 4, 2022

I am aware of this issue but there might be other issues more than the shell order. For k8s dashboard code, it uses the same shell order but it can enable exec for alpine images using sh.
There might be some bugs related to the session close step. If the error can be detected then ideally sh will be tried after we found bash is not working.

@elmariofredo
Copy link

It feels like this issue will be bit harder to figure out, as there are multiple libraries involved. My suggestion is to quick fix this issue, by temporary removing bash shell from list which would resolve this issue for most containers. When problem shell test is resolved we can put bash back. WDYT?

@yeya24
Copy link
Contributor

yeya24 commented Jul 4, 2022

It feels like this issue will be bit harder to figure out, as there are multiple libraries involved. My suggestion is to quick fix this issue, by temporary removing bash shell from list which would resolve this issue for most containers. When problem shell test is resolved we can put bash back. WDYT?

I don't have any preference. Feel free to open a pr to change the order and ArgoCD maintainers can decide whether they are willing to accept this change or not. Again maybe a configmap for shells so that users can reorder the shells as they want would be more flexible.

@crenshaw-dev
Copy link
Member

crenshaw-dev commented Jul 4, 2022

I’m in favor of that change, and I agree that a configurable list makes sense. I think the config can go in ‘argocd-cmd-params-cm’.

@m-yosefpor
Copy link
Contributor

m-yosefpor commented Jul 5, 2022

I’m in favor of that change, and I agree that a configurable list makes sense. I think the config can go in ‘argocd-cmd-params-cm’.

I'm not against this feature, but I feel this feature might not be useful for most of the use-cases, as logical order of shells for a simple pod terminal is what currently listed in the code (as bash and sh are the most common shells which could be found in container images). Also it doesn't really fixes the retrying next shell issue (which is the actual problem here), since as I wrote in here, we might still want bash to be the first shell and retry sh in case it is missing. Having a global config for it doesn't really solve this exact problem (of course it could be a tmp workaround).

@crenshaw-dev
Copy link
Member

of course it could be a tmp workaround

I'm satisfied with a temporary workaround for now. This will solve a headache for 90% of use cases and will buy us time to figure out the 10%.

Making the list configurable is a win regardless, because people might want to add options that are not currently in the hardcoded list (e.g. ash).

@crenshaw-dev
Copy link
Member

I think I actually found the real issue. Can someone test the PR to confirm? #9895

@elmariofredo
Copy link

I think I actually found the real issue. Can someone test the PR to confirm? #9895

I can confirm that your PR is solving the issue. Even that bash will fail, switch to sh is working and terminal is usable 🎉

Thanks!

image

@michalziobro
Copy link

I'm still having this issue with 2.4.4 version:
argocd-server log:

E0712 10:17:06.186996       1 v3.go:79] EOF
9
E0712 10:17:06.429485       1 v2.go:105] EOF
8
time="2022-07-12T10:17:07Z" level=error msg="read message err: websocket: unexpected reserved bits 0x40"
7
E0712 10:17:07.727456       1 v2.go:105] websocket: unexpected reserved bits 0x40

blinking cursor in terminal tab, cannot type anything:
image

Terminal works ok with bash.

@arthurk
Copy link
Contributor

arthurk commented Jul 12, 2022

@michalziobro the fix is not in 2.4.4, I hope it will be available in the next release

@crenshaw-dev
Copy link
Member

Yep, I'll release the fix in the next few hours. It'll be in 2.4.6, since I've got some other stuff queued up for 2.4.5.

@arthurk
Copy link
Contributor

arthurk commented Jul 13, 2022

I've upgraded to 2.4.6 and can confirm that the web terminal feature works well now

@S0n98
Copy link

S0n98 commented Dec 21, 2022

I'm also facing an issue with web shell, got only black screen with blinking cursor, but the logs I got from argocd-server are quite different:

2022/12/21 08:03:07 http: response.WriteHeader on hijacked connection from github.com/argoproj/argo-cd/v2/server/application.(*terminalHandler).ServeHTTP (terminal.go:237)
71
2022/12/21 08:03:07 http: response.Write on hijacked connection from fmt.Fprintln (print.go:265)

Image used: linuxserver/openvpn-as:version-2.9.0-5c5bd120-Ubuntu18

Argocd :

{
    "Version": "v2.5.2+148d8da",
    "BuildDate": "2022-11-07T16:42:47Z",
    "GitCommit": "148d8da7a996f6c9f4d102fdd8e688c2ff3fd8c7",
    "GitTreeState": "clean",
    "GoVersion": "go1.18.8",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v4.5.7 2022-08-02T16:35:54Z",
    "HelmVersion": "v3.10.1+g9f88ccb",
    "KubectlVersion": "v0.24.2",
    "JsonnetVersion": "v0.18.0"
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component:web-terminal
Projects
None yet
Development

Successfully merging a pull request may close this issue.