Che Terminal Window Won't Connect #19201

Closed
4 tasks done
dymart opened this issue Mar 3, 2021 · 19 comments
Labels
  • area/editor/theia: Issues related to the che-theia IDE of Che
  • kind/bug: Outline of a bug - must adhere to the bug report template.
  • new&noteworthy: For new and/or noteworthy issues that deserve a blog post, new docs, or emphasis in release notes
  • severity/P1: Has a major impact to usage or development of the system.

Comments

@dymart

dymart commented Mar 3, 2021

Describe the bug

The terminal connection is inconsistent after a workspace is stopped and started again.
When the workspace is first created, everything works great, but when the workspace is stopped and started again there is regularly a problem connecting to the terminal in the workspace. Everything else works correctly: you can browse and edit files, but opening a terminal in a container only results in a blank terminal window.

Che version

  • latest
    Che version 7.26.2, installed with the operator (also 7.26.2)

Steps to reproduce

  1. Start a new workspace
  2. Check that the terminal connects and works. Write down the ingress host that the websocket call uses to connect to the terminal. It is usually of the form: server5etxiotr-jwtproxy-server-4400.{custom domain}. Get this from your Kubernetes cluster.
  3. Stop the workspace
  4. Check that the ingress endpoints for that workspace have been deleted
  5. Restart the workspace
  6. Check that new ingress endpoints have been created
  7. Connect to the new workspace and open the terminal; it usually doesn't connect. The failure is intermittent (it seems to be a problem passing the updated jwtproxy ingress value for the websocket to use), but it has been occurring on my installation on a semi-regular basis, which makes continued work in Eclipse Che difficult.
  8. Check what url the websocket is trying to connect to in web browser dev tools
  9. When the terminal fails to connect, check whether the websocket is still trying to call the original ingress for that service, and compare that against the value of the new ingress for that service.
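
Steps 8 and 9 amount to comparing the host in the failing WebSocket URL with the ingress hosts that currently exist in the cluster. Here is a minimal TypeScript sketch of that check. The function name `isStaleEndpoint`, the example hosts, and the domain are all hypothetical, and the list of current hosts is assumed to have been copied by hand from the cluster (for example from `kubectl get ingress` output).

```typescript
// Hypothetical helper: returns true when the WebSocket URL seen in dev tools
// points at a host that is not among the ingress hosts currently in the cluster.
function isStaleEndpoint(webSocketUrl: string, currentIngressHosts: string[]): boolean {
  // The standard URL parser understands wss:// and lowercases the hostname.
  const host = new URL(webSocketUrl).hostname;
  return !currentIngressHosts.some(h => h.toLowerCase() === host);
}

// Example values; the hosts and domain are made up for illustration.
const currentHosts = ["servernewabc12-jwtproxy-server-4400.che.example.com"];
const staleCheck = isStaleEndpoint(
  "wss://server5etxiotr-jwtproxy-server-4400.che.example.com/connect?token=abc",
  currentHosts
);
console.log(staleCheck); // true: the old ingress host is no longer present
```

A `true` result matches the symptom described in this report: the browser is still calling an ingress that was deleted when the workspace stopped.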

Expected behavior

After a restart, the workspace terminal connects consistently.

Runtime

Kubernetes (versions 1.17 and 1.18, tested separately) installation
nginx (app.kubernetes.io/version: 0.41.0) as a proxy.

Installation method

che operator 7.26.2

Environment

  • Cloud
    • Azure
    • GCE
      this was reproduced on both gke and azure

Eclipse Che Logs

If you open dev tools in your browser and the terminal is not connecting you can see an error that says:
WebSocket connection to 'wss:// {che jwtproxy server url} {token} failed:

The che jwtproxy server url during an error is usually an old or incorrect version of the url, so that ingress does not exist anymore.
I have not encountered a terminal problem that did not come from the websocket calling the wrong url. This suggests that the terminal connection logic itself is correct and that the problem lies in passing the correct url to the websocket.

Additional context

After trying to find the root cause of the issue for a while, I might have an idea where this comes from. The {che jwtproxy server} url is created and destroyed each time the workspace is created, stopped, or deleted. When it's recreated, it's named after the service that it routes to, e.g. url: serverfeeclvsk-jwtproxy-server-4400 = service: serverfeeclvsk-jwtproxy. But after creating, stopping, and deleting workspaces a few times, it seems as though the url is not correctly propagated after it has changed on a restart. This causes a problem because the websocket cannot connect to the pod. No errors show up anywhere because it's not even trying to contact the right ingress: the old ingress it's trying to contact doesn't exist anymore.

Service that websocket tries to connect to:
{service name for jwtproxy changes each time}-jwtproxy-server-4400.{custom domain}
ex.
server5etxiotr-jwtproxy-server-4400.{custom domain}
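
The naming pattern above can be made explicit in code. The sketch below is an assumption based only on the examples in this report (the service name followed by a `-server-4400` suffix and then the custom domain). It is not taken from the Che sources, and `serviceNameFromIngressHost` is a hypothetical name.

```typescript
// Hypothetical helper: extracts the jwtproxy service name from an ingress host
// of the form "<service>-server-4400.<custom domain>", as seen in this report.
function serviceNameFromIngressHost(host: string): string | undefined {
  // First DNS label, e.g. "server5etxiotr-jwtproxy-server-4400".
  const firstLabel = host.split(".")[0] || "";
  const match = /^(.+-jwtproxy)-server-4400$/.exec(firstLabel);
  return match ? match[1] : undefined;
}

// The domain is made up; the service prefixes are the two quoted in this report.
const serviceA = serviceNameFromIngressHost("server5etxiotr-jwtproxy-server-4400.che.example.com");
const serviceB = serviceNameFromIngressHost("serverfeeclvsk-jwtproxy-server-4400.che.example.com");
console.log(serviceA); // "server5etxiotr-jwtproxy"
console.log(serviceB); // "serverfeeclvsk-jwtproxy"
```

Two different service names for the same workspace across restarts mean that any stored host from the previous run can no longer resolve.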

This problem can occasionally be resolved by reloading the page or restarting the workspace, but this is not guaranteed and creates frustration for a user. I've tried reloading and restarting workspaces multiple times after encountering this terminal error, only to eventually delete the workspace and create it again.

I really enjoy using eclipse che and hope this helps to find the root cause of the problem. Thanks

@dymart dymart added the kind/bug Outline of a bug - must adhere to the bug report template. label Mar 3, 2021
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Mar 3, 2021
@RomanNikitenko
Member

A similar issue was reported recently: #19124

@tsmaeder
Contributor

tsmaeder commented Mar 4, 2021

@dymart I assume the problem also extends to executing Che commands?

@dymart
Author

dymart commented Mar 4, 2021

Hi @tsmaeder, I'm not exactly sure what you mean by executing Che commands in this context. If you could provide an example I could test it against my deployment and see what happens. Hopefully this could help narrow down the problem. Thanks

@SliceOften

SliceOften commented Mar 4, 2021

This sounds similar to what I am also experiencing.

Deployed on OpenShift 4: if you leave the terminals / commands open, they won't rehook after restarting the workspace. It seems like they are using the old routes/ingress, which have been deleted and recreated upon stopping and starting the workspace.

Within the browser development tools I can see calls to a wss url, but checking the openshift console, that route doesn't exist anymore. So possibly something is caching the previous route/ingress to the jwtproxy and not getting the current sessions routes.

@tsmaeder
Contributor

tsmaeder commented Mar 5, 2021

@dymart I mean commands you can define in the devfile. They show up in the "My Workspace" sidebar on the right.

@amisevsk amisevsk added area/editor/theia Issues related to the che-theia IDE of Che severity/P1 Has a major impact to usage or development of the system. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Mar 5, 2021
@dymart
Author

dymart commented Mar 5, 2021

@tsmaeder thanks for the clarification. I could not execute those commands either when the workspace was experiencing connection problems with the terminal. The same WebSocket connection to 'wss{ old ingress}' failed error was present when trying to execute those commands.

Hopefully this helps

@RomanNikitenko
Member

RomanNikitenko commented Mar 13, 2021

@dymart
thank you very much for the great investigation!

@benoitf @azatsarynnyy @tsmaeder
I investigated the issue and, taking into account the info from the Additional context section, I believe the problem is NOT on the che-theia side.

Sure, we cache the value of terminalApiEndPoint on the che-theia side to avoid delays at every terminal creation, but when a page is refreshed we request a new value using the API. So refreshing the page should fix the issue if the problem is really related to the cached value on the che-theia side.
It looks like that's not the case:

I've tried reloading and restarting workspaces multiple times after encountering this terminal error, only to eventually delete the workspace and create it again.

To investigate the issue, I added a request that gets the actual value of terminalApiEndPoint using the API at each terminal creation.
I also added two logs that display the cached and not cached values when a user creates a new terminal.
I tested on:

  • minikube
  • dogfooding instance
  • developer-sandbox instance

Unfortunately, I couldn't reproduce the issue.
I see a similar issue created by @tsmaeder, so Thomas, could you help me when you have a chance?

I think you could:

  • add the following to a devfile for testing, to use a cheEditor with my logs:
- type: cheEditor
  reference: 'https://raw.githubusercontent.com/RomanNikitenko/che-plugin-registry/master/v3/plugins/eclipse/che-theia/custom-che-theia-editor.yaml'

or use the rnikitenko/che-theia:testTerminal image in your yaml file

  • you can check the logs in the browser console when you reproduce the terminal-related problem (please see the screenshot below)
  • it's important at this step to compare the cached value, the not cached value, and the actual value from the OpenShift console (for the dogfooding instance) or the minikube dashboard (if you use minikube for testing).

So:

  • the API returns an incorrect value of terminalApiEndPoint to che-theia if cached value === not cached value but !== actual value from the OpenShift console / minikube dashboard
  • the problem is on the che-theia side if not cached value === actual value from the OpenShift console / minikube dashboard
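
The two rules above can be written as a small decision function. This is only an illustrative TypeScript sketch: `diagnose` and the result labels are hypothetical names, the example hosts are made up, and the "endpoint-is-correct" and "inconclusive" outcomes are additions to cover cases the two rules do not mention.

```typescript
type Diagnosis =
  | "api-returns-stale-value"   // cached === not cached, but both differ from the actual value
  | "che-theia-cache-is-stale"  // not cached === actual, but the cached value differs
  | "endpoint-is-correct"       // all three values agree (added case)
  | "inconclusive";             // none of the patterns above apply (added case)

// Compares the three values read from the browser console logs and from the
// OpenShift console / minikube dashboard.
function diagnose(cached: string, notCached: string, actual: string): Diagnosis {
  if (cached === actual && notCached === actual) return "endpoint-is-correct";
  if (cached === notCached) return "api-returns-stale-value";
  if (notCached === actual) return "che-theia-cache-is-stale";
  return "inconclusive";
}

// Made-up hosts standing in for the old and new jwtproxy ingress.
const oldHost = "serveroldaaaaa-jwtproxy-server-4400.che.example.com";
const newHost = "servernewbbbbb-jwtproxy-server-4400.che.example.com";

console.log(diagnose(oldHost, oldHost, newHost)); // "api-returns-stale-value"
console.log(diagnose(oldHost, newHost, newHost)); // "che-theia-cache-is-stale"
```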

[Screenshot: minikube_dashboard_machine_exec]
[Screenshot: cached_value]

@dymart
Author

dymart commented Mar 16, 2021

Hi @RomanNikitenko,

Thank you for looking into it!

I will try upgrading to the dogfooding instance and hopefully I can get the same outcome you had and the problem will not be there. If there is a problem I will try your recommendations to get more information to help with debugging and post it back here.

Thanks for linking to the devfile and creating an image to make debugging easier. That is really appreciated!

@azatsarynnyy
Member

Most likely, this PR eclipse-che/che-theia#1036 fixes the issue.
It would be nice if someone could test it.

@RomanNikitenko
Member

@dymart
Sorry, maybe I missed that. Could you clarify: did you have the problem when restoring a terminal after restarting a workspace, or when creating a new terminal after restarting a workspace?

@RomanNikitenko
Member

I found a reliable way to reproduce the issue: eclipse-che/che-theia#1036 (comment)
The problem should be fixed by eclipse-che/che-theia#1036

@dymart
Author

dymart commented Mar 18, 2021

@RomanNikitenko
I had a problem restoring a terminal after restarting. I also tried to open multiple new terminals after a restart, and all the new terminals had the same error. Glad to hear you could reproduce the issue and that PR 1036 has fixed it!

@azatsarynnyy
Hopefully that pr fixes the problem :)

Thank you to everyone who looked into this issue!

Hopefully we can close this issue once it's resolved in a nightly release or in the 7.28 release it seems to have been added to!

@azatsarynnyy
Member

@dymart you can check whether it is fixed on your side. Just switch che-theia to the next version in your devfile:

components:
  - id: eclipse/che-theia/next
    type: cheEditor

@azatsarynnyy
Member

I'm closing it as fixed by eclipse-che/che-theia#1036
Feel free to reopen it if it is still reproducible.

@dymart
Author

dymart commented Mar 22, 2021

@azatsarynnyy
I can still reproduce the original problem on version 7.27.2

and when I try to upgrade:

components:
  - id: eclipse/che-theia/next
    type: cheEditor

I can't open new terminals and get the errors:
root ERROR Failed to get remote terminal server api end point url. Cause: Found ${terminalComponents} components (should only have one)

logger-protocol.ts:112 root ERROR Failed to create terminal widget. Cause: TypeError: Cannot read property 'toString' of undefined

exec-terminal-contribution.ts:209 Uncaught (in promise) Error: Unable to create new terminal for machine: theia-ide8oi
at n. (exec-terminal-contribution.ts:209)

@RomanNikitenko
Member

RomanNikitenko commented Mar 23, 2021

@dymart
I believe it's related to the configuration of the editor.

Could you try it like this:

- id: eclipse/che-theia/next
  registryUrl: 'https://che-plugin-registry-main.surge.sh/v3'
  type: cheEditor

@azatsarynnyy
Member

@RomanNikitenko I wonder what https://che-plugin-registry-main.surge.sh/v3 is.
Is it someone's personal registry? Why not the default registry?

@RomanNikitenko
Member

@RomanNikitenko I wonder what https://che-plugin-registry-main.surge.sh/v3 is.
Is it someone's personal registry? Why not the default registry?

It's a workaround.
The proper way is to use che-plugin-registry:next if you use eclipse/che-theia/next as the editor.

@l0rd l0rd added the new&noteworthy For new and/or noteworthy issues that deserve a blog post, new docs, or emphasis in release notes label Mar 23, 2021
@dymart
Author

dymart commented Mar 24, 2021

@RomanNikitenko
Thank you for the suggestion! Everything seems to be working now!

Thank you @RomanNikitenko, @azatsarynnyy and everyone else who helped with this!


8 participants