-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preventing false positive alarms of offline mode #9068
Conversation
c2bcc6f
to
ad7bf94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no timeout detection anymore or did I miss it?
edit:
The problem with lengthy ping request can be easily reproduced by opening the big files in the editor, which are more than 6-7 megabytes. This usecase also was reproduced on master, also on Google Cloud Shell Editor and OpenShift platforms.
So we get rid of the timeout detection, and just wait for the connection to be completely closed to mark the status as offline?
|
||
@inject(ILogger) | ||
protected readonly logger: ILogger; | ||
|
||
constructor(@inject(ConnectionStatusOptions) @optional() protected readonly options: ConnectionStatusOptions = ConnectionStatusOptions.DEFAULT) { } | ||
protected constructor(@inject(ConnectionStatusOptions) @optional() protected readonly options: ConnectionStatusOptions = ConnectionStatusOptions.DEFAULT) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered in #9068 (comment)
this.wsConnectionProvider.onSocketOpened(() => { | ||
this.updateStatus(true); | ||
this.schedulePing(); | ||
}); | ||
this.wsConnectionProvider.onSocketClosed(() => { | ||
this.clearTimeout(this.scheduledPing); | ||
this.updateStatus(false); | ||
}); | ||
this.wsConnectionProvider.onIncomingMessageActivity(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this, with this change we update the status when a connection is opened, when the connection closes, and when we receive messages. I could not find a place where schedulePing
would timeout though. It will set the status to false
when an error occurs, but that doesn't account for timeouts IIUC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will set the status to false when an error occurs, but that doesn't account for timeouts IIUC?
Yeah, that's the point of this PR. Because timeout doesn't indicate that we have a broken connection, which is displayed as "offline" mode. The main problem is that when "offline" mode is triggered by timeout, the ping request receives a response from the server, then "online" status is triggered immediately after ping request. And this is quite false positive displaying the status, because user is able to call any other requests, e.g. create file, open it in the editor, open the terminal and so on.
Hi @marechal-p,
Yep, that's true, no timeout detection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments here and there. I will try it in action later today.
public onSocketOpened: Event<void> = this.onSocketOpenedEmitter.event; | ||
|
||
protected readonly onSocketClosedEmitter: Emitter<void> = new Emitter(); | ||
public onSocketClosed: Event<void> = this.onSocketClosedEmitter.event; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public
is superfluous.- Should be
readonly
. - Nameing:
onSocketDidClose
.
Same for open
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the names for the following emitters.
@@ -45,7 +51,11 @@ export class WebSocketConnectionProvider extends AbstractConnectionProvider<WebS | |||
const url = this.createWebSocketUrl(WebSocketChannel.wsPath); | |||
const socket = this.createWebSocket(url); | |||
socket.onerror = console.error; | |||
socket.onopen = () => { | |||
this.onSocketOpenedEmitter.fire(undefined); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the calls to protected methods. So one can override and customize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. There are two new protected methods which are fire the events.
socket.onclose = ({ code, reason }) => { | ||
this.onSocketClosedEmitter.fire(undefined); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be called after calling channel.close(code, reason);
on each channel, not before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, good catch, thanks! Fixed.
@vzhukovs, we should note this change for both devs and users in the changelog. Please reference your PR description at least. Thank you! |
this.updateStatus(false); | ||
await this.logger.trace(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: This code never happens. Once the WS connection is closed, none of the proxy calls reject but wait until there is a new connection, or forever: #8803 (comment)
I do not mind if you leave it there, but you can simplify with this.logger.error(e)
as your code does not do what you want to do. trace
level is useless here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed from trace to error.
Once the WS connection is closed, none of the proxy calls reject but wait until there is a new connection, or forever: #8803 (comment)
AFAIU, this will happened only on browser environment, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU, this will happened only on browser environment, right?
In both. The PR was never merged.
I like the idea of this change, however, I can still see the false-positive offline status. See attached screencast: screencast.2021-02-19.14-21-01.mp4(The screencast is from electron, but the browser version has the same problem. Just get any random 10MB file, open it, scroll in the editor, you will see, that app will "go offline".) |
@kittaakos, you mean this https://github.com/eclipse-theia/theia/blob/master/CHANGELOG.md ? |
Will try to check this out. Thanks for pointing this! |
@kittaakos, sorry for late response, found that sometimes electron can't rebuild properly, had the similar issue as here: https://community.theia-ide.org/t/electron-start-error/1464/2 Checked this use case on electron build with file size of 20 megabytes and got the following result: https://drive.google.com/file/d/1NIOGpfyT10F0JW7zeYzQVt1Mhdo2YTNF/view?usp=sharing (screencast). Couldn't reproduce this issue. cc @azatsarynnyy can you also check plz, on your side whether you have such issue? |
Changelog has been updated. |
@vzhukovs can't reproduce it on my side - no false-positive offline status |
@vzhukovs can we merge it? |
I want to give it a try again. We can merge it tomorrow if you all are OK with it. |
Sounds good. Thanks @kittaakos! |
Thank you so much for your patience. I do the verification tomorrow ASAP. |
Unfortunately, I still see the Offline status. I downloaded and used a 10 MB test file from here: https://www.engineerhammad.com/2015/04/Download-Test-Files.html screencast.2021-03-02.10-26-11.mp4I tried with the c848002 reivison. |
Finally, I was able to reproduce it. Also I've just checked this use case on master branch, and there is the same behavior, but what I've noticed, is that logs contain a bunch of records about disposing connection from the server side during the opening a big file: And client tries to reconnect to the server side after each disconnection. I've noticed, that this behavior is reproducible only when there is a big project opened in Theia. When there is an empty workspace, there isn't such issue with disconnecting websocket channel. So I can assume, that this is not a regression by this PR. We might register a dedicate issue for this specific use case, which is rare, but anyway. wdyt? |
OK, I am fine with it. I thought the purpose of this PR is to fix the false positive offline status issue, which is still present with the proposed changes. I do not want to hold you back, so go ahead if you think it is done. |
Signed-off-by: Vladyslav Zhukovskyi <vzhukovs@redhat.com>
7e382cf
to
d7dbea3
Compare
Thanks for you help! Just squashed commits into one, waiting for the finished build actions. |
What it does
The substance of the current change proposal in modification of mechanism, that is responsible for monitoring the state of client activity. The problem lay in the way that there is a base abstract class, which has the timer, that activates the offline mode within a specified period of time, if there isn't any incoming activity from the client. And there is a class, which listens for websocket incoming activity, which also has another timer, that activates within the same period of time as the previous one, minus one second, to make a ping request and receives the pong response. That's how it's implemented in master. The main challenge is when these timers activates, the client has only one second making a ping request and receive the pong response. If the ping request executes more than one second, the client indicates that the application is in offline mode, when, in fact server responses to requests, but with some delay. This is a false positive trigger. The ping request might executes more than one second, but in that case the host operating system is under heavy system load or has limited resources. It is a very common scenario, when IDE runs as containerized application, for example in kubernetes.
To avoid false positive trigger was invited to refuse to use the timer which is located in an abstract class, which is responsible for triggering the online/offline mode. In the meantime, class, that listens to websocket activity to add an ability to listen to the websocket open and close event. Websocket closing triggers by loosing the connection. This might be the disconnecting the server, losing the internet connection on the user's side. When the user is losing the internet connection, reconnecting-websocket library is trying to restore connections and in the process, fire the event about websocket close and this is a clear message, that we can show offline mode. When the reconnecting-websocket restores the connection, it fires event about websocket open and this is an indicator for online mode.
The problem with lengthy ping request can be easily reproduced by opening the big files in the editor, which are more than 6-7 megabytes. This usecase also was reproduced on master, also on Google Cloud Shell Editor and OpenShift platforms.
The current change proposal also provides test covering common scenarios for the websocket behavior and indicating the application state, in short online/offline mode.
Reproduces:
Usecase with opening big file in Theia that built from master:
Screencast: https://drive.google.com/file/d/1alsddSf2pY9XfFL2GQcOGKCI2rpC3eZU/view?usp=sharing
Usecase when response for pong request is made with emulated delay:
Screencast: https://drive.google.com/file/d/1NzaGdH-dkZUbGP3cWjFmTpLVZvOKmyBx/view?usp=sharing
To reproduce this usecase, Theia needs to be patched with the following patchfile and built:
Screencast: https://drive.google.com/file/d/15KiJmLdmNvRBj3hNWmLRvcbLoJKoLJKM/view?usp=sharing
Additional context
linked comment: eclipse-che/che#18472 (comment)
fixes: eclipse-che/che#18723
Signed-off-by: Vladyslav Zhukovskyi vzhukovs@redhat.com
How to test
Build and run Theia from the current branch. Create a few big files (it would be enough 7-8 megabytes). Try to reopen them sequentially. Additionally, the host operating system can be a bit overloaded by calling the following command in the terminal:
So there shouldn't any offline marker appeared, because the network connection is not broken.
Screencast: https://drive.google.com/file/d/1xWXQ1uGGqQNJS_miPvEQySDb0aX8HtxN/view?usp=sharing
Review checklist
Reminder for reviewers