Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Web Socket error #324

Merged
merged 5 commits into from
Dec 30, 2020
Merged

Conversation

compulim
Copy link
Collaborator

Reintroducing PR #170 and workarounding RxJS issue.

There is a bug in RxJS retryWhen operator that caused observable to be re-subscribed more than once, after only one error is raised. That means, the RxJS bug could cause one error to trigger reconnect twice.

We wrote a test case that:

  • Disable retry delay
  • Mock all HTTP/WS connections
  • After WS connection is up, kill it after 10 ms
  • Run the test case for 2 seconds and count how many errors and retries
    • Within 2 seconds, given each of them will error out in 10 ms, it can only run up to ~200 retries

Without onerror

image

Within 2 seconds, we are seeing 88 errors injected, and 88 reconnections.

With onerror

image

Within 2 seconds, we are seeing 767 errors injected, and 1023 reconnections.

Takeaways:

  • 1 error can cause 1.33 reconnections
  • Theoretically, only 200 errors can happen in 2 seconds. But 767 errors were recorded

To mitigate, we manually added a flag closed. This flag guard observer.error() will only be called once. We believe if it observer.error() is called twice (by onerror, followed by onclose event), retryWhen could triggered retry twice (or more).

We added a test case to make sure all numbers are under control in the future.


// RxJS.retryWhen has a bug that would cause "error" signal to be sent after the observable is completed/errored.
// We need to guard against extraneous "error" signal to workaround the bug.
closed || subscriber.error(close);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error [](start = 37, length = 5)

normal web socket close should lead to non-error RX stream completion, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am keeping the original logic as-is, i.e. onclose will throw an error. And I think it's intentional.

If you look at line 910,

// WebSockets can be closed by the server or the browser. In the former case we need to
, the comment said:

this.observableWebSocket<ActivityGroup>()
  // WebSockets can be closed by the server or the browser. In the former case we need to
  // retrieve a new streamUrl. In the latter case we could first retry with the current streamUrl,
  // but it's simpler just to always fetch a new one.
  .retryWhen(error$ => error$.delay(this.getRetryDelay(), this.services.scheduler).mergeMap(error => this.reconnectToConversation()))

Looks like the author of the logic intentionally want to retry whenever the server or browser close the connection. And to do that, the code need to throw error on graceful close.

What do you think?

Copy link
Member

@willportnoy willportnoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Copy link
Member

@willportnoy willportnoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@compulim compulim merged commit ba7c42b into microsoft:master Dec 30, 2020
@compulim compulim deleted the feat-handle-ws-error branch December 30, 2020 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants