Sente not detecting non-terminating client disconnects? (e.g. sudden power loss / airplane mode) #230

Closed
altV opened this issue May 2, 2016 · 34 comments

@altV

altV commented May 2, 2016

I'm using the default Android browser, and I still see its id in the connected-uids atom after turning on flight mode or restarting the phone.

Closing the Android browser or navigating away from the page works fine, though.

@ptaoussanis
Member

Hi there, thanks for the report. What Android device, browser, and HTTP server are you using?

Is this over Ajax, websockets, or both?

@altV
Author

altV commented May 7, 2016

Hi,
sorry for the delayed reply!

Samsung Galaxy S6 / Default browser

"Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-G920F Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/4.0 Chrome/44.0.2403.133 Mobile Safari/537.36"

It is over WebSocket, since chsk-state has type :ws.
How should I approach debugging this?

@danielcompton
Collaborator

Is this using the sample application? Turning the logging level up and providing the server and client logs would probably be quite a good start for tracking this down.

@altV
Copy link
Author

altV commented May 8, 2016

Sorry, I'm new to Clojure. How do I turn on logging in Sente?

@danielcompton
Collaborator

@ptaoussanis
Member

Just to add to Daniel's good advice - the most likely causes of Sente not reflecting a disconnected client would be (in order):

  1. It's an Ajax client, and you're expecting the disconnect to reflect immediately. In fact, Ajax clients are designed to only show up as disconnected ~5 seconds after last activity.
  2. The underlying HTTP server is buggy and not reliably calling on-close when it needs to.
  3. A bug in Sente.

Debugging this with the example project, I'd start by making sure you've ruled out (1) for sure, then I'd suggest ruling out (2) by trying an alternative http server.

Hope that helps, cheers! :-)

@ptaoussanis ptaoussanis changed the title Sente does not detect android client disconnects Sente not detecting android client disconnects? May 9, 2016
@altV
Author

altV commented May 9, 2016

  • I'm using http-kit as a web-server.
  • It is not an Ajax client, to my understanding.
  • I could not get example application to work using cd example-app && lein run, it opens a web page with four options and buttons don't seem to do anything
  • I've looked through the Sente source code, and now I understand it should be a bug in http-kit that figwheel is using
  • I have upgraded http-kit to the latest 2.1.19 version and it didn't help. Will try to switch to different http servers

Next steps:

  • get example-app to work
  • debug with something like netstat?
  • ask http-kit team on how to debug and how to build minimal example?

@danielcompton
Collaborator

Hmm, looks like there's a problem with the example project, it didn't work for me either.

@ptaoussanis
Member

ptaoussanis commented May 10, 2016

@altV Thanks for the info, will follow up in a moment.

@danielcompton Hi Daniel, this is the example project currently on master? Seems to be working for me. Please note that it's started with cd example-project && lein start not cd example-app && lein run. What behaviour are you seeing exactly?

I do lein start, that causes the Cljs to compile - then the server starts up, producing a bunch of log output to the console. Then a browser window auto-opens. All the buttons + async events are working as expected on my end.

@ptaoussanis
Member

ptaoussanis commented May 10, 2016

@altV Responses inline...

I could not get example application to work using cd example-app && lein run, it opens a web page with four options and buttons don't seem to do anything

Please note that the example project is started with cd example-project && lein start. Does that help resolve the issue with the example project not working?

I've looked through the Sente source code, and now I understand it should be a bug in http-kit that figwheel is using

Could you please rephrase this, I don't follow? Figwheel isn't included in the reference example; are you including it (or any other plugins) manually?

Please let me know of any changes you might be making to the reference example project.

I have upgraded http-kit to the latest 2.1.19 version and it didn't help. Will try to switch to different http servers

This is in the example project? Are you sure you're using the latest example project (on master). The current example project already includes the latest+recommended version of http-kit ("2.2.0-alpha1").

Next steps:

My recommendation:

  1. Get reference example to work.
  2. Rule out that you're observing normal behaviour for Ajax connections. Log output from the ref example will help determine if your connection is falling back to Ajax for any reason.
  3. If you can reproduce the error with the ref example + http-kit, try the ref example + immutant to help establish if the issue is with http-kit specifically.

Then let's go from there.

Thanks!

@altV
Author

altV commented May 10, 2016

Thank you, I'll try to do that.
(Also, I've tested http-kit 2.1.19 and 2.2.0-alpha1 both)

I've just switched to immutant/web (undertow?), and you won't believe this! It is still the same behavior - flight mode on the phone, and the client is still in the connected-uids under :ws key. Back-forward buttons work pretty solid - client disappears in 5 seconds.

(My :user-id-fn is

(fn [ring-req]
  (let [uid        (get-in ring-req [:session :saml])
        ;; uap is an alias for a user-agent parsing namespace (require not shown)
        user-agent (some-> ring-req :headers (get "user-agent") uap/useragent)]
    ;; the resulting user id is a vector of four parts
    [uid (:client-id ring-req) (:remote-addr ring-req) user-agent]))

just in case using vectors and long strings can affect something.)

@altV
Author

altV commented May 10, 2016

Peter,

I ran example-project with lein start.
There are no serious warnings in the log. The web server is running on port 50210. The web page was auto-opened after the run. The buttons do not do anything. The server prints some data in the debug log; the only lines are

16-05-10 17:11:13 ubuntu-leo INFO [example.server:34] - Starting http-kit...
16-05-10 17:11:13 ubuntu-leo INFO [example.server:203] - Web server is running at http://localhost:50210/
Created new window in existing browser session.
16-05-10 17:11:24 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:11:34 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:11:44 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:11:54 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:12:04 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}

Web browser cannot request main.js - it's not served (404).
The log shows that it compiled successfully:

Compiling ClojureScript...
Retrieving cljsbuild/cljsbuild/1.1.3/cljsbuild-1.1.3.pom from clojars
Retrieving com/taoensso/sente/1.8.2-alpha1/sente-1.8.2-alpha1.pom from clojars
Retrieving com/taoensso/encore/2.49.0/encore-2.49.0.pom from clojars
Retrieving com/taoensso/truss/1.2.0/truss-1.2.0.pom from clojars
Retrieving com/cognitect/transit-cljs/0.8.237/transit-cljs-0.8.237.jar from central
Retrieving com/cognitect/transit-js/0.8.846/transit-js-0.8.846.jar from central
Retrieving com/taoensso/sente/1.8.2-alpha1/sente-1.8.2-alpha1.jar from clojars
Retrieving com/taoensso/encore/2.49.0/encore-2.49.0.jar from clojars
Retrieving cljsbuild/cljsbuild/1.1.3/cljsbuild-1.1.3.jar from clojars
Retrieving io/aviso/pretty/0.1.23/pretty-0.1.23.jar from clojars
Retrieving com/taoensso/truss/1.2.0/truss-1.2.0.jar from clojars
Compiling "target/main.js" from ["src"]...
Successfully compiled "target/main.js" in 15.449 seconds.

(my comment

I've looked through the Sente source code, and now I understand it should be a bug in http-kit that figwheel is using

means "I've read through the source code of sente, and found one mention of closing web-sockets, that looks like it's coming from the web-server. From it I see why you said that it is a possibility for bug coming from webserver")

@altV
Author

altV commented May 10, 2016

I've copied main.js from target/ to resources/public/
It works now.

Now figuring out how to debug disconnects. I'm enabling more logging as described above; without it, the log doesn't contain enough information.

ptaoussanis added a commit that referenced this issue May 11, 2016
Didn't notice that the link was being excluded by .gitignore
@ptaoussanis
Member

I've copied main.js from target/ to resources/public/

Thanks for catching this! The project was supposed to contain a symlink; hadn't noticed that it was being excluded by .gitignore. Fixed in master.

@ptaoussanis
Member

I've just switched to immutant/web (undertow?), and you won't believe this! It is still the same behavior - flight mode on the phone, and the client is still in the connected-uids under :ws key. Back-forward buttons work pretty solid - client disappears in 5 seconds.

Sorry, can you clarify what you mean by "client disappears in 5 seconds"?

You connect your phone to a Sente channel socket. The uid is automatically added under the :ws key. You turn on your phone's flight mode to simulate a dropped connection. Then the uid is removed from the :ws key after 5 seconds?

If I've understood that correctly, then that's exactly the correct+expected behaviour. Am I misunderstanding something?

Are you suggesting that everything works correctly with Immutant, but you see different (broken) behaviour with http-kit? Please try to be explicit about what behaviour you're expecting to see vs what behaviour you're actually seeing that seems broken.

And if you are seeing behaviour that seems to be broken somehow, I'd ask that we please confirm that it's happening also with the reference example.

Thanks a lot for all your help debugging! Cheers :-)

@ptaoussanis
Member

In case this is the point of confusion: it's normal for a disconnect to be reflected in the :connected-uids atom only after 5 seconds. This is by design.

Is that maybe the behaviour you were identifying as broken?

@altV
Author

altV commented May 11, 2016

It stays forever with both immutant and http-kit.

Scenario 1 - good one. tested with http-kit 2.1.19, http-kit 2.2.0-alpha, and immutant 2.1.4.

  1. phone loads web page (Android Samsung Galaxy S6 / default browser)
  2. it appears on server under connected-uids under :ws
  3. I click back button on the phone
  4. it disappears after 5 seconds.

Scenario 2 - problematic one. tested with http-kit 2.1.19, http-kit 2.2.0-alpha, and immutant 2.1.4.

  1. phone loads web page (Android Samsung Galaxy S6 / default browser)
  2. it appears on server under connected-uids under :ws
  3. I either turn the phone screen off or enable flight mode.
  4. it stays forever in connected-uids until next refreshing the page (with flight mode disabled, of course).

I'm in a hotel now so I can't experiment until tomorrow, but I guess I need to (1) test it with the example project with much more logging, and then probably (2) build a minimal websocket server with http-kit and see how it interacts with websockets and which events get sent.

Scenario 2 works fine when client is iPad / Mobile Safari.

@altV
Author

altV commented May 11, 2016

Hmm. Now that I've been reading about the design of the TCP protocol: is this a feature, not a bug?
http://stackoverflow.com/questions/10240694/java-socket-api-how-to-tell-if-a-connection-has-been-closed/10241044#comment39326781_10241044

WebSockets run over TCP, and apparently TCP does not detect all disconnects, only proper explicit ones. Does that mean that, in order to detect timeouts ("disconnects") within some interval, I should enable application-level keep-alives with that interval?

@ptaoussanis
Member

You should still see the uid eventually get auto-removed from your connected uids set after Sente's next WebSocket keepalive attempt fails though.

Sente's default WebSocket keepalive is 25 seconds.

When you switch off your phone / switch on airplane mode - is the uid still around in 26 seconds?

@ptaoussanis
Member

Hmm, scratch that; obviously won't work if you've switched your phone off (the keepalive is currently initiated by the client).

This may well be an edge case I hadn't considered that should be addressed Sente side. Let me think about this some + get back to you. Any other thoughts of course welcome in the meantime.

@ptaoussanis ptaoussanis changed the title Sente not detecting android client disconnects? Sente not detecting non-terminating android client disconnects? (e.g. sudden power loss / airplane mode) May 11, 2016
@ptaoussanis
Member

What OS are you running your Sente server on? Any idea what your kernel TCP keepalive params are?

@ptaoussanis ptaoussanis changed the title Sente not detecting non-terminating android client disconnects? (e.g. sudden power loss / airplane mode) Sente not detecting non-terminating client disconnects? (e.g. sudden power loss / airplane mode) May 11, 2016
@ptaoussanis
Member

@altV Have just pushed [com.taoensso/sente "1.8.3-issue230"] to Clojars. Could I possibly ask you to test this version when you get a chance?

Just added a simple GC mechanism to catch dead WebSocket connections. Default settings will perform GC after 60 seconds of inactivity. I.e. after you enable airplane mode, you should now see the uid correctly disconnect after 60 seconds.
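
The idea behind this kind of GC can be sketched in plain Clojure. This is a minimal illustration only, not Sente's actual implementation: the shape of the conns map, the :last-seen-ms key, and the function names touch, stale-client-ids, and gc-conns are all hypothetical. Only the 60 second default comes from the comment above.

```clojure
;; Hypothetical sketch: conns maps client-id -> {:last-seen-ms <epoch millis>}.
;; None of these names come from Sente itself.

(def default-gc-ms 60000) ; 60 seconds of inactivity, per the comment above

(defn touch
  "Records that a message was just received from client-id."
  [conns client-id now-ms]
  (assoc-in conns [client-id :last-seen-ms] now-ms))

(defn stale-client-ids
  "Returns the set of client ids not heard from for more than gc-ms."
  [conns now-ms gc-ms]
  (set (for [[client-id {:keys [last-seen-ms]}] conns
             :when (> (- now-ms last-seen-ms) gc-ms)]
         client-id)))

(defn gc-conns
  "Removes stale entries; a real server would also close each socket."
  [conns now-ms gc-ms]
  (apply dissoc conns (stale-client-ids conns now-ms gc-ms)))

;; Usage: "phone" went silent at t=0, "laptop" was last seen at t=50s.
(def conns
  (-> {}
      (touch "phone" 0)
      (touch "laptop" 50000)))

;; At t=61s only the phone has been silent for longer than the GC window.
(println (gc-conns conns 61000 default-gc-ms))
```

Because the check only looks at the time of the last received message, it works even when the client vanished without sending a close frame, which is exactly the airplane mode case.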

@ptaoussanis ptaoussanis added bug and removed bug? labels May 11, 2016
ptaoussanis added a commit that referenced this issue May 11, 2016
If a WebSocket connection is closed without normal termination (e.g. as caused
by sudden loss of power, airplane mode, etc.) - it can hang around in conns_
for an extended period of time until the underlying TCP connection is identified
as dead.

This simple mod allows Sente's server to perform WebSocket connection GC for
clients that haven't communicated with the server w/in a defined amount of time
(must be > client :ws-kalive-ms).

Big thanks to @altV for helping to catch + diagnose this issue.
@theasp
Collaborator

theasp commented May 11, 2016

@altV I'm curious, how long had you waited? In theory they should be disconnected after a long time, roughly 2 hours and 11 minutes for a Linux server with the default config: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

This means that the keepalive routines wait for two hours (7200 secs) before sending the first keepalive probe, and then resend it every 75 seconds. If no ACK response is received for nine consecutive times, the connection is marked as broken.

Edit: The socket would need to be opened with keepalive support by the server for this to take effect; I am unsure if either server does this.
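
As a worked version of the arithmetic in the quoted HOWTO passage, here is a small sketch. The function names keepalive-detect-secs and hms are made up for illustration; the three numbers are the Linux defaults quoted above. It assumes the connection is marked broken one probe interval after the last unanswered probe.

```clojure
;; Linux defaults quoted above: first probe after 7200 s of idle time, then
;; one probe every 75 s, giving up after 9 unanswered probes.
(defn keepalive-detect-secs
  "Seconds until an idle, silently dead connection is marked broken."
  [idle-secs interval-secs probes]
  (+ idle-secs (* interval-secs probes)))

(defn hms
  "Splits a number of seconds into [hours minutes seconds]."
  [secs]
  [(quot secs 3600) (quot (rem secs 3600) 60) (rem secs 60)])

(def total (keepalive-detect-secs 7200 75 9))

(println total (hms total)) ; prints: 7875 [2 11 15]
```

So with default kernel settings, and only if the server socket has keepalive enabled at all, a silently dropped connection would linger for a bit over two hours.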

Edit 2: I figured out why I haven't encountered this. I use nginx as a proxy, which apparently closes connections that have been idle for 60 seconds: https://blog.martinfjordvald.com/2013/02/websockets-in-nginx/

WebSockets Time Out
WebSockets are still affected by proxy_read_timeout which defaults to 60 seconds. This means that if you have an application using WebSockets but not sending any data more than once per 60 seconds you either need to increase the timeout or implement a ping message to keep the connection alive. The ping solution has the added benefit of discovering if the connection was closed unexpectedly.

Keep-Alive & WebSockets
Keep-alive pings aren’t usable for working around the above mentioned timeout as they work on the TCP level and are just empty packets. They aren’t reported to the application and thus the application will never respond to them meaning that proxy_read_timeout will still trigger.

@ptaoussanis Have you considered reversing the ping? Often a server would ping a client, and if the client doesn't respond in a specified time it disconnects them. IRC comes to mind: https://tools.ietf.org/html/rfc1459#section-4.6.2

I don't see a downside to your approach though.

@ptaoussanis
Member

ptaoussanis commented May 11, 2016

@theasp Hi Andrew,

Have you considered reversing the ping?

I did (was my first inclination too), but it'd require more busywork on the server side. A client->server ping is simpler to implement, and won't affect server performance (the clients bear administrative costs like tracking when pings might/not be necessary due to other messages recently being sent, etc.).

I.e. the current implementation is simpler + more efficient than a server->client ping would be to achieve the ~same result.
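
The client-side bookkeeping described here (skip the ping when other traffic was sent recently) can be illustrated with a small decision function. This is a hedged sketch, not Sente's client code: should-ping? and its argument names are invented for this example, and 25000 ms is the default keepalive interval mentioned earlier in the thread.

```clojure
;; Hypothetical helper: decides whether a keepalive ping is needed right now.
(defn should-ping?
  "True when nothing has been sent to the server within the last kalive-ms,
  so a ping is needed to show that the client is still alive."
  [last-sent-ms now-ms kalive-ms]
  (>= (- now-ms last-sent-ms) kalive-ms))

(def kalive-ms 25000)

;; A message went out 10 s ago: the ping can be skipped.
(println (should-ping? 0 10000 kalive-ms)) ; prints: false

;; Nothing sent for a full interval: send a ping.
(println (should-ping? 0 25000 kalive-ms)) ; prints: true
```

Each client only has to track its own last-sent timestamp, which is why this direction puts no per-connection timer work on the server.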

Does that make sense?

@theasp
Collaborator

theasp commented May 11, 2016

Yes, perfect sense. Thank you.

@danielcompton
Collaborator

I've been thinking about this, and I think this explains why we were seeing Heroku warning us that some connections were timing out. They were successfully killed, but I hadn't really understood why it was happening (sporadically).

@altV
Author

altV commented May 12, 2016

Thank you, @ptaoussanis! It is very nice.

  • timeout of 60 seconds seems to work (two tests today)
  • I now understand why my nginx reverse proxy with proxy_read_timeout set to many hours didn't affect websocket connections on my home setup
  • to address the TCP keepalives mentioned above by @theasp: unfortunately, I do not have a good understanding of the concept (can I use it on Heroku? can I use it on Linux, and will it work across all OSes, environments, and routers? should I specify it when opening the TCP socket? where in Sente or http-kit does it apply then?).
    It used to hold these connections for at least 8 hours on ubuntu with nginx as reverse proxy.
  • I'm thinking of sending a pull-request now for the ability to configure these two values - this 60 second keepalive, and 5 seconds of detecting disconnects
  • I'm seeing Heroku disconnect warnings as @danielcompton mentions - these are probably about these idle connections
  • I cannot say I have done enough testing today. Most likely, it works, but to be able to test it thoroughly, I will try to make these values configurable
  • Today I saw that one Safari browser established two websocket connections, but that might be related to code reloading (?), and of course, it is not related to this issue (#230)

Just to be 100% sure, am I correct that now clients send application-level heartbeats every 60 seconds?

@ptaoussanis
Member

ptaoussanis commented May 12, 2016

@altV, @danielcompton Could someone possibly post an example Heroku warning here?

@altV Replies inline:

I'm thinking of sending a pull-request now for the ability to configure these two values

Please note that the values are already configurable with the :ws-conn-gc-ms opt.

Today I saw that one safari browser established two websocket connections, but that might be related to code reloading (?)

Yeah, that sounds like code reloading.

Just to be 100% sure, am I correct that now clients send application-level heartbeats every 60 seconds?

Sente clients have always automatically sent PINGs as a form of keepalive. These default to a 25 second interval, but may be skipped if other data was sent w/in the same window.

The thing that's changed is that the server will now keep track of how long it's been since it has seen a message from each client. If no message has been received within :ws-conn-gc-ms, the Sente server will assume that the connection has died and will close it from the server's end.
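
The relationship between the two timeouts can be checked with a few lines of Clojure. This is a sketch under assumptions: the helper names gc-window-ok? and kalive-intervals-in-gc-window are hypothetical, and putting the client's :ws-kalive-ms and the server's :ws-conn-gc-ms into one map is done here only for illustration. The values are the 25 second and 60 second figures mentioned in this thread.

```clojure
;; Hypothetical helpers, not part of Sente.
(defn gc-window-ok?
  "The server's GC window must be longer than the client's keepalive interval,
  otherwise healthy but quiet clients would be collected."
  [{:keys [ws-kalive-ms ws-conn-gc-ms]}]
  (> ws-conn-gc-ms ws-kalive-ms))

(defn kalive-intervals-in-gc-window
  "Number of whole keepalive intervals that fit in the GC window."
  [{:keys [ws-kalive-ms ws-conn-gc-ms]}]
  (quot ws-conn-gc-ms ws-kalive-ms))

(def timeouts {:ws-kalive-ms 25000 :ws-conn-gc-ms 60000})

(println (gc-window-ok? timeouts)
         (kalive-intervals-in-gc-window timeouts)) ; prints: true 2
```

With these values a quiet but healthy client gets two chances to ping (at about 25 s and 50 s) before the server would give up on it at 60 s.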

@danielcompton
Collaborator

danielcompton commented May 12, 2016

The Heroku warning is an H15:

2016-05-12T08:21:59.319328+00:00 heroku router - - at=error code=H15 desc="Idle connection" method=GET path="/chsk?client-id=1329c96a-296d-476d-94e3-80d472b58921" host=myhost.com request_id=a01f4080-b84b-4fa4-908d-01a3b85ae2d2 fwd="1.1.1.1" dyno=web.1 connect=1ms service=6505617ms status=503 bytes=2262

Reading that page, Heroku enforces a 30 second timeout for the first response back to the client, but web sockets only need to send keep alives every 55 seconds.
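
To read a router line like the one above, it can help to split it into fields. A minimal sketch, assuming the key=value layout shown in the log line; parse-router-line and service-minutes are made-up names, and the sample line is a shortened copy of the one above (timestamp, path, host, request_id, and fwd omitted).

```clojure
(require '[clojure.string :as str])

(defn parse-router-line
  "Turns key=value pairs (values may be double-quoted) into a keyword map."
  [line]
  (into {}
        (for [[_ k quoted bare] (re-seq #"(\w+)=(?:\"([^\"]*)\"|(\S+))" line)]
          [(keyword k) (or quoted bare)])))

(defn service-minutes
  "Whole minutes in a value such as \"6505617ms\"."
  [service]
  (quot (Long/parseLong (str/replace service #"ms$" "")) 60000))

;; Shortened copy of the log line above.
(def line
  (str "at=error code=H15 desc=\"Idle connection\" method=GET "
       "dyno=web.1 connect=1ms service=6505617ms status=503 bytes=2262"))

(let [m (parse-router-line line)]
  (println (:code m) (:desc m) (service-minutes (:service m))))
;; prints: H15 Idle connection 108
```

The service field suggests that this connection had been held for roughly 108 minutes before the router reported it as idle, which fits the picture of a client that went away silently.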

ptaoussanis added a commit that referenced this issue May 13, 2016
… Heroku warnings

With the old values, non-terminating connections (e.g. sudden airplane mode)
could cause conns to be forcefully terminated by Heroku with an "idle connection"
warning message[1].

With the new values, our gc should normally kick in before Heroku's.

Waiting on confirmation that this commit actually helps resolve these warnings.

[1] Ref. https://devcenter.heroku.com/articles/error-codes#h15-idle-connection
@ptaoussanis
Member

@danielcompton Thanks!

@altV
Author

altV commented May 13, 2016

Umm. Unfortunately, I cannot do more testing until the end of next week (pessimistically). We can close the issue and reopen it if there are issues. I'll reply back by then.

@ptaoussanis
Member

No problem, please take your time :-)

@ptaoussanis
Member

This should in principle be resolved so going to close for now. Please feel free to reopen if you still happen to run into any issues again later. Cheers :-)

@altV
Author

altV commented May 20, 2016

Looks like it works, but I will do more testing. Many thanks for your fix!

ptaoussanis added a commit that referenced this issue May 25, 2016
Didn't notice that the link was being excluded by .gitignore
ptaoussanis added a commit that referenced this issue May 25, 2016
v1.9.0 is a significant, non-breaking upgrade focused on cleaning up Sente's
implementation to make it more robust, and to help address a couple of issues
that were recently identified (#201 and #230).

TODO -
  [ ] Test new pings, timeouts, etc. (http-kit)
  [ ] Test new pings, timeouts, etc. (Immutant)
  [ ] Final round of housekeeping
  [ ] Cut an alpha1 release

This is a squashed commit that includes:

- Significant code refactor.

- Extended ref example to incl. a pair of buttons to exercise async push
  features.

- Drop (experimental) flexi packer.

- [#161] Clojure-side Transit optimizations
  Specifically:
    1. Now cache read-handlers and write-handlers.
       Can't tell if this actually realistically helps perf since I've
       got no handler maps to test against; feedback welcome.

    2. Now cache (thread-local) writer (allows baos reuse).
       Doesn't seem to actually realistically help perf from what I can
       tell. Might nix this later since this does add some complexity
       to the impl.

- [#201] Add support for more flexible conn-type upgrade/downgrade
  Initial downgrade strategy here is simple, may or may not turn out to be
  useful; still need to check. Point was to get a more flexible base that
  we can build on in future.

- [#230] Server-side ping to help GC non-terminating WebSocket conns.

  If a WebSocket connection is closed without normal termination (e.g. as
  caused by sudden loss of power, airplane mode, etc.) - it could hang
  around in conns_ for an extended period of time until the underlying TCP
  connection was identified as dead.

  This mods Sente's keep-alive ping from client->server to server->client,
  allowing the server to identify (and auto-gc) dead but abnormally
  terminated WebSocket connections.

  Big thanks to @altV for helping to catch + diagnose this issue.

- [#230] Experimental: tweak timeout defaults to help suppress possible
  Heroku warnings.

  With the old values, abnormally terminated conns (e.g. sudden airplane
  mode) could cause conns to be forcefully terminated by Heroku with an
  "idle connection" warning message[1].

  With the new values, our gc should normally kick in before Heroku's.

  Waiting on confirmation that this commit actually helps resolve these
  warnings.

  [1] Ref. https://devcenter.heroku.com/articles/error-codes#h15-idle-connection

- [#150 #159] Allow server to gc lp conns

  Using the infrastructure in place for #230, have decided to now
  initiate Ajax timeouts from the server side instead of the client side.

  As with #230, this'll help the underlying http server gc any abnormal
  connections.

  In particular, this should (?) resolve #159 (for Immutant) w/o the need
  for the earlier workaround that was introduced.
ptaoussanis added a commit that referenced this issue Jun 4, 2016
v1.9.0 is a significant, non-breaking upgrade focused on addressing a
couple minor issues that were recently identified (#201 and #230). While
in the code, also took the opportunity to refactor some implementation
details.

- @ptaoussanis

This is a squashed commit that includes:

- Extend ref example to incl. a pair of buttons to exercise async push
  features.

- Drop (experimental) flexi packer.

- [#161] Clojure-side Transit optimizations
  Specifically:
    1. Now cache read-handlers and write-handlers.
       Can't tell if this actually realistically helps perf since I've
       got no handler maps to test against; feedback welcome.

    2. Now cache (thread-local) writer (allows baos reuse).
       Doesn't seem to actually realistically help perf from what I can
       tell. Might nix this later since this does add some complexity
       to the impl.

- [#201] Add support for more flexible conn-type upgrade/downgrade
  Initial downgrade strategy here is simple, may or may not turn out to be
  useful; still need to check. Point was to get a more flexible base that
  we can build on in future.

- [#230] Server-side ping to help GC non-terminating WebSocket conns.

  If a WebSocket connection is closed without normal termination (e.g. as
  caused by sudden loss of power, airplane mode, etc.) - it could hang
  around in conns_ for an extended period of time until the underlying TCP
  connection was identified as dead.

  This mods Sente's keep-alive ping from client->server to server->client,
  allowing the server to identify (and auto-gc) dead but abnormally
  terminated WebSocket connections.

  Big thanks to @altV for helping to catch + diagnose this issue.

  This change should also help to suppress possible "idle connection"
  Heroku warnings, Ref. https://goo.gl/zLR0Gk

- [#150 #159] Allow server to gc lp conns

  Using the infrastructure in place for #230, have decided to now
  initiate Ajax timeouts from the server side instead of the client side.

  As with #230, this'll help the underlying http server gc any abnormal
  connections.

  In particular, this should (?) resolve #159 (for Immutant) w/o the need
  for the earlier Immutant-side workaround introduced for that issue.
ptaoussanis added a commit that referenced this issue Jun 9, 2016
ptaoussanis added a commit that referenced this issue Jun 10, 2016
v1.9.0 is a significant, non-breaking upgrade focused on addressing a
couple minor issues that were recently identified (#201 and #230). While
in the code, also took the opportunity to refactor some implementation
details.

- @ptaoussanis

This is a squashed commit that includes:

- Extend ref example to incl. a pair of buttons to excercise async push
  features.

- Drop (experimental) flexi packer.

- [#161] Clojure-side Transit optimizations
  Specifically:
    1. Now cache read-handlers and write-handlers.
       Can't tell if this actually realistically helps perf since I've
       got no handler maps to test against; feedback welcome.

    2. Now cache (thread-local) writer (allows baos reuse).
       Doesn't seem to actually realistically help perf from what I can
       tell. Might nix this later since this does add some complexity
       to the impl.

- [#201] Add support for more flexible conn-type upgrade/downgrade
  Initial downgrade strategy here is simple, may or may not turn out to be
  useful; still need to check. Point was to get a more flexible base that
  we can build on in future.

- [#230] Server-side ping to help GC non-terminating WebSocket conns.

  If a WebSocket connection is closed without normal termination (e.g. as
  caused by sudden loss of power, airplane mode, etc.), it could hang
  around in conns_ for an extended period of time until the underlying TCP
  connection was identified as dead.

  This changes Sente's keep-alive ping from client->server to
  server->client, allowing the server to identify (and auto-gc) dead but
  abnormally terminated WebSocket connections.

  Big thanks to @altV for helping to catch + diagnose this issue.

  This change should also help to suppress possible "idle connection"
  Heroku warnings, Ref. https://goo.gl/zLR0Gk

- [#150 #159] Allow server to gc lp conns

  Using the infrastructure in place for #230, have decided to now
  initiate Ajax timeouts from the server side instead of the client side.

  As with #230, this'll help the underlying http server gc any abnormal
  connections.

  In particular, this should (?) resolve #159 (for Immutant) w/o the need
  for the earlier Immutant-side workaround introduced for that issue.
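
The server-side ping described for #230 can be illustrated with a small sketch. This is not Sente's implementation (Sente is written in Clojure); it is a Python toy under stated assumptions: `ConnRegistry` is a hypothetical stand-in for the server's connection table, each connection is represented by a `send` function, and a failed send is assumed to be what reveals a dead connection.

```python
from typing import Callable, Dict, List, Set


class ConnRegistry:
    """Toy stand-in for a server-side connection table (hypothetical)."""

    def __init__(self) -> None:
        # conn id -> function that sends a payload, returning True on success
        self._conns: Dict[str, Callable[[str], bool]] = {}

    def add(self, conn_id: str, send: Callable[[str], bool]) -> None:
        self._conns[conn_id] = send

    def connected_ids(self) -> Set[str]:
        return set(self._conns)

    def ping_all(self) -> List[str]:
        """Ping every connection; drop and return the ids whose send failed."""
        dead = []
        # Iterate over a copy so entries can be deleted while looping.
        for conn_id, send in list(self._conns.items()):
            try:
                ok = send("ping")
            except OSError:  # e.g. ConnectionResetError, BrokenPipeError
                ok = False
            if not ok:
                dead.append(conn_id)
                del self._conns[conn_id]
        return dead
```

Calling `ping_all()` on a schedule means a client that vanished without a close frame is removed the first time a ping to it fails, rather than lingering until the TCP layer gives up.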
ptaoussanis added a commit that referenced this issue Aug 19, 2016
Quick sketch of a possible implementation.
Didn't test, or think about this much.

Situation before #230:
  Clients maintain keep-alive, sending pings to server.
  On sudden conn break (e.g. airplane mode), clients would know about
  the break but the server wouldn't.

Situation after #230:
  Server maintains keep-alive, sending pings to clients.
  On sudden conn break (e.g. airplane mode), server would know about
  the break but clients wouldn't.

As of this commit:
  Server and clients each maintain a keep-alive[1]. I.e. each side will
  attempt to ping if it hasn't heard from the other side in a
  prescribed window.

  On sudden conn break, each side will be made aware of the break when
  its own scheduled window fires and fails to successfully send a ping.

[1] Timeouts should be different. In particular, the client may like to
choose a much more aggressive window (e.g. 5s) to provide a rapid
indicator to users.
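
The two-sided keep-alive described in this commit message can be sketched as follows. This is an illustration in Python, not Sente's code: `KeepAlive`, `heard_from_peer`, and `tick` are hypothetical names, time is passed in explicitly to keep the example deterministic, and a failed send is assumed to signal the broken connection.

```python
from typing import Callable


class KeepAlive:
    """One side's keep-alive timer (hypothetical helper, not Sente's API)."""

    def __init__(self, window_s: float, now: float = 0.0) -> None:
        self.window_s = window_s
        self.last_activity = now  # last receive, or last successful ping
        self.alive = True

    def heard_from_peer(self, now: float) -> None:
        # Any inbound traffic restarts the window, so no ping is needed.
        self.last_activity = now

    def tick(self, now: float, send_ping: Callable[[], bool]) -> bool:
        """Attempt a ping only if the peer was silent for a full window.

        Returns True if a ping was attempted. A failed send marks the
        connection as broken on this side.
        """
        if not self.alive or now - self.last_activity < self.window_s:
            return False
        try:
            ok = bool(send_ping())
        except OSError:
            ok = False
        if ok:
            self.last_activity = now  # restart the window after a good ping
        else:
            self.alive = False
        return True


# Different windows per side, as suggested in footnote [1].
client = KeepAlive(window_s=5.0)
server = KeepAlive(window_s=25.0)
```

With these windows, a client whose link drops would notice within about 5 seconds of silence, while the server would take up to 25 seconds to notice the same break.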
ptaoussanis added a commit that referenced this issue Aug 31, 2016
…ane mode)

ptaoussanis added a commit that referenced this issue Sep 8, 2016
…ane mode)
