Sente not detecting non-terminating client disconnects? (e.g. sudden power loss / airplane mode) #230

altV · 2016-05-02T14:56:01Z

I'm using the default Android browser, and I still see its id in the connected-uids atom after turning on flight mode or restarting the phone.

Closing android browser or navigating away from the page works fine, though.


ptaoussanis · 2016-05-02T15:03:23Z

Hi there, thanks for the report. What Android device, browser, and HTTP
server are you using?

Is this over Ajax, websockets, or both?


altV · 2016-05-07T23:05:30Z

Hi,
sorry for delayed reply!

Samsung Galaxy S6 / Default browser

"Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-G920F Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/4.0 Chrome/44.0.2403.133 Mobile Safari/537.36"

It is over websocket - because chsk-state has type :ws.
How would I approach debugging?

danielcompton · 2016-05-07T23:23:41Z

Is this using the sample application? Turning the logging level up and providing the server and client logs would probably be quite a good start for tracking this down.

altV · 2016-05-08T00:06:13Z

sorry, new to clojure -- how do I switch on logs on sente?

danielcompton · 2016-05-08T07:07:01Z

https://github.com/ptaoussanis/sente/blob/master/example-project/src/example/client.cljs#L18 and https://github.com/ptaoussanis/sente/blob/master/example-project/src/example/server.clj#L29. I recommend running from the example project so we can help debug from a known base.
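For anyone arriving with the same question, here is a minimal sketch of turning logging up. It assumes Timbre is on the classpath (the `[example.server:34]`-style log lines elsewhere in this thread suggest the example project logs through it) and that the form is evaluated before the server or client starts. The linked lines may do something slightly different, so treat this as an illustration rather than a copy of them.

```clojure
;; Assumption: Timbre is available as a dependency of the project.
(require '[taoensso.timbre :as timbre])

;; Raise the global log level so that debug/trace output is printed.
(timbre/set-level! :trace)
```

The same call should work from ClojureScript if Timbre is required in the client namespace.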

ptaoussanis · 2016-05-09T05:26:53Z

Just to add to Daniel's good advice - the most likely causes of Sente not reflecting a disconnected client would be (in order):

1. It's an Ajax client, and you're expecting the disconnect to reflect immediately. In fact, Ajax clients are designed to only show up as disconnected ~5 seconds after last activity.
2. The underlying HTTP server is buggy and not reliably calling on-close when it needs to.
3. A bug in Sente.

Debugging this with the example project, I'd start by making sure you've ruled out (1) for sure, then I'd suggest ruling out (2) by trying an alternative http server.

Hope that helps, cheers! :-)

altV · 2016-05-09T17:31:45Z

I'm using http-kit as a web-server.
It is not an Ajax client, to my understanding.
I could not get example application to work using cd example-app && lein run, it opens a web page with four options and buttons don't seem to do anything
I've looked through the Sente source code, and now I understand it should be a bug in http-kit that figwheel is using
I have upgraded http-kit to the latest 2.1.19 version and it didn't help. Will try to switch to different http servers

Next steps:

get example-app to work
debug with something like netstat?
ask http-kit team on how to debug and how to build minimal example?

danielcompton · 2016-05-09T22:25:47Z

Hmm, looks like there's a problem with the example project, it didn't work for me either.

ptaoussanis · 2016-05-10T03:29:32Z

@altV Thanks for the info, will follow up in a moment.

@danielcompton Hi Daniel, this is the example project currently on master? Seems to be working for me. Please note that it's started with cd example-project && lein start not cd example-app && lein run. What behaviour are you seeing exactly?

I do lein start, that causes the Cljs to compile - then the server starts up, producing a bunch of log output to the console. Then a browser window auto-opens. All the buttons + async events are working as expected on my end.

ptaoussanis · 2016-05-10T03:37:06Z

@altV Responses inline...

I could not get example application to work using cd example-app && lein run, it opens a web page with four options and buttons don't seem to do anything

Please note that the example project is started with cd example-project && lein start. Does that help resolve the issue with the example project not working?

I've looked through the Sente source code, and now I understand it should be a bug in http-kit that figwheel is using

Could you please rephrase this, I don't follow? Figwheel isn't included in the reference example; are you including it (or any other plugins) manually?

Please let me know of any changes you might be making to the reference example project.

I have upgraded http-kit to the latest 2.1.19 version and it didn't help. Will try to switch to different http servers

This is in the example project? Are you sure you're using the latest example project (on master). The current example project already includes the latest+recommended version of http-kit ("2.2.0-alpha1").

Next steps:

My recommendation:

Get reference example to work.
Rule out that you're observing normal behaviour for Ajax connections. Log output from the ref example will help determine if your connection is falling back to Ajax for any reason.
If you can reproduce the error with the ref example + http-kit, try the ref example + immutant to help establish if the issue is with http-kit specifically.

Then let's go from there.

Thanks!

altV · 2016-05-10T15:52:18Z

Thank you, I'll try to do that.
(Also, I've tested http-kit 2.1.19 and 2.2.0-alpha1 both)

I've just switched to immutant/web (undertow?), and you won't believe this! It is still the same behavior - flight mode on the phone, and the client is still in the connected-uids under :ws key. Back-forward buttons work pretty solid - client disappears in 5 seconds.

(My :user-id-fn is

(fn [ring-req]
  (let [uid (get-in ring-req [:session :saml])
        user-agent (some-> ring-req :headers (get "user-agent") uap/useragent)]
    [uid (:client-id ring-req) (:remote-addr ring-req) user-agent]))

just in case using vectors and long strings can affect something.)

altV · 2016-05-10T17:17:09Z

Peter,

I ran example-project with lein start.
No serious warnings are there in the log. Web server is running on 50210. Web page was auto-opened after run. Buttons do not do anything. Server prints some data in debug log - the only lines are

16-05-10 17:11:13 ubuntu-leo INFO [example.server:34] - Starting http-kit...
16-05-10 17:11:13 ubuntu-leo INFO [example.server:203] - Web server is running at http://localhost:50210/
Created new window in existing browser session.
16-05-10 17:11:24 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:11:34 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:11:44 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:11:54 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}
16-05-10 17:12:04 ubuntu-leo DEBUG [example.server:168] - Broadcasting server>user: {:ws #{}, :ajax #{}, :any #{}}

Web browser cannot request main.js - it's not served (404).
The log shows that it compiled successfully:

Compiling ClojureScript...
Retrieving cljsbuild/cljsbuild/1.1.3/cljsbuild-1.1.3.pom from clojars
Retrieving com/taoensso/sente/1.8.2-alpha1/sente-1.8.2-alpha1.pom from clojars
Retrieving com/taoensso/encore/2.49.0/encore-2.49.0.pom from clojars
Retrieving com/taoensso/truss/1.2.0/truss-1.2.0.pom from clojars
Retrieving com/cognitect/transit-cljs/0.8.237/transit-cljs-0.8.237.jar from central
Retrieving com/cognitect/transit-js/0.8.846/transit-js-0.8.846.jar from central
Retrieving com/taoensso/sente/1.8.2-alpha1/sente-1.8.2-alpha1.jar from clojars
Retrieving com/taoensso/encore/2.49.0/encore-2.49.0.jar from clojars
Retrieving cljsbuild/cljsbuild/1.1.3/cljsbuild-1.1.3.jar from clojars
Retrieving io/aviso/pretty/0.1.23/pretty-0.1.23.jar from clojars
Retrieving com/taoensso/truss/1.2.0/truss-1.2.0.jar from clojars
Compiling "target/main.js" from ["src"]...
Successfully compiled "target/main.js" in 15.449 seconds.

(my comment

I've looked through the Sente source code, and now I understand it should be a bug in http-kit that figwheel is using

means "I've read through the source code of sente, and found one mention of closing web-sockets, that looks like it's coming from the web-server. From it I see why you said that it is a possibility for bug coming from webserver")

altV · 2016-05-10T18:14:25Z

I've copied main.js from target/ to resources/public/
It works now.

Now figuring out how to debug disconnects. Enabling more logging as described above - without it information is not enough from the log.


ptaoussanis · 2016-05-11T02:27:37Z

I've copied main.js from target/ to resources/public/

Thanks for catching this! The project was supposed to contain a symlink; hadn't noticed that it was being excluded by .gitignore. Fixed in master.

ptaoussanis · 2016-05-11T02:34:47Z

I've just switched to immutant/web (undertow?), and you won't believe this! It is still the same behavior - flight mode on the phone, and the client is still in the connected-uids under :ws key. Back-forward buttons work pretty solid - client disappears in 5 seconds.

Sorry, can you clarify what you mean by "client disappears in 5 seconds"?

You connect your phone to a Sente channel socket. The uid is automatically added under the :ws key. You turn on your phone's flight mode to simulate a dropped connection. Then the uid is removed from the :ws key after 5 seconds?

If I've understood that correctly, then that's exactly the correct+expected behaviour. Am I misunderstanding something?

Are you suggesting that everything works correctly with Immutant, but you see different (broken) behaviour with http-kit? Please try be explicit about what behaviour you're expecting to see vs what behaviour you're actually seeing that seems broken.

And if you are seeing behaviour that seems to be broken somehow, I'd ask that we please confirm that it's happening also with the reference example.

Thanks a lot for all your help debugging! Cheers :-)

ptaoussanis · 2016-05-11T02:38:23Z

In case this is the point of confusion- it's normal for a disconnect to reflect in the :connected-uids atom only after 5 seconds. This is by design.

Is that maybe the behaviour you were identifying as broken?

altV · 2016-05-11T05:12:03Z

It stays forever with both immutant and http-kit.

Scenario 1 - good one. tested with http-kit 2.1.19, http-kit 2.2.0-alpha, and immutant 2.1.4.

phone loads web page (Android Samsung Galaxy S6 / default browser)
it appears on server under connected-uids under :ws
I click back button on the phone
it disappears after 5 seconds.

Scenario 2 - problematic one. tested with http-kit 2.1.19, http-kit 2.2.0-alpha, and immutant 2.1.4.

phone loads web page (Android Samsung Galaxy S6 / default browser)
it appears on server under connected-uids under :ws
I either turn the phone screen off or enable flight mode.
it stays forever in connected-uids until next refreshing the page (with flight mode disabled, of course).

I'm in hotel now so I can't experiment until tomorrow, but I guess I need to 1. test it with example project with much more logging, and then after probably 2. build minimal websocket server with http-kit and see how it interacts with websockets and which events get sent.

Scenario 2 works fine when client is iPad / Mobile Safari.

altV · 2016-05-11T06:16:58Z

umm. well, now that I'm reading the Internet and the design of TCP protocol - is it a feature, not a bug?
http://stackoverflow.com/questions/10240694/java-socket-api-how-to-tell-if-a-connection-has-been-closed/10241044#comment39326781_10241044

Websockets are handled over TCP, and they say TCP does not handle all the disconnects, only proper explicit ones, so does it mean that in order to detect timeouts within some interval ("disconnects") I should enable some application level keep-alives with this desired interval?

ptaoussanis · 2016-05-11T06:46:51Z

You should still see the uid eventually get auto-removed from your connected uids set after Sente's next WebSocket keepalive attempt fails though.

Sente's default WebSocket keepalive is 25 seconds.

When you switch off your phone / switch on airplane mode - is the uid still around in 26 seconds?

ptaoussanis · 2016-05-11T06:50:12Z

Hmm, scratch that; obviously won't work if you've switched your phone off (the keepalive is currently initiated by the client).

This may well be an edge case I hadn't considered that should be addressed Sente side. Let me think about this some + get back to you. Any other thoughts of course welcome in the meantime.

ptaoussanis · 2016-05-11T07:18:10Z

What OS are you running your Sente server on? Any idea what your kernel TCP keepalive params are?

ptaoussanis · 2016-05-11T11:00:34Z

@altV Have just pushed [com.taoensso/sente "1.8.3-issue230"] to Clojars. Could I possibly ask you to test this version when you get a chance?

Just added a simple GC mechanism to catch dead WebSocket connections. Default settings will perform GC after 60 seconds of inactivity. I.e. after you enable airplane mode, you should now see the uid correctly disconnect after 60 seconds.

@altV

If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it can hang around in conns_ for an extended period of time until the underlying TCP connection is identified as dead. This simple mod allows Sente's server to perform WebSocket connection GC for clients that haven't communicated with the server w/in a defined amount of time (must be > client :ws-kalive-ms). Big thanks to @altV for helping to catch + diagnose this issue.


theasp · 2016-05-11T13:05:04Z

@altV I'm curious, how long had you waited? In theory they should be disconnected after a long time, 2 hours and 12 minutes for a Linux server with the default config: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

This means that the keepalive routines wait for two hours (7200 secs) before sending the first keepalive probe, and then resend it every 75 seconds. If no ACK response is received for nine consecutive times, the connection is marked as broken.
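As a quick check of the figure above, the arithmetic can be sketched as follows. The three values are the Linux defaults quoted from the HOWTO; the var and function names are made up for this illustration.

```clojure
;; Linux defaults as quoted above (names are illustrative only).
(def keepalive-time-s 7200)  ; idle time before the first probe
(def keepalive-intvl-s 75)   ; interval between probes
(def keepalive-probes 9)     ; unanswered probes before the conn is marked broken

;; Total time until a silently dead connection is marked broken.
(def total-s
  (+ keepalive-time-s (* keepalive-probes keepalive-intvl-s)))

(defn hms
  "Splits a number of seconds into [hours minutes seconds]."
  [secs]
  [(quot secs 3600) (quot (rem secs 3600) 60) (rem secs 60)])

total-s       ;=> 7875
(hms total-s) ;=> [2 11 15]
```

So with default settings the kernel would take a little over 2 hours 11 minutes to give up on the connection, in line with the estimate above.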

Edit: The socket would need to be opened with keepalive support by the server for this to take effect; I am unsure if either server does this.

Edit 2: I figured out why I haven't encountered this. I use nginx as a proxy, which apparently closes connections that have been idle for 60 seconds: https://blog.martinfjordvald.com/2013/02/websockets-in-nginx/

WebSockets Time Out
WebSockets are still affected by proxy_read_timeout which defaults to 60 seconds. This means that if you have an application using WebSockets but not sending any data more than once per 60 seconds you either need to increase the timeout or implement a ping message to keep the connection alive. The ping solution has the added benefit of discovering if the connection was closed unexpectedly.

Keep-Alive & WebSockets
Keep-alive pings aren’t usable for working around the above mentioned timeout as they work on the TCP level and are just empty packets. They aren’t reported to the application and thus the application will never respond to them meaning that proxy_read_timeout will still trigger.

@ptaoussanis Have you considered reversing the ping? Often a server would ping a client, and if the client doesn't respond in a specified time it disconnects them. IRC comes to mind: https://tools.ietf.org/html/rfc1459#section-4.6.2

I don't see a downside to your approach though.

ptaoussanis · 2016-05-11T13:17:32Z

@theasp Hi Andrew,

Have you considered reversing the ping?

I did (was my first inclination too), but it'd require more busywork on the server side. A client->server ping is simpler to implement, and won't affect server performance (the clients bear administrative costs like tracking when pings might/not be necessary due to other messages recently being sent, etc.).

I.e. the current implementation is simpler + more efficient than a server->client ping would be to achieve the ~same result.

Does that make sense?

theasp · 2016-05-11T13:22:07Z

Yes, perfect sense. Thank you.

danielcompton · 2016-05-11T23:31:55Z

I've been thinking about this, and I think this explains why we were seeing Heroku warning us that some connections were timing out. They were successfully killed, but I hadn't really understood why it was happening (sporadically).

altV · 2016-05-12T00:39:38Z

thank you, @ptaoussanis ! it is very nice.

timeout of 60 seconds seems to work (two tests today)
I now understand why my nginx reverse proxy with proxy_read_timeout set to many hours didn't affect websocket connections on my home setup
to address tcp keepalives mentioned above by @theasp - unfortunately, I do not have a good understanding of the concept (can I use it on Heroku? can I use it on Linux - will it work across all OS and environments and routers? should I specify it when opening the TCP socket, right? where in Sente or http-kit does it apply then?).
It used to hold these connections for at least 8 hours on ubuntu with nginx as reverse proxy.
I'm thinking of sending a pull-request now for the ability to configure these two values - this 60 second keepalive, and 5 seconds of detecting disconnects
I'm seeing Heroku disconnect warnings as @danielcompton mentions - these are probably about these idle connections
I cannot say I have done enough testing today. Most likely, it works, but to be able to test it thoroughly, I will try to make these values configurable
Today I saw that one safari browser established two websocket connections, but that might be related to code reloading (?), and of course, it is not related to this issue (#230)

Just to be 100% sure, am I correct that now clients send application-level heartbeats every 60 seconds?

ptaoussanis · 2016-05-12T04:37:33Z

@altV, @danielcompton Could someone possibly post an example Heroku warning here?

@altV Replies inline:

I'm thinking of sending a pull-request now for the ability to configure these two values

Please note that the values are already configurable with the :ws-conn-gc-ms opt.

Today I saw that one safari browser established two websocket connections, but that might be related to code reloading (?)

Yeah, that sounds like code reloading.

Just to be 100% sure, am I correct that now clients send application-level heartbeats every 60 seconds?

Sente clients have always automatically sent PINGs as a form of keepalive. These default to a 25 second interval, but may be skipped if other data was sent w/in the same window.

The thing that's changed is that the server will now keep track of how long it's been since it has seen a message from each client. If no message has been received in ws-conn-gc-ms time, the Sente server will assume that the connection has died and will close it from the server's end.
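The server-side bookkeeping described above can be sketched in plain Clojure. This is not Sente's implementation: `touch`, `stale-client-ids`, `gc-stale` and `valid-gc-window?` are hypothetical helpers, and the only values taken from this thread are the 25 second client keepalive, the 60 second GC default, and the rule that the GC window must exceed the client's `:ws-kalive-ms`.

```clojure
(defn valid-gc-window?
  "True when the server GC window is longer than the client keepalive
  interval, so a healthy but quiet client is never collected."
  [ws-conn-gc-ms ws-kalive-ms]
  (> ws-conn-gc-ms ws-kalive-ms))

(defn touch
  "Records that a message from client-id was seen at now-ms."
  [last-seen client-id now-ms]
  (assoc last-seen client-id now-ms))

(defn stale-client-ids
  "Returns the set of client ids not heard from within gc-ms of now-ms."
  [last-seen now-ms gc-ms]
  (set (for [[client-id seen-ms] last-seen
             :when (> (- now-ms seen-ms) gc-ms)]
         client-id)))

(defn gc-stale
  "Drops stale clients from the last-seen map."
  [last-seen now-ms gc-ms]
  (apply dissoc last-seen (stale-client-ids last-seen now-ms gc-ms)))

;; Usage: the phone went silent at t=0, the iPad pinged at t=50s.
(def last-seen (-> {} (touch "phone" 0) (touch "ipad" 50000)))

(stale-client-ids last-seen 61000 60000) ;=> #{"phone"}
(gc-stale last-seen 61000 60000)         ;=> {"ipad" 50000}
```

In a real server the map would presumably live in an atom, be updated on every received message, and a periodic task would close the connections returned by `stale-client-ids`.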

danielcompton · 2016-05-12T20:31:07Z

The Heroku warning is an H15:

2016-05-12T08:21:59.319328+00:00 heroku router - - at=error code=H15 desc="Idle connection" method=GET path="/chsk?client-id=1329c96a-296d-476d-94e3-80d472b58921" host=myhost.com request_id=a01f4080-b84b-4fa4-908d-01a3b85ae2d2 fwd="1.1.1.1" dyno=web.1 connect=1ms service=6505617ms status=503 bytes=2262

Reading that page, Heroku enforces a 30 second timeout for the first response back to the client, but web sockets only need to send keep alives every 55 seconds.

… Heroku warnings With the old values, non-terminating connections (e.g. sudden airplane mode) could cause conns to be forcefully terminated by Heroku with an "idle connection" warning message[1]. With the new values, our gc should normally kick in before Heroku's. Waiting on confirmation that this commit actually helps resolve these warnings. [1] Ref. https://devcenter.heroku.com/articles/error-codes#h15-idle-connection

ptaoussanis · 2016-05-13T03:41:38Z

@danielcompton Thanks!


altV · 2016-05-13T15:08:32Z

Umm. Unfortunately, I cannot provide more testing until the end of next week (pessimistically). We can close the issue and reopen it if there will be issues. I'll reply back by then.

ptaoussanis · 2016-05-14T02:52:27Z

No problem, please take your time :-)

ptaoussanis · 2016-05-15T08:41:39Z

This should in principle be resolved so going to close for now. Please feel free to reopen if you still happen to run into any issues again later. Cheers :-)

altV · 2016-05-20T14:04:54Z

looks like it works, but I will do even more testing. Many thanks for your fix!


@altV

v1.9.0 is a significant, non-breaking upgrade focused on cleaning up Sente's implementation to make it more robust, and to help address a couple of issues that were recently identified (#201 and #230).

TODO
- [ ] Test new pings, timeouts, etc. (http-kit)
- [ ] Test new pings, timeouts, etc. (Immutant)
- [ ] Final round of housekeeping
- [ ] Cut an alpha1 release

This is a squashed commit that includes:

- Significant code refactor.
- Extended ref example to incl. a pair of buttons to exercise async push features.
- Drop (experimental) flexi packer.
- [#161] Clojure-side Transit optimizations
  Specifically:
  1. Now cache read-handlers and write-handlers. Can't tell if this actually realistically helps perf since I've got no handler maps to test against; feedback welcome.
  2. Now cache (thread-local) writer (allows baos reuse). Doesn't seem to actually realistically help perf from what I can tell. Might nix this later since this does add some complexity to the impl.
- [#201] Add support for more flexible conn-type upgrade/downgrade
  Initial downgrade strategy here is simple, may or may not turn out to be useful; still need to check. Point was to get a more flexible base that we can build on in future.
- [#230] Server-side ping to help GC non-terminating WebSocket conns.
  If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it could hang around in conns_ for an extended period of time until the underlying TCP connection was identified as dead. This mods Sente's keep-alive ping from client->server to server->client, allowing the server to identify (and auto-gc) dead but abnormally terminated WebSocket connections. Big thanks to @altV for helping to catch + diagnose this issue.
- [#230] Experimental: tweak timeout defaults to help suppress possible Heroku warnings.
  With the old values, abnormally terminated conns (e.g. sudden airplane mode) could cause conns to be forcefully terminated by Heroku with an "idle connection" warning message[1]. With the new values, our gc should normally kick in before Heroku's. Waiting on confirmation that this commit actually helps resolve these warnings.
  [1] Ref. https://devcenter.heroku.com/articles/error-codes#h15-idle-connection
- [#150 #159] Allow server to gc lp conns
  Using the infrastructure in place for #230, have decided to now initiate Ajax timeouts from the server side instead of the client side. As with #230, this'll help the underlying http server gc any abnormal connections. In particular, this should (?) resolve #159 (for Immutant) w/o the need for the earlier workaround that was introduced.

@ptaoussanis

v1.9.0 is a significant, non-breaking upgrade focused on addressing a couple minor issues that were recently identified (#201 and #230). While in the code, also took the opportunity to refactor some implementation details. - @ptaoussanis

This is a squashed commit that includes:

- Extend ref example to incl. a pair of buttons to exercise async push features.
- Drop (experimental) flexi packer.
- [#161] Clojure-side Transit optimizations
  Specifically:
  1. Now cache read-handlers and write-handlers. Can't tell if this actually realistically helps perf since I've got no handler maps to test against; feedback welcome.
  2. Now cache (thread-local) writer (allows baos reuse). Doesn't seem to actually realistically help perf from what I can tell. Might nix this later since this does add some complexity to the impl.
- [#201] Add support for more flexible conn-type upgrade/downgrade
  Initial downgrade strategy here is simple, may or may not turn out to be useful; still need to check. Point was to get a more flexible base that we can build on in future.
- [#230] Server-side ping to help GC non-terminating WebSocket conns.
  If a WebSocket connection is closed without normal termination (e.g. as caused by sudden loss of power, airplane mode, etc.) - it could hang around in conns_ for an extended period of time until the underlying TCP connection was identified as dead. This mods Sente's keep-alive ping from client->server to server->client, allowing the server to identify (and auto-gc) dead but abnormally terminated WebSocket connections. Big thanks to @altV for helping to catch + diagnose this issue. This change should also help to suppress possible "idle connection" Heroku warnings, Ref. https://goo.gl/zLR0Gk
- [#150 #159] Allow server to gc lp conns
  Using the infrastructure in place for #230, have decided to now initiate Ajax timeouts from the server side instead of the client side. As with #230, this'll help the underlying http server gc any abnormal connections. In particular, this should (?) resolve #159 (for Immutant) w/o the need for the earlier Immutant-side workaround introduced for that issue.


Quick sketch of a possible implementation. Didn't test, or think about this much.

Situation before #230: Clients maintain keep-alive, sending pings to server. On sudden conn break (e.g. airplane mode), clients would know about the break but the server wouldn't.

Situation after #230: Server maintains keep-alive, sending pings to clients. On sudden conn break (e.g. airplane mode), server would know about the break but clients wouldn't.

As of this commit: Server and clients each maintain a keep-alive[1]. I.e. each side will attempt to ping if it hasn't heard from the other side in a prescribed window. On sudden conn break, each side will be made aware of the break when its own scheduled window fires and fails to successfully send a ping.

[1] Timeouts should be different. In particular, the client may like to choose a much more aggressive window (e.g. 5s) to provide a rapid indicator to users.
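The two-sided keep-alive described in this commit message can be illustrated with a small, pure Clojure sketch. It is only a model of the idea, not Sente's code: `ping-due?`, `next-state` and the `send-ping!` callback (assumed to return truthy when the ping was sent successfully) are hypothetical names introduced here.

```clojure
(defn ping-due?
  "True when nothing has been heard from the peer for at least window-ms."
  [last-heard-ms now-ms window-ms]
  (>= (- now-ms last-heard-ms) window-ms))

(defn next-state
  "Returns :open while the connection looks alive, or :closed when a due
  ping could not be sent. send-ping! is a zero-arg fn returning truthy
  on success; it is only called when a ping is actually due."
  [{:keys [last-heard-ms window-ms]} now-ms send-ping!]
  (cond
    (not (ping-due? last-heard-ms now-ms window-ms)) :open
    (send-ping!)                                     :open
    :else                                            :closed))

;; Each side runs the same logic with its own window, e.g. an aggressive
;; 5s on the client and 25s on the server (illustrative values).
(def client {:last-heard-ms 0 :window-ms 5000})
(def server {:last-heard-ms 0 :window-ms 25000})

;; At t=6s with the network gone, only the client notices so far.
(next-state client 6000 (constantly false)) ;=> :closed
(next-state server 6000 (constantly false)) ;=> :open
```

Because each side only pings when its own window has elapsed without traffic, a busy connection never generates extra pings, and each side learns of a silent break within roughly one of its own windows.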


danielcompton added the bug? label May 8, 2016

ptaoussanis changed the title ~~Sente does not detect android client disconnects~~ Sente not detecting android client disconnects? May 9, 2016

ptaoussanis added a commit that referenced this issue May 11, 2016

[#230] Ref project: fix missing /target/main.js link (@altV)

e55bed9

Didn't notice that the link was being excluded by .gitignore

ptaoussanis changed the title ~~Sente not detecting android client disconnects?~~ Sente not detecting non-terminating android client disconnects? (e.g. sudden power loss / airplane mode) May 11, 2016

ptaoussanis changed the title ~~Sente not detecting non-terminating android client disconnects? (e.g. sudden power loss / airplane mode)~~ Sente not detecting non-terminating client disconnects? (e.g. sudden power loss / airplane mode) May 11, 2016

ptaoussanis added bug and removed bug? labels May 11, 2016

ptaoussanis closed this as completed May 15, 2016

ptaoussanis mentioned this issue May 17, 2016

Unit Testing #215

Closed

ptaoussanis added a commit that referenced this issue May 25, 2016

[#230] Ref project: fix missing /target/main.js link (@altV)

b8e3827

Didn't notice that the link was being excluded by .gitignore

mbech mentioned this issue Aug 18, 2016

How to detect loss of network connection client-side in Chrome #259

Closed
