Sente doesn't work well on very low quality connections. #270

Closed
kamituel opened this issue Sep 22, 2016 · 5 comments

@kamituel

I'm trying to use Sente in AJAX fallback mode over a satellite link and I'm experiencing severe difficulties. Sente tries to deliver any given message for less than a second, and if that fails, it drops the message silently and doesn't even print a warning.

It all comes down to send-buffered-server-evs>ajax-clients!, which uses these defaults:

nmax-attempts 7
ms-base 90
ms-rand 90
;; (* 7 (+ 90 (/ 90 2.0))) ~= 945ms

While it works well in most cases, it's way too little for a satellite link or for an old 2G GPRS or EDGE mobile connection.

I'm able to remedy this by redef'ing send-buffered-server-evs>ajax-clients! with much larger values (ms-base and ms-rand set to 200 ms, and nmax-attempts set to 100 - although I'm still tweaking it).
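To put the two configurations side by side, here is a minimal sketch of the arithmetic from the comment above. It uses the same approximation (average jitter of about ms-rand / 2), and the function name is made up for illustration; it is not part of Sente.

```clojure
;; Hypothetical helper, not part of Sente: approximate total time spent
;; retrying, assuming each wait averages ms-base + ms-rand/2.
(defn expected-retry-window-ms
  [nmax-attempts ms-base ms-rand]
  (* nmax-attempts (+ ms-base (/ ms-rand 2.0))))

(expected-retry-window-ms 7 90 90)     ; => 945.0   (the defaults)
(expected-retry-window-ms 100 200 200) ; => 30000.0 (the redef'd values)
```

So the larger values stretch the retry window from roughly one second to roughly thirty.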

So I think it'd make sense to do one or more of the following:

  • make ms-base, ms-rand and nmax-attempts configurable
  • replace those conf options with a custom callback that would take a single argument - n (the attempt number) - and would return the number of milliseconds to wait before the next attempt, or -1 if no further attempt should be made.
  • add better logging in Sente, something along those lines:
            ;; (tracef "now-satisfied: %s" now-satisfied)
            (cond
              (>= n nmax-attempts)
              (errorf "giving up on delivering message after %d retries" n)

              (some (complement now-satisfied) client-ids-unsatisfied)
              (do
                (debugf "delivery attempt %d out of max %d" n nmax-attempts)
                ;; Allow some time for possible poller reconnects:
                (<! (async/timeout (+ ms-base (rand-int ms-rand))))
                (recur (inc n) now-satisfied)))
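
For illustration, the callback from the second bullet might look something like the sketch below. Sente has no such hook; the names backoff-ms and total-wait-ms, the 200 ms starting delay, the 5000 ms cap and the 10-attempt limit are all assumptions chosen for the example.

```clojure
;; Hypothetical retry policy (not a Sente API): given the attempt
;; number n (starting at 0), return the ms to wait before the next
;; attempt, or -1 to stop retrying.
(defn backoff-ms
  [n]
  (if (>= n 10)
    -1
    ;; Exponential backoff: 200, 400, 800, ... capped at 5000 ms.
    (min 5000 (* 200 (bit-shift-left 1 n)))))

;; Total time a policy is willing to wait before giving up.
(defn total-wait-ms
  [policy]
  (reduce + (take-while #(not (neg? %)) (map policy (range)))))

(backoff-ms 0)             ; => 200
(backoff-ms 5)             ; => 5000
(backoff-ms 10)            ; => -1
(total-wait-ms backoff-ms) ; => 31200
```

A single function like this would cover both a fixed delay and a backoff schedule without adding three separate options.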
@ptaoussanis
Member

ptaoussanis commented Sep 22, 2016

Hi there, thanks for the clear report.

I'm hesitant to make these parameters configurable for the reason that async server broadcasts will (and can) never be fully reliable. The mechanism you've identified here (in send-buffered-server-evs>ajax-clients!) is primarily intended to help smooth over polling reconnects, not as a way to ensure delivery.

You'll note, for example, that there's no analogous retry/deliverability mechanism for WebSockets.

Generally speaking, if you want delivery confirmation (and/or more control over fail cases) - you'll want to initiate the request from the client (where you have timeouts, and callbacks, etc.).

Or, a fairly simple pattern: broadcast from the server as usual, and expect a confirmation response / ack from the clients. If no ack is received (with whatever parameters you prefer), you can rebroadcast.
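
A rough sketch of that broadcast-then-ack loop, written against a plain send function so it doesn't depend on any particular Sente version. Everything here is hypothetical: pending, ack!, send-with-ack! and the :app/msg event id are names made up for the example, and send-fn stands in for whatever server-side send function your app uses. It blocks the calling thread with Thread/sleep to stay dependency-free; a real app would more likely use a core.async go block and timeout.

```clojure
;; Ids of messages that have been sent but not yet acknowledged.
(def pending (atom #{}))

;; Call this from the handler for the client's ack event.
(defn ack! [msg-id]
  (swap! pending disj msg-id))

(defn send-with-ack!
  "Sends payload via send-fn, re-sending every wait-ms until the
  message is acked or max-attempts sends have been made."
  [send-fn msg-id payload {:keys [max-attempts wait-ms]}]
  (swap! pending conj msg-id)
  (loop [attempt 1]
    (send-fn [:app/msg {:id msg-id :payload payload}])
    (Thread/sleep wait-ms) ; give the client time to ack
    (cond
      (not (contains? @pending msg-id))
      {:status :acked :attempts attempt}

      (>= attempt max-attempts)
      (do (swap! pending disj msg-id) ; stop tracking; the caller decides what next
          {:status :gave-up :attempts attempt})

      :else (recur (inc attempt)))))

;; Usage with a fake transport whose third send finally gets through:
(def calls (atom 0))
(defn flaky-send [[_ev-id {:keys [id]}]]
  (when (= 3 (swap! calls inc))
    (ack! id)))

(send-with-ack! flaky-send "m1" {:text "hi"} {:max-attempts 5 :wait-ms 10})
;; => {:status :acked, :attempts 3}
```

Because the result map reports :gave-up explicitly, the application gets the failure signal (and the place to log it) that the broadcast API itself doesn't provide.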

Sente provides the primitives to do this easily and efficiently, without getting in your way re: exactly how you should implement app-specific concerns like how deliverability should be handled. As an example, it's quite common for different broadcast messages to have different priority. Some you can happily drop. Some you'll want to retry once. Some you'll want to retry 50 times until they're received.
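
As a small illustration of per-message priorities, an app could keep its own table of retry budgets keyed by event id. The event ids and the max-retries-for helper below are invented for the example; nothing like this ships with Sente.

```clojure
;; Hypothetical per-event retry budgets, kept at the application level.
(def retry-policy
  {:chat/typing    {:max-retries 0}    ; fine to drop
   :order/updated  {:max-retries 1}    ; retry once
   :alert/critical {:max-retries 50}}) ; keep trying

(defn max-retries-for
  "Looks up the retry budget for an event id; unknown events get 0."
  [ev-id]
  (get-in retry-policy [ev-id :max-retries] 0))

(max-retries-for :alert/critical) ; => 50
(max-retries-for :some/other)     ; => 0
```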

Trying to pack features like this into Sente to make them automatic can seem appealing at first, but tends to lead to a bloated design with tons of config that still lacks the real flexibility that you'd want when designing a high-performance production application. Again, for example: how would you provide support for per-broadcast configuration? Trying to load all that onto Sente quickly becomes unwieldy and doesn't buy you much since Clojure already provides really nice tools for the hard parts (core.async, etc.).

So I'd perhaps suggest restating the problem as: "Async applications don't work well on very low quality connections, unless you design your applications carefully."

I do regret that I haven't had time yet to write much documentation re: good async application design with Sente. A lot of the issues I'm seeing lately seem to relate to this kind of high-level usage, so I'll try to make it a priority when I have some time again.

In the meantime, for your case, I'd suggest just implementing a simple broadcast->ack for async messages that need deliverability guarantees. When possible, also try to initiate important requests from the client end. That'll then solve your problem in a configurable, protocol-agnostic way.

Does that seem reasonable / make sense?

@kamituel
Author

Thanks. I see your point, however I think it's the case that if there's a connectivity issue, plain raw WebSocket would fail and close. I found this in RFC 6455:

If at any point the underlying transport layer connection is
unexpectedly lost, the client MUST Fail the WebSocket Connection.

Also, an application can register for the "close" event and learn that something broke, and - if it feels like it - it can also use this opportunity to recover.

However, when Sente falls back to HTTP polling (or is configured to use it), and connectivity fails for whatever reason, no error is being logged or returned to the application - it fails silently and there's no way to recover reliably. Sente would reconnect, but one or a few messages might get lost.

I recognise that Sente wouldn't ever protect an application from its own errors, just like TCP or WebSocket don't, but it should expose all the tools needed to do so - which means either delivering the message or failing in a way that lets the application recover. This isn't the case when HTTP polling is used.

@ptaoussanis
Member

ptaoussanis commented Sep 22, 2016

however I think it's the case that if there's a connectivity issue, plain raw WebSocket would fail and close. I found this in RFC 6455

Indeed, yes. That's why I wanted to draw attention to the fact that there was no automatic reliability mechanism for WebSockets either. I.e. this isn't something that affects long polling specifically, and isn't something that I believe we should try to solve at the long-polling level.

no error is being logged or returned to the application - it fails silently and there's no way to recover reliably

No error is being logged but, again, that's intentional. The async broadcast API promise is:

Delivery will be attempted.
A million different things can go wrong, some of them impossible to detect, so we're
not even going to bother. Instead, if you want reliability: design your application to
provide an ack response from clients. You have full control over the reliability
design (and logging) that you want.

Does that make sense?

I recognise that Sente wouldn't ever protect an application from its own errors, just like TCP or WebSocket don't, but it should expose all the tools needed to do so

Sente can't reasonably provide what you're asking for without a terribly cumbersome API, or without being unreasonably inflexible. Instead, Sente provides primitives that complement the primitives already made available by the platform (in this case, Clojure/Script).

You can build applications as reliable as you like, under any sort of connection conditions, and relatively easily - with a simple application-level response/ack model.

Again, it's important that it be application-level in order to have the kind of flexibility that you're likely to want for real production applications. Building async systems requires designing for async systems - and that's something that Sente mostly, intentionally leaves up to you.

On a good connection, you mostly don't have to think about any of these things because everything "just works". As you've seen, a bad connection will illuminate design inefficiencies (e.g. the lack of an ack model when you need it). But in principle the same patterns should ideally be used for robustness, even when good connections are present.

Hope some of that is useful?

Like I say, I do recognise that I really need to write more docs re: high-level application design - it's just quite an undertaking and I haven't had any free time for quite a while.

@kamituel
Author

Hey, yeah it's useful, as it explains the rationale behind Sente that I wasn't fully aware of, to be honest. I think your post above would be a great addition to the docs as it makes plenty of things clear.

I think you can close this issue. Thanks!

@ptaoussanis
Member

No problem, thanks for the question. Will try to add some related docs when I have the time; otherwise I can point people here in the meantime.

Cheers :-)

ptaoussanis added a commit that referenced this issue Oct 2, 2016
Slightly smarter strategy re: waiting for possible Ajax repolling.

In particular, should provide more reliable async broadcasting to
Ajax clients on poor connections that take very long to repoll. In
these cases, send buffering should also be increased.

Note that this doesn't change the general advice given in #270.
Applications that need guaranteed async broadcasts should still use
an appropriate application-level ack design when possible.