Added the ability to "prod" the WebSocketConnection #45
Adds a method to the SignalServiceMessagePipe, that can be called periodically, in order to avoid issues when the device enters sleep, disrupting timing functions like Thread.sleep() and Object.wait(). // FREEBIE
Nice investigation, but this PR breaks the abstraction layer. The websocket is responsible for keeping itself alive, not the caller. If sleep or wait aren't doing the job, then they need to be replaced with something else which works but maintains the abstraction layer. |
Thank you for your prompt review. I'll comment here for the sake of completeness, for anyone else who might stumble upon this (as, from my research, it seems that many projects face similar problems). Let me start by saying that I don't have much more experience with Android, or Java, than what I've gained in researching this issue, so feel free to take what follows with a grain of salt. Hopefully someone who knows more will be able to fill in the blanks, or correct any mistakes on my part.
As I've mentioned in signalapp/Signal-Android#6644, but failed to mention here, there seems to be no other call that does the job. All timing-related constructs on the Java level seem to use the same clock. I assume that this is not coincidental and that it is meant to force applications to use higher-level mechanisms, such as the
Although I understand your concerns about breaking the abstraction layer of the overall design, in a certain sense it is already broken. Conceptually the socket should take care of itself, but that is not the case. On GCM-enabled devices, it manages to do so solely because the GCM notification receiver acquires a wake lock, forcing the device to stay awake while a message is received. As far as I can see this is otherwise not really necessary, since my patched GCM-free version works just fine without it. I believe its basic effect is to keep the socket-related threads from falling into the kind of narcoleptic state they fall into otherwise, but this should only be necessary in order to keep the
The only alternatives I can see are the following:
In the absence of any such mechanism, my understanding is that Signal can generally not function without GCM, so I would recommend removing the existing infrastructure altogether, as it would only constitute broken code that confuses users, expecting Signal to function on their device. |
If AlarmManager is the thing that works, then use the AlarmManager, but just maintain the interface. |
Would you care to sketch me a broad outline of how that can be done, in an acceptable way, or is it understood that I just go on building ready baked and fully tested PRs, that you can then summarily reject, with one-line apocryphal feedback?
Perhaps, as I have suggested, a separate service could be incorporated inside libsignal-service-android, for the sole purpose of registering and receiving an alarm to wake the socket. As far as I can tell (and I may well be wrong), that would keep the pipe's API intact, but the service still has to be started from somewhere and that somewhere is the application. Introducing a new service inside a library and having the application start it, could well be construed as a change to the interface. Would that nevertheless be acceptable? |
It sounds like you're suggesting that Thread.sleep() doesn't work on Android. If that's true, it makes sense to me that you would want to replace Thread.sleep() with an interface like SignalThread.sleep() or whatever. The java code can implement that interface as Thread.sleep(), and the Android code can implement that interface as a rendezvous that uses the alarm manager. The calling code can then pass in the implementation of the interface they desire on construction. |
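A minimal sketch of the interface-injection idea described above, for readers following along. All names here (SleepTimer, JavaSleepTimer, KeepAliveLoop) are hypothetical and not taken from the library. Only the plain-Java implementation is shown; an Android implementation backed by the alarm manager would implement the same interface and be passed in by the app instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SleepTimerSketch {
    /** Hypothetical abstraction over "block for this many milliseconds". */
    public interface SleepTimer {
        void sleep(long millis) throws InterruptedException;
    }

    /** Plain-Java implementation: simply delegates to Thread.sleep(). */
    public static class JavaSleepTimer implements SleepTimer {
        @Override
        public void sleep(long millis) throws InterruptedException {
            Thread.sleep(millis);
        }
    }

    /** Hypothetical keep-alive loop that receives its timer on construction. */
    public static class KeepAliveLoop {
        private final SleepTimer timer;
        private final long intervalMillis;
        private final AtomicInteger sent = new AtomicInteger();

        public KeepAliveLoop(SleepTimer timer, long intervalMillis) {
            this.timer = timer;
            this.intervalMillis = intervalMillis;
        }

        /** Waits one interval per keep-alive, then "sends" it (here: counts it). */
        public void run(int count) throws InterruptedException {
            for (int i = 0; i < count; i++) {
                timer.sleep(intervalMillis); // platform-specific waiting lives behind the interface
                sent.incrementAndGet();
            }
        }

        public int sentCount() {
            return sent.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        KeepAliveLoop loop = new KeepAliveLoop(new JavaSleepTimer(), 10);
        loop.run(3);
        System.out.println("keep-alives sent: " + loop.sentCount());
    }
}
```

The loop never calls Thread.sleep() directly, so the choice of clock is made by whoever constructs it.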
Thanks for getting back to me. I think you misunderstand the issue:
So when the thread starts sleeping for 55 seconds before sending the next keep-alive, if you push the power button on your device to turn off the screen 5 seconds later, then put it down and pick it up 5 hours later, the thread will wake up 50 seconds after you've picked up the device again and turned the screen on. During those 5 hours the socket will have long closed and Signal will be trapped with a stale pipe, waiting to send a keep-alive in one thread and to read something from the pipe in the other. The effect is of course the same as if you had no connection during that time. If some other process acquires a wakelock during that period and wakes the CPU up, the clock will run while the lock is held and the threads may wake earlier but, in general, not in time to keep the socket alive.
Note also that the problem is not necessarily confined to the socket. Any code that depends on sleeping for a more-or-less precise amount of time, without keeping a wakelock during that time, is inherently broken. That is the case whether the device is GCM-enabled or not. GCM only fixes things in this case because it keeps a wakelock while the message notified by the push is being handled.
All this has nothing to do with whether the
From within
If there is a better way to handle this issue, I'm all for it and will happily adopt it. And again, some of the above may be wrong, as my understanding of Java and Android is limited. I'm being a bit verbose in my descriptions both to better explain the problem, and in the hope that you'll spot any incorrect assumptions on my part. |
No, I'm pretty sure I understand the issue. You're saying
The problem is with the implementation of
This repository builds two artifacts, java and android. The latter depends on the former. If you define an interface such as
The caller can then pass in whichever implementation of the interface they want to use (depending on whether they are, for instance, a java or android app) and that's it. |
Well, it seems then, that it's me who's misunderstanding. Sorry, I hoped that I'd be able to get by with whatever Java/Android I could pick up on the way, in order to make this work, but it's evidently not working out. Hopefully someone will come along who's in a better position to put this fix together. Thank you for your time. |
You're pretty close, just move some of your code around and you should be there. |
@moxie0 I think I understand what solution you have in mind, but in my opinion that will lead to a solution that is more complex and less readable than it needs to be. Adding an interface that mimics the sleep behavior of Java and does Android alarm stuff behind it is probably doable, but it adds just another layer between the Android app and libsignal. In my opinion the library should
Both changes would be very small and would not break any existing behavior. They just add a small additional feature. And with that it would be easy to set up an alarm receiver in the Android app and address that problem straight, the Android way. Is this something we can talk about? |
In my opinion abstraction layers make things more readable, not less. It seems that we disagree about that, but I'm going to want it done that way since I'm the one who's going to have to maintain this code long term. |
Of course it is your project and your decision. I will try to implement a sleep method based on Android alarms to see where that will lead us. But apart from the question of which implementation is more readable, the fact is that we lose flexibility this way. I have two use cases in mind:
If we just mimic Thread.sleep then those two use cases are not (directly) solvable in the future. (Especially for the first one I have some code in mind. The second one is a rather complex task and would need a lot of work and experimenting.) And as a side note: we do not disagree on whether abstraction layers make things more readable. But in my opinion BOTH variants break the layer, because it is needed in this case. One solution does it by clearly opening the layer, the other one does it through the backdoor by allowing platform-dependent code to be injected into the library. But as long as we agree that this is a problem that needs to be solved I am positive that we will find a solution. |
The Problem
This pull request is part of a solution to issue signalapp/Signal-Android#6644. The problem is that when the device enters sleep mode (basically as soon as the power button is pressed, provided no wake locks are held), the uptimeMillis() clock, on which most interval timing functions such as Thread.sleep(millis) and Object.wait(millis) depend, stops ticking.
The result is that WebSocket-handling code that performs a blocking wait never awakes, as the timeout never expires. This seems to be of no consequence on GCM-enabled devices, as the GCM notification wakes the device, which afterwards acquires a wake lock until the message is received, but it can make the application unusable when GCM is not available.
Note though, that it does happen in the presence of GCM as well: When the wake-lock is released and the device falls asleep (after any messages have been processed), return from readRequest() can be delayed indefinitely and hold the socket open. The keep alive is also missed and the socket is closed with an EOFException. I'm not sure whether this can have any practical effect on the overall operation of the application though. In any case, the solution proposed below can work with GCM-enabled devices as well, if deemed necessary.
This problem arises in three cases:
When waiting to issue the next keep alive inside KeepAliveSender's thread: Here the typical result is that keep-alives are missed as soon as the device falls asleep, and the socket closes soon afterwards. Reception of any messages arriving after that is delayed for an indefinite amount of time, typically until the user wakes the device (or until some other process wakes the device long enough for the timeout to expire).
When blocking while waiting for the data to become available on the socket: As far as I can see, the only result of any consequence here is that the shutdown of the pipe corresponding to the socket can be delayed indefinitely. I'm not sure whether this can have any adverse effects on operation.
When blocking after a disconnection and before attempting to reconnect: This has the obvious consequence that reconnection, say after a socket error, can be delayed indefinitely, possibly leading to connectivity issues.
The attempted solution
The solution, or at least the half that concerns the socket, is to provide a method to jolt the socket-related threads out of sleep periodically (called "prodding" the pipe/socket). Note that this is not the same as polling: the prod only substitutes for the expiration of Thread.sleep() and Object.wait(), which will never come. The basic nature of the operation of the socket remains unchanged.
The only essential purpose of the prod is to wake the KeepAliveSender's thread and ensure delivery of the keep-alive. The approach is to substitute the call to Thread.sleep() with a wait() on a lock that is global to all threads. Each time the socket is prodded, notifyAll() is called on the lock and the threads awake. This approach nicely accommodates the possibility of more than one socket co-existing at any one time.
When the KeepAliveSender's thread awakens from a prod, it also notifyAll()s the WebSocketConnection, in order to make sure it's not stuck blocking in readRequest() or onClosed(). Since the prods will practically arrive 1 minute apart, we cannot depend on them to wake us when blocking before a reconnection, so this wait has been converted to a hybrid approach, where the first few attempts are busy-waits, to ensure a fast reconnection. The wait has also been moved, so as to only occur when a reconnection will follow. It therefore has practically no effect on GCM-enabled devices, which generally won't need to reconnect very often, as they close the pipe after each operation is concluded.
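As an illustration only, the global-lock prod might look roughly like the sketch below. The names (ProdSketch, PROD_LOCK, prod, awaitProd) are hypothetical and this is not the code in the pull request. The prod counter is an addition of this sketch, assumed here so that a spurious wakeup is not mistaken for a prod. The timeout still relies on the ordinary clock, so on a sleeping device only the prod can be counted on to end the wait.

```java
import java.util.concurrent.TimeUnit;

class ProdSketch {
    /** Lock shared by every waiting thread (hypothetical name). */
    private static final Object PROD_LOCK = new Object();
    /** Number of prods so far; guarded by PROD_LOCK. An assumption of this sketch. */
    private static long prodCount = 0;

    /** Called periodically from outside (for example from an alarm) to wake all waiters. */
    public static void prod() {
        synchronized (PROD_LOCK) {
            prodCount++;
            PROD_LOCK.notifyAll();
        }
    }

    /** Blocks until the next prod or until the timeout elapses; returns true if prodded. */
    public static boolean awaitProd(long timeoutMillis) throws InterruptedException {
        synchronized (PROD_LOCK) {
            long seen = prodCount;
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (prodCount == seen) {
                long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remaining <= 0) {
                    return false; // timed out without a prod
                }
                PROD_LOCK.wait(remaining); // stands in for Thread.sleep(timeoutMillis)
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final boolean[] woken = new boolean[1];
        Thread waiter = new Thread(() -> {
            try {
                woken[0] = awaitProd(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        // Prod repeatedly, as a periodic caller would, until the waiter has returned.
        while (waiter.isAlive()) {
            prod();
            Thread.sleep(10);
        }
        waiter.join();
        System.out.println("woken by prod: " + woken[0]);
    }
}
```

Because every waiter shares one lock, a single prod() call wakes the keep-alive threads of all open sockets at once.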