self._sock is None inside of _single_read #256
I opened this with the apns2 package maintainer, since that is where I discovered the bug, but I think this is really a problem inside hyper:
Pr0Ger/PyAPNs2#11
self._sock is None when it hits this line:
https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L644
It is a rare occurrence, but it does happen. Any thoughts on what could be causing it?
Comments
This is definitely a behaviour I don't love: if the remote peer closes the connection with error code NO_ERROR, hyper doesn't handle it very gracefully. The ideal behaviour would be to quiesce the connection. That's quite a substantial change, however. In the meantime, can I get a feel for how you're using the code? Are you running multithreaded, or from a single thread?
In the short term the easiest fix is probably to raise an exception in all connection termination cases: at least that way your code would be able to spot this and handle it as appropriate.
I am using the library through pyapns2: https://github.com/Pr0Ger/PyAPNs2 We schedule an Apple push notification to an iOS device as a background job, and a thread from a thread pool handles the job and uses pyapns2 (which uses hyper) to send the notification through Apple's HTTP/2 API. I am going to try Greenlet workers rather than native Python threads to see if it makes a difference. But because I see this happen simultaneously on different servers, I suspect it is Apple's API dropping connections. In that case, an exception on connection termination would be fine by me, because it would be much easier for my application to handle correctly. A TypeError exception is hard to do anything intelligent about.
Could you make it raise a connection exception if the connection closes and there are still open stream requests associated with it? Not sure if that is possible. I think if the connection gets closed and there are no outstanding requests there is no real problem: the next request would just cause a new connection to be made. It's only if you are waiting on a response, or in the middle of sending a request, that there is a problem.
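For illustration, the shape could be something like this (a sketch only; the method name _handle_remote_close and the exception type are assumptions, not hyper's actual code):

```python
# Sketch only: `_handle_remote_close` is a hypothetical name, not hyper's API.
def _handle_remote_close(self, error_code):
    outstanding = len(self.streams)  # streams still awaiting responses
    self.close()
    if outstanding:
        # Callers were mid-request or awaiting responses; silently
        # resetting the connection state would hide the failure from them.
        raise ConnectionError(
            'connection terminated with %d open streams (error code %r)'
            % (outstanding, error_code))
```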
Looking at the code, here's where I can see a problem: https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L678-L682 If you have multiple streams outstanding and the connection terminates, you close the connection, setting self._sock = None. Then, if other threads are in the middle of reading from different streams on the same connection, they can run into a problem in lots of other spots.
The answer is yes, we can make it throw exceptions if there are active streams. In fact, I think there is an existing issue related to that. Certainly, however, I think that's the best way to go. =)
Since you are inside a read lock, it seems like you may just need to check if self._sock is None: https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L743-L744 And maybe handle it by stashing the connection error on close(), so that you can reuse that reason in the stream context.
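A minimal sketch of that guard (`_close_reason` is a made-up attribute that close() would have to set, not something hyper has today):

```python
# Sketch of the proposed check at the top of _single_read.
def _single_read(self):
    with self._read_lock:
        if self._sock is None:
            # Another thread closed the connection while we were blocked
            # on the lock; report the stashed reason instead of crashing
            # with a TypeError on None.
            raise ConnectionError(
                'connection closed: %r' % (self._close_reason,))
        # ... existing read logic continues here ...
```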
The race condition I am seeing is that two threads arrive at this line in the code simultaneously. One obtains the read lock, goes inside _single_read, and hits this line, closing the connection: https://github.com/Lukasa/hyper/blob/development/hyper/http20/connection.py#L682 Then the lock is released, and the second thread obtains the read lock and moves into _single_read, only now self._sock is None and nothing else checks for it.
Yup, that seems sensible to me. Are you interested in writing a PR for this?
Yeah, I can take a stab at it. I don't have a lot of context, because I never even looked at the hyper library source code until last night. I doubt I could write a great test to prove the issue is fixed, but I will give it a try and see how it goes.
Thanks! =) I'm happy to help you through it.
Looking at the design, I see one other potential problem. Imagine this scenario: requests A and B are in flight. Request A obtains the lock, hits some kind of connection error, and we close(). Then request C comes in and makes a request BEFORE request B tries to get its response, causing a new _sock connection to be populated. At that point, request B holds a stream id that was issued by the old _sock, not the new one. What would the behavior be in that case? It seems like what we need is for streams to be tied to a given _sock, so that once that sock is closed a stream won't get paired up with a different sock accidentally.
I also realized that raising an exception on close is not enough: in a threaded application, the exception will be raised in the thread that first encounters it, and the other threads won't hit it. So we'd also need to raise exceptions when the stream reads happen in the other threads, which means saving the association between thread and sock, and the state associated with that sock.
@72squared So the reason the socket goes away is, in essence, to reset the state of the connection object so it can be transparently reused. That's not necessarily a good thing to do automatically. It may be better to keep a kind of 'close state' around and let users manually reset the state if they need to. That would allow us to throw the exceptions from multiple threads if needed.
The way I would probably solve this is to have a lightweight object wrapper around self._sock. This wrapper would be passed into a stream wrapper's constructor at the time of the outbound request, so that when we go to read the stream we use the same sock wrapper that was in place when the stream id was issued. This would also let us make sure that, if the connection is closed, the stream continues to see that state, even if a new self._sock object has been created. I need to study the abstraction layer around streams first, though.
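A rough shape for that idea, with every name here hypothetical:

```python
class SockHandle:
    """Hypothetical wrapper shared by the connection and every stream
    opened while this particular socket was live."""
    def __init__(self, sock):
        self.sock = sock
        self.closed = False
        self.close_reason = None

    def close(self, reason=None):
        self.closed = True
        self.close_reason = reason
        if self.sock is not None:
            self.sock.close()
            self.sock = None


class StreamHandle:
    """Sketch of a stream that remembers the handle it was issued on."""
    def __init__(self, stream_id, sock_handle):
        self.stream_id = stream_id
        self.sock_handle = sock_handle

    def read(self):
        if self.sock_handle.closed:
            # Even if the connection has since reconnected with a fresh
            # socket, this stream still sees the handle it was born with.
            raise ConnectionError(
                'stream %d: connection closed (%r)'
                % (self.stream_id, self.sock_handle.close_reason))
        # ... read from self.sock_handle.sock ...
```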
What if on close you do something like this:
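A sketch along those lines (`_closed_with` is an illustrative attribute name, not hyper's):

```python
def close(self, error_code=None):
    self._closed_with = error_code   # stash the reason for later readers
    if self._sock is not None:
        self._sock.close()
        self._sock = None
    self.streams = {}                # outstanding stream state is dropped here
```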
Then inside the _recv_cb method, where we are waiting on the read lock: once we obtain it, we make sure the stream isn't closed and that the stream id is still inside self.streams. Actually, I think self.streams is reset to empty on close(), so maybe just check whether stream_id is in self.streams?
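For example (the method shape is assumed from the lines linked above, not copied from hyper, and it relies on the `_closed_with` attribute sketched in the previous comment):

```python
def _recv_cb(self, stream_id):
    with self._read_lock:
        # Re-validate after acquiring the lock: another thread may have
        # closed the connection while we were blocked waiting for it.
        if self._sock is None or stream_id not in self.streams:
            raise ConnectionError(
                'stream %r closed: %r' % (stream_id, self._closed_with))
        self._single_read()
```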
We probably want a way to signal to the stream that it has been forcibly closed, but yeah, this could work. =)
Review the unit tests. I think this will solve it. Let me know what you think.