Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add readable method to Socket #238

Closed
wants to merge 1 commit into from
Closed

Add readable method to Socket #238

wants to merge 1 commit into from

Conversation

guyskk
Copy link

@guyskk guyskk commented Dec 17, 2017

I need this method for HTTP keep-alive check, and I found there is a writeable method but no readable method, here is the demo code for my use case:

try:
    await timeout_after(self.keep_alive_timeout, cli_sock.readable())
except TaskTimeout as ex:
    await cli_sock.close()

@guyskk
Copy link
Author

guyskk commented Dec 18, 2017

The test test_worker_timeout failed, but I had no idea how to fix it.
And should I replace all await _read_wait(self._fileno) and await _write_wait(self._fileno) to
await self.readable() and await self.writeable() ?

@dabeaz
Copy link
Owner

dabeaz commented Dec 19, 2017

I'm not really sure I fully understand the utility of a separate readable() method. Wouldn't a call to readable() almost always be followed by a call to actually read the data? If so, wouldn't it make more sense to put the timeout on the read operation instead?

For what it's worth, the writeable() method is there primarily to enable a non-greedy write. Normally, Curio, tries to write first and only blocks if the write fails. writeable() is there to flip the order around if you want--you can check to make sure a socket is writeable first and then only write if it would succeed. To be honest, I'm somewhat inclined to remove the writeable() method rather than give it a readable() sibling.

@imrn
Copy link

imrn commented Dec 19, 2017 via email

@dabeaz
Copy link
Owner

dabeaz commented Dec 19, 2017

No. Any blocking operation whatsoever can have a timeout applied to it. I'm not going to special case timeouts at the socket/stream level. What I mean, is why wouldn't you just do this:

 async with timeout_after(timeout):
          data = await sock.recv(maxsize)       # Or whatever

I don't see what having a separate blocking readable() method is really buying you.

@guyskk
Copy link
Author

guyskk commented Dec 19, 2017

@dabeaz
Yes, a call to readable() is followed by a call to actually read the data, but the caller of readable() is something like connection manager, which should't actually read the data,
and the caller of read() is something like http parser, which will consume the data.
I don't like to implement the connection manager logic (eg: keep_alive_timeout) in http parser.

@imrn
Copy link

imrn commented Dec 19, 2017 via email

@imrn
Copy link

imrn commented Dec 19, 2017 via email

@dabeaz
Copy link
Owner

dabeaz commented Dec 19, 2017

One of the reasons why there is no timeout parameter on calls is that literally every single blocking call in the whole library potentially allows a timeout. Early on, the API for that was getting to be insane. It made much more sense (to me anyway) to have some kind of wrapper that could apply a timeout to whatever you were doing if you really wanted it.

@imrn
Copy link

imrn commented Dec 19, 2017 via email

@CreatCodeBuild
Copy link

In my own uses, writable() and readable() were never needed.

@njsmith
Copy link
Contributor

njsmith commented Dec 20, 2017

The use case for writable is when you have a latency-sensitive protocol, and you want to delay making a decision about which bits to send until the kernel is ready to accept them - see #83. For example, imagine a screen-sharing protocol that loops on taking a screenshot and sending it -- if the network isn't ready to accept data, then you want to delay taking the screenshot until it is, rather than queue up stale data.

The use case for readable is very specific: if you have an HTTP client with a connection pool, then after using a connection you put it back in the pool and ignore it for a while. Later, when you decide to make another request to the same server, you pull the connection back out again to re-use. However, in the mean time, the server might have gotten bored and closed the connection (possibly after sending some 408 Request Timeout response). So when retrieving a connection from the pool, you want to do something like:

  1. check if the socket is readable. If so, the server has given up. This isn't an error; it just means we need to close this connection and make a new one.
  2. Once we have a connection that seems to be valid, send our request, while simultaneously listening for a response. At this point, an early response or socket close indicates an error → this request failed.

So you really do want a non-blocking "is there data there to read?" query here. In my efforts to port urllib3 to Trio, my plan so far is to just drop down to the lower level socket API here and call select or whatever, since this is such a weird and specialized thing that I don't want to put it into my abstract stream API.

Note in particular that for a TLS-encrypted connection, there isn't in general any way to check if the logical, application-level data stream is readable without actually trying to read data, but in this case that's not what we want anyway – here we really just want to check if there's any data on the underlying socket level.

@guyskk
Copy link
Author

guyskk commented Dec 20, 2017

Let me guess: You are trying to implement http1.1
pipelining or http2.0 server or a middleware like
proxy... Right?

I'm tring to implement http1.1 persistent connection in server side, I try to show the wholistic view
in code below:

async def _worker(self, cli_sock, cli_addr):
    # browsers may preconnect but didn't send request immediately
    # keep-alive connection has similar behaviors
    # just close the connection if keep_alive_timeout exceed
    try:
        await timeout_after(self.keep_alive_timeout, cli_sock.readable())
    except TaskTimeout as ex:
        await cli_sock.close()
        return
    # parse request, will send `408 Request Timeout` response if request_timeout exceed
    parser = RequestParser(cli_sock, cli_addr)
    try:
        request = await timeout_after(self.request_timeout, parser.parse())
    except TaskTimeout as ex:
        response = Response(408)  # Request Timeout
    else:
        response = await self.app(request)
    # send response
    async for chunk in response:
        await cli_sock.sendall(chunk)
    # close the connection or wait for next request(keep-alive)
    if not response.keep_alive:
        await cli_sock.close()
    else:
        await spawn(self._worker(cli_sock, cli_addr), daemon=True)

The use case for readable is very specific: if you have an HTTP client with a connection pool, then after using a connection you put it back in the pool and ignore it for a while. Later, when you decide to make another request to the same server, you pull the connection back out again to re-use. However, in the mean time, the server might have gotten bored and closed the connection (possibly after sending some 408 Request Timeout response)

The ConnectionPool will check if the socket is readable,
but this can't be implemented by await sock.readable() because
await sock.readable() will block until socket is readable.
It needs a sync and non-blocking is_readable() method actually.

I implemented this check very ugly, in curequests(https://github.com/guyskk/curequests/blob/master/curequests/connection_pool.py#L75):

def _is_peer_closed(self):
    """check if socket in close-wait state"""
    # the socket is non-blocking mode, read 1 bytes will return EOF
    # which means peer closed, or raise exception means alive
    try:
        r = self.sock._socket_recv(1)  # FIXME: I use a private method, bad!
    except WantRead:
        return False
    except WantWrite:
        return False
    assert r == b'', "is_peer_closed shouldn't be called at this time!"
    return True

@njsmith
Copy link
Contributor

njsmith commented Dec 20, 2017

Oh, I see, right, I got is_readable and wait_readable mixed up...

For the server, I don't see why you want a wait_readable. I can see how t might be useful if you want to provide a timeout for time-to-first-request-byte, but (a) I'm not sure what that specific timeout semantics are especially useful -- e.g. in this example curio web server I just imposed a timeout on time-to-end-of-request-headers. And (b) it doesn't seem like it'd be that hard to pass some argument into parser.parse saying "please wrap your first call in conn.recv (or whatever curio calls it) in a timeout_after with this timeout".

@njsmith
Copy link
Contributor

njsmith commented Dec 20, 2017

Oh, and speaking of removing writable, I'm actually considering one approach for doing that in trio here: python-trio/trio#371

But I doubt Dave will like it :-)

@imrn
Copy link

imrn commented Dec 20, 2017 via email

@dabeaz
Copy link
Owner

dabeaz commented Dec 20, 2017

The only reason I wouldn't like the Trio approach is that it's trying to make Trio do something--I don't want Curio to do anything. Doing things is the whole problem.

That said, I'm kind of inclined to remove writeable() and readable(). Mainly in the interest of having a smaller API, but also because they're kind of misleading (i.e., are people confused into thinking that such calls are necessary?). If one really needs to test is a socket is readable, I concur that one could run it through a zero-timeout select() call and find out. I'd consider it to be a special case. I'd also consider all uses of writeable() to be a special case that Curio doesn't really need to address directly.

@njsmith
Copy link
Contributor

njsmith commented Dec 20, 2017

@dabeaz I was thinking you wouldn't like it because it would force every call to sendall to yield to the kernel, which was a point we disagreed on before :-). Probably also relevant to note though that that trio issue is talking about Trio's higher-level stream interface, the equivalent of curio.SocketStream rather than curio.io.Socket. (However, trio.SocketStream.wait_send_all_might_not_block is of course implemented using trio.socket.SocketType.wait_writable.)

@guyskk
Copy link
Author

guyskk commented Dec 20, 2017

I agree with "having a smaller API" since my needs can be implemented without readable() method.
It's easy to add features but not easy to remove features.

@dabeaz
Copy link
Owner

dabeaz commented Dec 23, 2017

This is more of a meta-question, but has anyone ever implemented a "screen-sharing application" or something similar to it in Python? If so, what library was used?

@njsmith
Copy link
Contributor

njsmith commented Dec 23, 2017

@dabeaz Part of the reason that use case comes to mind so quickly for me is that long ago I wrote a little app called Xpra, which is still in reasonably wide use under different management. For networking I originally used glib (since I was using gtk for the UI anyway), and then later switched to threads (sigh) for better Windows compatibility. I'm not sure what they use these days.

Screen sharing is also the example that Apple used in their WWDC talk on TCP_NOTSENT_LOWAT and related things, I guess because it's so dramatic.

HTTP/2 is another protocol that acts like screen sharing in this regard.

@guyskk guyskk closed this Dec 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants