Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection not closed when request is cancelled #253

Closed
nbraem opened this issue Jan 21, 2015 · 17 comments · Fixed by #254
Closed

Connection not closed when request is cancelled #253

nbraem opened this issue Jan 21, 2015 · 17 comments · Fixed by #254
Labels

Comments

@nbraem
Copy link
Contributor

nbraem commented Jan 21, 2015

When making a lot of concurrent client requests to the same host using connection pooling, TCPConnector will create a lot of connections. When cancelling these client requests, not all connections are closed.

This is not easy to reproduce. I have a long running server process, and I can see the open sockets increasing over time.

Here's a gist that reproduces the conditions. When the code prints DONE, leave the process running and look at the open sockets (e.g. lsof -p | grep http). After the keepalive_timeout of 30 seconds, most connections are closed, but occasionally (1 in 5 runs for me), there is a dangling socket. You may need to change the sleep timeout for cancellation depending on how fast your connection is:
https://gist.github.com/nbraem/0e5178288ffd94372062

I have a fix, I'll send a pull request shortly.

@fafhrd91
Copy link
Member

try to add "yield from r.release()" after this line "r = yield from request('get', 'http://yahoo.com', connector=connector)"

@fafhrd91
Copy link
Member

here is proper make_request() function:

    @asyncio.coroutine
    def make_request():
        r = yield from request('get', 'http://yahoo.com', connector=connector)
        yield from r.release()
        finished.append(r)

@nbraem
Copy link
Contributor Author

nbraem commented Jan 21, 2015

Same thing.

@nbraem
Copy link
Contributor Author

nbraem commented Jan 22, 2015

Ran the fix overnight, here are graphs of the open sockets before and after the fix.

Before:
screen shot 2015-01-22 at 10 34 59 am

After:
screen shot 2015-01-22 at 10 35 41 am

@kxepal
Copy link
Member

kxepal commented Jan 22, 2015

Cannot reproduce the issue. Well, actually I can create connections leak by using high request rate when connections are opens much faster than OS is able to clean them. Using yield from r.release() doesn't closes the connection, but turn it into TIME_WAIT state where it remains for some time until your OS will finally close it. That's ok. By explicit call r.close() you terminates the connection completely pushing it to CLOSED state. The proposed fix doesn't change the whole picture how me: nothing leaks.

P.S. aiohttp 0.14.1

@nbraem
Copy link
Contributor Author

nbraem commented Jan 22, 2015

Ok, I'll see if I can make a new gist that can reproduce it, that resembles my setup more closely.

@nbraem
Copy link
Contributor Author

nbraem commented Jan 22, 2015

Ok, here's a gist that more closely reproduces my problem, using this code, I can reproduce it every time:
https://gist.github.com/nbraem/05972f50d22d63869796

It creates an origin server and a proxy server. The proxy server will create many connections to the origin server. When it blows up, sockets are left open.

I hammer the proxy server like so:

ab -n 10000000 -c 1000 http://127.0.0.1:1234/

Thanks a lot for trying to help me, I really appreciate it!

@fafhrd91
Copy link
Member

i think this is related keep-alive, by default TCPConnector uses keepalive_timeout=30, try to reduce it to 1

@nbraem
Copy link
Contributor Author

nbraem commented Jan 22, 2015

I'm callilng response.read() first and then response.release().

Just tried keepalive_timeout=1, same result though. It cleans up some sockets, but not all of them.

@fafhrd91
Copy link
Member

if keepalive_timeout helps then it is different problem. you open too many connections, aiohttp does not release connections fast enough. you may need to write custom TCPConnector that opens only limited number of connections, or you can recycle TCPConnector after some amount of processed requests.

@nbraem
Copy link
Contributor Author

nbraem commented Jan 22, 2015

No, that's not it. Once the connection between the client (apachebench in this case) and the aiohttp server is lost, the aiohttp server will cancel the current request (throws a CancelledError in the handle_request coroutine), which causes the BaseConnector to lose references to some Connection objects. The underlying sockets of those Connection objects are never closed. I have a fix for that exact case. So if you have a look, you'll see how this can happen: nbraem@73016c4

This is not so far fetched, because connections to an aiohttp server can break and the server might also create a lot of connections itself, e.g. to a database or to a cache.

You are totally right that TCPConnector should have a configurable maximum of open connections per host, which would also solve my problem. I was planning on also doing that. I think it would be a great addition to aiohttp, because of the database connection use case.

@fafhrd91
Copy link
Member

i'm not sure why removing this line "self._conns.pop(key, None)" helps.

@nbraem
Copy link
Contributor Author

nbraem commented Jan 22, 2015

I have a second commit in the pull request which is better: nbraem@5dd86d0

In the case I'm describing, self._conns[key] will contain an array of hundreds of open connections to the same host. But if one of them satisfies this condition:

if should_close or (reader.output and not reader.output.at_eof()):

Then the entire array of connections is popped, and the cleanup task will never be able to close the sockets (transports) when the keepalive_timeout hits. So you've lost the reference to the sockets, and they're never closed.

@fafhrd91
Copy link
Member

Ok, that's better

On Thursday, January 22, 2015, Nicolas Braem notifications@github.com
wrote:

I have a second commit in the pull request which is better: nbraem/aiohttp@5dd86d0
nbraem@5dd86d0

In the case I'm describing, self._conns[key] will contain an array of
hundreds of open connections to the same host. But if one of them satisfies
this condition:

if should_close or (reader.output and not reader.output.at_eof()):

Then the entire array of connections is popped, and the cleanup task will
never be able to close the sockets (transports) when the keepalive_timeout
hits. So you've lost the reference to the sockets, and they're never closed.


Reply to this email directly or view it on GitHub
#253 (comment).

fafhrd91 added a commit that referenced this issue Jan 22, 2015
asvetlov added a commit that referenced this issue Jan 23, 2015
@asvetlov
Copy link
Member

Grr. I would like to have at lease one unittest for any bugfix, especially for very subtle one as the issue.

I've done it in d5e4d6a but please keep our test suite strict.

@asvetlov
Copy link
Member

Oops, sorry @fafhrd91
I've missed your 54c5423

mpaolini pushed a commit to elastic-coders/aiohttp that referenced this issue Aug 22, 2015
Issues aio-libs#253 and aio-libs#254 implemented a `_conns` key evince logic in the
function that actually **adds** items to `_conns`

Issue aio-libs#406 tweaked this logic even more, making early and eviction
of reusable items in the pool possible.

Here we put the key eviction logic where it belongs: in the method
that **removes** items from the `_conns` pool.
asvetlov pushed a commit that referenced this issue Aug 28, 2015
Issues #253 and #254 implemented a `_conns` key evince logic in the
function that actually **adds** items to `_conns`

Issue #406 tweaked this logic even more, making early and eviction
of reusable items in the pool possible.

Here we put the key eviction logic where it belongs: in the method
that **removes** items from the `_conns` pool.
@lock
Copy link

lock bot commented Oct 29, 2019

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there's important points made in this discussion,
please include those exceprts into that new issue.

@lock lock bot added the outdated label Oct 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants