Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDPTransport.sendto spends a lot of time validating the address #214

Closed
jlaine opened this issue Jan 8, 2019 · 8 comments
Closed

UDPTransport.sendto spends a lot of time validating the address #214

jlaine opened this issue Jan 8, 2019 · 8 comments

Comments

@jlaine
Copy link
Contributor

jlaine commented Jan 8, 2019

  • uvloop version: 0.11.3 and 0.12.0rc2
  • Python version: 3.7
  • Platform: Linux

I have been trying to optimize the performance of aiortc, an asyncio-based WebRTC implementation. aiortc uses aioice (an Interactive Connectivity Establishment library) for its network access. Using uvloop provides some performance improvement, but not as much as I had hoped.

During my optimization work I noticed that when called with an address, UDPTransport.sendto spends a noticeable amount of time calling into Python code, looking up the socket's family and type properties. Looking at the code, this corresponds to validation checks on the destination address which are run on every single packet (which involves a call to getaddrinfo whose result is ultimately discarded):

if addr is not None and self.sock.family != socket.AF_UNIX:

I am aware that specifying a remote address when calling create_datagram_endpoint would probably work around this issue, but in the case of ICE this is not practical as the remote party may have multiple addresses, which are not known in advance and may change over time.

To assess the impact of the validation code I tried a basic patch which stores the last validated address and skips the validation if sendto is called for the same address again. I then ran the following test:

import asyncio
import socket
import time

import uvloop


class DummyProtocol(asyncio.DatagramProtocol):
    pass


async def run(loop):
    addr = ('127.0.0.1', 1234)
    data = b'M' * 1500
    transport, protocol = await loop.create_datagram_endpoint(
        lambda: DummyProtocol(),
        family=socket.AF_INET)

    start = time.process_time()
    for i in range(1000000):
        transport.sendto(data, addr)
    print('elapsed', time.process_time() - start)


asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
loop = asyncio.get_event_loop()
loop.run_until_complete(run(loop))

On my laptop the results of sending out a million packets:

  • without the patch: 7.8s
  • with the patch: 3.0s

Questions:

  • do we need the validation code at all?
  • if we do, would you consider a patch which caches the last successfully validated address and skips the validation if sendto is called again with the same address?
  • alternatively, could we at the very least store the socket type and family to avoid the call into Python to fetch these two properties?
@jlaine
Copy link
Contributor Author

jlaine commented Jan 8, 2019

The trivial patch I used:

diff --git a/uvloop/handles/udp.pxd b/uvloop/handles/udp.pxd
index 4f03650..25a7e61 100644
--- a/uvloop/handles/udp.pxd
+++ b/uvloop/handles/udp.pxd
@@ -3,6 +3,7 @@ cdef class UDPTransport(UVBaseTransport):
         object sock
         UVPoll poll
         object address
+        object last_address
         object buffer
 
     cdef _init(self, Loop loop, object sock, object r_addr)
diff --git a/uvloop/handles/udp.pyx b/uvloop/handles/udp.pyx
index 81ad198..6ecc489 100644
--- a/uvloop/handles/udp.pyx
+++ b/uvloop/handles/udp.pyx
@@ -6,6 +6,7 @@ cdef class UDPTransport(UVBaseTransport):
         self.sock = None
         self.poll = None
         self.buffer = col_deque()
+        self.last_address = None
         self._has_handle = 0
 
     cdef _init(self, Loop loop, object sock, object r_addr):
@@ -143,7 +144,7 @@ cdef class UDPTransport(UVBaseTransport):
             raise ValueError(
                 'Invalid address: must be None or {}'.format(self.address))
 
-        if addr is not None and self.sock.family != socket.AF_UNIX:
+        if addr not in (None, self.last_address) and self.sock.family != socket.AF_UNIX:
             addrinfo = __static_getaddrinfo_pyaddr(
                 addr[0], addr[1],
                 uv.AF_UNSPEC, self.sock.type, self.sock.proto, 0)
@@ -155,6 +156,7 @@ cdef class UDPTransport(UVBaseTransport):
                 raise ValueError(
                     'UDP.sendto(): {!r} socket family mismatch'.format(
                         addr))
+            self.last_address = addr
 
         if self._conn_lost and self._address:
             if self._conn_lost >= LOG_THRESHOLD_FOR_CONNLOST_WRITES:

@1st1
Copy link
Member

1st1 commented Jan 8, 2019

On my laptop the results of sending out a million packets:

  • without the patch: 7.8s
  • with the patch: 3.0s

Great improvement!

Questions:

  • do we need the validation code at all?

Ideally yes. DNS lookup would be blocking and that would be quite hard to debug.

if we do, would you consider a patch which caches the last successfully validated address and skips the validation if sendto is called again with the same address?

Yes, please submit a PR. You can probably add a function with an @lru_cache decorator to keep track of last N checked addresses.

@jlaine
Copy link
Contributor Author

jlaine commented Jan 8, 2019

I also tried something else: simply storing the socket's family, proto and type as int members of UDPTransport. This doesn't skip the call to getaddrinfo but gives a significant performance gain which holds no matter whether it's an address we have already seen or not. The same test runs in 4.3s with this patch.

Regarding lru_cache wouldn't that mean dropping out of C and back into Python?

@1st1
Copy link
Member

1st1 commented Jan 8, 2019

I also tried something else: simply storing the socket's family, proto and type as int members of UDPTransport. This doesn't skip the call to getaddrinfo but gives a significant performance gain which holds no matter whether it's an address we have already seen or not. The same test runs in 4.3s with this patch.

Caching family/proto/type is OK too.

Regarding lru_cache wouldn't that mean dropping out of C and back into Python?

lru_cache is implemented in C since 3.6 (IIRC), but yeah, it involves quite a few CPython C API calls. You'd probably want to profile it to check if it's fast enough.

jlaine added a commit to jlaine/uvloop that referenced this issue Jan 8, 2019
When UDPTransport.sendto is called with a destination address, only
perform the validation once and store the result in an LRU cache. Also
store some of the socket's properties (family, proto, type) as integers
to avoid repeatedly fetching these properties and converting them to integers.
@jlaine
Copy link
Contributor Author

jlaine commented Jan 8, 2019

It looks like lru_cache performance is really not an issue. With the patch I submitted in #215 my test now runs in 3.1s, so almost identical to the "single address cache" version!

@1st1
Copy link
Member

1st1 commented Jan 8, 2019

Using uvloop provides some performance improvement, but not as much as I had hoped.

Speaking of which -- when libuv 2.0 is released we'll likely be able to switch to native UDP libuv implementation which should be significantly (up to 10x) faster than what we have now.

jlaine added a commit to jlaine/uvloop that referenced this issue Jan 8, 2019
When UDPTransport.sendto is called with a destination address, only
perform the validation once and store the result in an LRU cache. Also
store some of the socket's properties (family, proto, type) as integers
to avoid repeatedly fetching these properties and converting them to integers.
@jlaine
Copy link
Contributor Author

jlaine commented Jan 8, 2019

Speaking of which -- when libuv 2.0 is released we'll likely be able to switch to native UDP libuv implementation which should be significantly (up to 10x) faster than what we have now.

I'm very eager to see this in action, and would be happy to help if I can! It doesn't look as though UDP is getting as much love in the asyncio world as TCP is, but that might change with the likes of HTTP/3.

@1st1
Copy link
Member

1st1 commented Jan 9, 2019

I'm very eager to see this in action, and would be happy to help if I can!

For the UDP part specifically we're blocked until libuv 2.0 lands. Meanwhile feel free to help with other parts of the project! For instance, I still can't find time to implement loop.sendfile() and start_serving—new asyncio 3.7 features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants