Potential memory leak in aiohttp server #4478

Closed
mvalkon opened this issue Dec 31, 2019 · 6 comments

mvalkon commented Dec 31, 2019

Long story short

I have a small HTTP API written with aiohttp as the backend. I am seeing a constantly growing memory footprint, which results in segmentation faults in production. The segfaults seem to occur somewhat randomly. I am not able to get core dumps at the moment for analysis, so I have resorted to debugging this locally. I do not have conclusive proof, but I am looking for some pointers on where to go from here.

The API makes two external calls per request: one to an external API using aiohttp.ClientSession and another to DynamoDB to fetch data. In both cases we maintain a separate session for the lifetime of the application.
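
For reference, a minimal sketch of the session-per-application pattern described above, assuming aiohttp's cleanup context is used to manage the lifetime (the handler, route, and key names are illustrative, not the actual application code):

import aiohttp
from aiohttp import web


async def external_api_session(app: web.Application):
    # Runs up to the yield at startup, and past it on cleanup.
    app["external_api"] = aiohttp.ClientSession()
    yield
    await app["external_api"].close()


async def handler(request: web.Request) -> web.Response:
    session: aiohttp.ClientSession = request.app["external_api"]
    async with session.get("https://example.com/upstream") as resp:
        return web.json_response(await resp.json())


app = web.Application()
app.cleanup_ctx.append(external_api_session)
app.router.add_get("/", handler)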

Expected behaviour

Stable memory consumption

Actual behaviour

Growing memory footprint at low rps. The following graph and tracemalloc data are from a short test where the API was run locally and traffic was generated at roughly 45 requests per second.

[graph: memory footprint during the local load test]

A tracemalloc snapshot comparison between the start and end of the test shows the largest memory increases in the RequestHandler.data_received() method and the TCPConnector._wrap_create_connection() method.

aiohttp/web_protocol.py:275: size=86.3 MiB (+81.0 MiB), count=316057 (+295361), average=286 B
aiohttp/connector.py:936: size=13.1 MiB (+2577 KiB), count=765 (+109), average=17.6 KiB
basictracer/text_propagator.py:30: size=2454 KiB (+2306 KiB), count=44869 (+42176), average=56 B
traceback.py:357: size=797 KiB (+207 KiB), count=9108 (+2303), average=90 B
thrift/transport/THttpClient.py:153: size=0 B (-183 KiB), count=0 (-1)
lightstep/tracer.py:106: size=287 KiB (+143 KiB), count=1296 (+670), average=227 B
traceback.py:285: size=385 KiB (+141 KiB), count=4552 (+1678), average=87 B
lightstep/thrift_converter.py:61: size=40.8 KiB (-136 KiB), count=713 (-2378), average=59 B
lightstep/util.py:37: size=37.7 KiB (-126 KiB), count=594 (-1984), average=65 B
json/decoder.py:353: size=5208 KiB (+91.4 KiB), count=51545 (+1052), average=103 B
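
For reference, a comparison like the one above can be produced with tracemalloc roughly as follows (a sketch; the frame depth and variable names are arbitrary):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocated block

snapshot_start = tracemalloc.take_snapshot()  # at test start
# ... run the load test ...
snapshot_end = tracemalloc.take_snapshot()    # at test end

# Top 10 allocation sites by size delta, grouped by source line.
for stat in snapshot_end.compare_to(snapshot_start, "lineno")[:10]:
    print(stat)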

Objgraph points to a large number of CIMultiDict objects in memory, and the following object graph can be generated (not sure how helpful this is):

[image: generated object graph of CIMultiDict objects]
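
One way to generate such a graph (a sketch using objgraph.show_backrefs; the sample size, depth, and output file name are arbitrary, and graphviz must be installed to render the PNG):

import random
import objgraph

cimultidicts = objgraph.by_type("CIMultiDict")
# Render back-references for a small random sample of the live CIMultiDicts.
objgraph.show_backrefs(
    random.sample(cimultidicts, min(3, len(cimultidicts))),
    max_depth=5,
    filename="cimultidict-backrefs.png",
)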

Additionally, I am seeing the errors reported in #3535 when traffic generation stops.

<uvloop.loop.SSLProtocol object at 0x108602bd0>: Fatal error on transport
Traceback (most recent call last):
  File "uvloop/sslproto.pyx", line 571, in uvloop.loop.SSLProtocol._do_shutdown
  File "/usr/local/opt/pyenv/versions/3.7.5/lib/python3.7/ssl.py", line 778, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2629)

Any pointers on where to go from here for further debugging would be much appreciated.

Steps to reproduce

Unfortunately none at the moment. I will try to isolate a reproducible snippet.

Your environment

The memory consumption increases both on macOS (my laptop) and on an Ubuntu-based Docker image running on Kubernetes.

aiohttp version is 3.6.2 with uvloop on Python 3.7.5, for both server and client.

webknjaz (Member) commented:

Try upgrading the multidict package. There has been a huge refactoring with a number of subsequent fixes and patch releases.
You could also play with different versions and see whether the pre-rewrite releases leak as well.
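
One quick sanity check is to confirm which multidict version and implementation is actually loaded (a sketch; the module names are an assumption about multidict's layout, with the C extension in multidict._multidict and the pure-Python fallback in multidict._multidict_py):

import multidict

print(multidict.__version__)
# The defining module of CIMultiDict shows whether the C extension or the
# pure-Python fallback was imported (assumed module layout, see above).
print(multidict.CIMultiDict.__module__)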

mvalkon (Author) commented Jan 2, 2020

Thanks for the tip @webknjaz. Upgrading multidict to version 4.7.3 (was 4.7.1) changes the memory profile a lot, but does not fix the leak.

[graph: memory footprint after upgrading multidict to 4.7.3]

Using objgraph.show_most_common_types() I can still see a growing number of CIMultiDict objects (this is again from a shorter test run):

(Pdb) objgraph.show_most_common_types()
function          18050
dict              15242
CIMultiDict       14775
_KeysView         14462
tuple             10103
OrderedDict       9816
list              6121
FrameSummary      5029
weakref           4935
getset_descriptor 2881
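
A related way to watch this over time is objgraph.show_growth(), which prints only the types whose instance counts grew since the previous call (sketch, with an arbitrary limit):

import objgraph

objgraph.show_growth(limit=10)  # establish a baseline of per-type counts
# ... let the server handle some traffic ...
objgraph.show_growth(limit=10)  # shows only the types whose counts grew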

Listing the objects objgraph reports as leaking:

roots = objgraph.get_leaking_objects()
(Pdb) objgraph.show_most_common_types(objects=roots)
_KeysView  14462
dict       1309
set        241
tuple      33
list       11
SignalDict 8
weakref    5
method     5
slice      2
CTypeDescr 2

webknjaz (Member) commented Jan 2, 2020

How about downgrading?

asvetlov (Member) commented Jan 2, 2020

After testing multidict in different scenarios I was unable to detect any memory leak; everything is returned to the allocator.

Sorry, I cannot analyze this further without code that reproduces the leak.

gjcarneiro (Contributor) commented:

I have not seen this memory leak either. The fault is likely in the application code rather than in aiohttp itself. Is your app code storing requests somewhere, by any chance?

You should try to create a minimal example that reproduces the leak, and post it.

mvalkon (Author) commented Jan 7, 2020

@asvetlov @gjcarneiro @webknjaz thanks for looking into this and sorry for not being able to provide an isolated example.

I have managed to isolate this issue to a middleware function which creates an opentracing span for every incoming request. I am not sure why this causes a memory leak, but I think it has nothing to do with aiohttp, so this issue can be closed.

The middleware in question does this, and I don't see anything obvious here that leaks memory. Perhaps something in the vendor implementation causes it; I will debug further.

from typing import Callable

import opentracing
from aiohttp import web
from opentracing.propagation import Format

# Note: default_server_tags() is an application-level helper that builds the
# default tag set for the server span from the request.

@web.middleware
async def opentracing_middleware(request: web.Request, handler: Callable):
    """Tracing middleware applied to all handlers. Extracts a span context
    from the request and creates a new span using that context as the parent.
    If there is no context, starts a new span without a reference."""

    # Avoid polluting the traces by ignoring the health endpoint.
    if request.rel_url.path == "/health":
        return await handler(request)

    try:
        span_context = opentracing.tracer.extract(
            format=Format.HTTP_HEADERS, carrier=request.headers
        )
    except (
        opentracing.InvalidCarrierException,
        opentracing.SpanContextCorruptedException,
    ):
        span_context = None

    with opentracing.tracer.start_active_span(
        child_of=span_context,
        operation_name=request.match_info.handler.__name__,
        finish_on_close=True,
        tags=default_server_tags(request),
    ) as scope:  # noqa
        return await handler(request)
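
For context, the middleware is registered on the application in the usual aiohttp way (sketch):

app = web.Application(middlewares=[opentracing_middleware])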

mvalkon closed this as completed Jan 7, 2020