
Multipart reader fails on Transfer-Encoding: chunked #8643

Closed · doublex opened this issue Aug 8, 2024 · 9 comments · Fixed by #8657
Labels
bug · needs-info (Issue is lacking sufficient information and will be closed if not provided)

Comments

doublex commented Aug 8, 2024

Describe the bug

The multipart reader does not parse correctly when Transfer-Encoding is chunked: it sometimes merges two different parts into one larger part.

This request dump is also required (125 MB; GitHub limits uploads to 25 MB):
http://doppelbauer.name/request-data.txt

To Reproduce

The test case is a bit clumsy, but it reproduces the problem.

import asyncio, base64, hashlib
from aiohttp.multipart import MultipartReader

headers = {
    'Content-Type': 'multipart/form-data; boundary=f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa',
}

# http://doppelbauer.name/request-data.txt
# Each line of the dump records one stream call: "<kind>\t<base64 payload>".
fp = open('request-data.txt')

class FakeReader:
    """Replays the StreamReader calls recorded in request-data.txt."""

    async def read(self, n=-1):
        kind, data = fp.readline().split('\t')
        assert kind == 'read'
        return base64.b64decode(data.encode())

    async def readline(self, separator=b"\n"):
        kind, data = fp.readline().split('\t')
        assert kind == 'readuntil'
        return base64.b64decode(data.encode())

    def at_eof(self):
        return False

    def unread_data(self, data):
        pass

async def test_multipart():
    multipart = MultipartReader(headers, FakeReader())
    part_index = 0
    async for part in multipart:
        part_index += 1
        # Reassemble the part body from its chunks.
        body = b''
        while True:
            chunk = await part.read_chunk()
            if not chunk:
                break
            body += chunk
        blake2 = hashlib.blake2b(body).hexdigest()

        # A correctly parsed part matches the length and hash recorded
        # in its headers when the request was created.
        if (int(part.headers['Content-Length']) == len(body)
                and part.headers['Content-BLAKE2'] == blake2):
            continue

        # Huh!? Boundary within body
        assert b'f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa' in body
        print('Multipart', '#' + str(part_index), part.filename)
        print('Size invalid:', len(body), 'instead of:', int(part.headers['Content-Length']))
        print('Hash:', blake2, part.headers['Content-BLAKE2'])
        print()

asyncio.run(test_multipart())

Expected behavior

Correctly parse the multipart body.

Logs/tracebacks

#

Python Version

Python 3.12

aiohttp Version

Version: 3.10.1

multidict Version

$

yarl Version

$

OS

Linux

Related component

Server

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
doublex added the bug label Aug 8, 2024
Dreamsorcerer (Member) commented Aug 8, 2024

Sorry, what is the expected output here?

From your code, it looks like you are comparing Content-Length headers to the length of the part, which has nothing to do with multipart. If you expect these to match, the problem is with the creation of the request (or, actually, those headers shouldn't have been included at all), as multipart does not use Content-Length for anything.
https://datatracker.ietf.org/doc/html/rfc7578#section-4.8

I also don't see how that relates to chunked transfer encoding, so maybe I'm misunderstanding the problem.

Dreamsorcerer added the needs-info label Aug 8, 2024
doublex (Author) commented Aug 8, 2024

Sometimes the multipart reader skips a multipart boundary. In that case the yielded part contains the data of two parts.
This only happens when the network packets have certain sizes (hence this awkward test case).

This assert should never be triggered:

assert b'f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa' in body

Dreamsorcerer (Member) commented:

The assert doesn't trigger when I run it? I just get the prints underneath.

doublex (Author) commented Aug 8, 2024

Sorry, the assert should trigger: the boundary must separate the parts.
But this does not always happen (a rare case).

I had been tracking this error for about a year before I managed to build a test case.

Dreamsorcerer (Member) commented:

Right, I'm with you. Let me see if I can figure out anything further. It's a bit awkward to isolate the problem with such large data...

doublex (Author) commented Aug 8, 2024

It only happens rarely, maybe once a week.
Strangely enough, it happens five times in this test case.

Dreamsorcerer (Member) commented:

I think it's because the boundary is split across chunks:

...ed\xcf\xc1\x05\xf8\xb9\xebD\xac\x0e\x12\xfan\x7f\xaeA\x1c\xbb\\5\xbf\'d \xdf:\xb0\x12\xf1+\\\xb0\x04\x95\x15?\x16\x98\xae\x01\x00\x00\x00\r\n--f9nvedV9b8roN'
b'RgHQEYH75I'
b'QSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa\r\nContent-Disposition: form-data; name="elozrpcf9f6aybxsl3ll7plwc2j22b0ky01ja0n.webp"; filename="elozrpcf9f6aybxsl3ll7plwc2j22b0ky01ja0n.webp"\r\nContent-Type: image/webp\r\nContent-Length: 582794\r\nContent-BLAKE2: 2db3da...
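
To make the failure mode concrete: no single chunk contains the complete boundary, but the reassembled stream does, so a boundary search that only looks inside one chunk at a time misses it. A minimal illustration using the three chunks from the dump above (payloads abbreviated with "..."):

boundary = b'--f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa'
chunks = [
    b'...\x01\x00\x00\x00\r\n--f9nvedV9b8roN',
    b'RgHQEYH75I',
    b'QSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa\r\nContent-Disposition: ...',
]
# The full boundary never appears inside any single chunk ...
assert not any(boundary in chunk for chunk in chunks)
# ... but it does appear once the chunks are joined back together.
assert boundary in b''.join(chunks)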

Dreamsorcerer (Member) commented:

OK, I think it only happens when the boundary is split across 3+ chunks. The problem is that little chunk in the middle. The chunk size is asserted to be larger than the boundary to protect against this, but I'm guessing there's an edge case where reading from the content stream returns less data than the requested chunk size...

Dreamsorcerer (Member) commented:

Yeah, it doesn't account for StreamReader.read() returning data smaller than n, which it may do if it already has data in the buffer.
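
A minimal sketch of the kind of guard that addresses this, assuming a StreamReader-like object with read() and at_eof(); this illustrates the idea only and is not the actual patch from #8657:

async def read_at_least(stream, n):
    # Keep calling read() until n bytes have accumulated (or EOF is hit),
    # so a partially filled buffer can never hand back a chunk shorter
    # than the boundary.
    blocks = []
    have = 0
    while have < n and not stream.at_eof():
        block = await stream.read(n - have)
        if not block:
            break
        blocks.append(block)
        have += len(block)
    return b''.join(blocks)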
