
Multipart reader fails on Transfer-Encoding: chunked #8643

Closed · doublex opened this issue Aug 8, 2024 · 9 comments · Fixed by #8657
Labels
bug · needs-info (Issue is lacking sufficient information and will be closed if not provided)

Comments

doublex commented Aug 8, 2024

Describe the bug

The multipart reader does not parse correctly when Transfer-Encoding is chunked: it sometimes merges two different parts into one larger part.

This request dump is also required (125 MB; GitHub limits uploads to 25 MB):
http://doppelbauer.name/request-data.txt

To Reproduce

The test case is a bit clumsy, but it reproduces the problem.

import asyncio, base64, hashlib
from aiohttp.multipart import MultipartReader

headers = {
    'Content-Type': 'multipart/form-data; boundary=f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa',
}

# http://doppelbauer.name/request-data.txt
# Each line of the dump records one stream call: "<kind>\t<base64 payload>".
fp = open('request-data.txt')

class FakeReader:
    """Replays the StreamReader calls recorded in request-data.txt."""

    async def read(self, n=-1):
        kind, data = fp.readline().split('\t')
        assert kind == 'read'
        return base64.b64decode(data.encode())

    async def readline(self, separator=b"\n"):
        kind, data = fp.readline().split('\t')
        assert kind == 'readuntil'
        return base64.b64decode(data.encode())

    def at_eof(self):
        return False

    def unread_data(self, data):
        pass

async def test_multipart():
    multipart = MultipartReader(headers, FakeReader())
    part_index = 0
    async for part in multipart:
        part_index += 1
        # Reassemble the part body from its chunks.
        body = b''
        while True:
            chunk = await part.read_chunk()
            if not chunk:
                break
            body += chunk
        blake2 = hashlib.blake2b(body).hexdigest()

        # A correctly parsed part matches the length and hash recorded
        # in its headers when the request was created.
        if (int(part.headers['Content-Length']) == len(body)
                and part.headers['Content-BLAKE2'] == blake2):
            continue

        # Huh!? Boundary within body
        assert b'f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa' in body
        print('Multipart', '#' + str(part_index), part.filename)
        print('Size invalid:', len(body), 'instead of:', int(part.headers['Content-Length']))
        print('Hash:', blake2, part.headers['Content-BLAKE2'])
        print()

asyncio.run(test_multipart())

Expected behavior

Correctly parse the multipart body.

Logs/tracebacks

#

Python Version

Python 3.12

aiohttp Version

Version: 3.10.1

multidict Version

$

yarl Version

$

OS

Linux

Related component

Server

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
doublex added the bug label Aug 8, 2024
Dreamsorcerer (Member) commented Aug 8, 2024

Sorry, what is the expected output here?

From your code, it looks like you are comparing Content-Length headers to the length of the part, which has nothing to do with multipart. If you expect these to match, the problem is with the creation of the request (or, actually, those headers shouldn't have been included at all), as multipart does not use Content-Length for anything.
https://datatracker.ietf.org/doc/html/rfc7578#section-4.8

I also don't see how that relates to chunked transfer encoding, so maybe I'm misunderstanding the problem.

Dreamsorcerer added the needs-info label Aug 8, 2024
doublex (Author) commented Aug 8, 2024

Sometimes the multipart reader skips a multipart boundary. In that case the yielded part contains the data of two parts.
This only happens when the network packets have certain sizes (hence this awkward test case).

This assert should never be triggered:

assert b'f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa' in body

Dreamsorcerer (Member) commented:

The assert doesn't trigger when I run it? I just get the prints underneath.

doublex (Author) commented Aug 8, 2024

Sorry, the assert should trigger: the boundary must separate the parts.
But this does not always happen (a rare case).

I had been tracking this error for about a year before I managed to build a test case.

Dreamsorcerer (Member) commented:

Right, I'm with you. Let me see if I can figure out anything further. It's a bit awkward to isolate the problem with such large data...

doublex (Author) commented Aug 8, 2024

It only happens rarely, maybe once a week.
Strangely enough, it happens five times in this test case.

Dreamsorcerer (Member) commented:

I think it's because the boundary is split across chunks:

...ed\xcf\xc1\x05\xf8\xb9\xebD\xac\x0e\x12\xfan\x7f\xaeA\x1c\xbb\\5\xbf\'d \xdf:\xb0\x12\xf1+\\\xb0\x04\x95\x15?\x16\x98\xae\x01\x00\x00\x00\r\n--f9nvedV9b8roN'
b'RgHQEYH75I'
b'QSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa\r\nContent-Disposition: form-data; name="elozrpcf9f6aybxsl3ll7plwc2j22b0ky01ja0n.webp"; filename="elozrpcf9f6aybxsl3ll7plwc2j22b0ky01ja0n.webp"\r\nContent-Type: image/webp\r\nContent-Length: 582794\r\nContent-BLAKE2: 2db3da...
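
To make the failure mode concrete: no single chunk contains the complete boundary, but the reassembled stream does, so a boundary search that only looks inside one chunk at a time misses it. A minimal illustration using the three chunks from the dump above (payloads abbreviated with "..."):

boundary = b'--f9nvedV9b8roNRgHQEYH75IQSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa'
chunks = [
    b'...\x01\x00\x00\x00\r\n--f9nvedV9b8roN',
    b'RgHQEYH75I',
    b'QSkw6UCOJsJSIzCXWhVAXCwPSMrs73NjPQTH5rKARGsWAFa\r\nContent-Disposition: ...',
]
# The full boundary never appears inside any single chunk ...
assert not any(boundary in chunk for chunk in chunks)
# ... but it does appear once the chunks are joined back together.
assert boundary in b''.join(chunks)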

Dreamsorcerer (Member) commented:

OK, I think it only happens when the boundary is split across 3+ chunks. The problem is that little chunk in the middle. The chunk size is asserted to be larger than the boundary to protect against this, but I'm guessing there's an edge case where reading from the content stream returns less data than the requested chunk size...

Dreamsorcerer (Member) commented:

Yeah, it doesn't account for StreamReader.read() returning data smaller than n, which it may do if it already has data in the buffer.
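
A minimal sketch of the kind of guard that addresses this, assuming a StreamReader-like object with read() and at_eof(); this illustrates the idea only and is not the actual patch from #8657:

async def read_at_least(stream, n):
    # Keep calling read() until n bytes have accumulated (or EOF is hit),
    # so a partially filled buffer can never hand back a chunk shorter
    # than the boundary.
    blocks = []
    have = 0
    while have < n and not stream.at_eof():
        block = await stream.read(n - have)
        if not block:
            break
        blocks.append(block)
        have += len(block)
    return b''.join(blocks)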
