Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bare and percent-encoded pluses in path variables become spaces in match_info #1816

Closed
iillyyaa opened this issue Apr 14, 2017 · 9 comments
Closed

Comments

@iillyyaa
Copy link

iillyyaa commented Apr 14, 2017

Long story short

When match info is constructed for variable resource routes, '+' characters are converted to spaces. I do not believe this is the expected behavior.

Furthermore, even '%2B' codes are converted to spaces, limiting workarounds.

Note that similar behavior yarl's path parsing, reported in yarl/issues/59, has been corrected in 88799aec. A similar fix is needed in match_info parsing.

Example

In our use case, the version string matched by the route is a semantic version that may include a plus sign followed by the build name, which runs afoul of the eager plus-to-space conversion.

import aiohttp.web
app = aiohttp.web.Application()
async def handler(request):
    print(request.url)
    print('path: ' + request.url.path)
    print('match: ' + request.match_info['version'])
    # This assertion does not hold IF(F?) version contains a plus sign
    if request.url.path.endswith(request.match_info['version']):
        return aiohttp.web.HTTPAccepted()
    else:
        return aiohttp.web.HTTPExpectationFailed()
app.router.add_get('/resource/{version}', handler)
aiohttp.web.run_app(app, host='127.0.0.1', port=8088)

Suppose the handler is reached via a URL('http://server/resources/1.0.0+build'):

curl http://127.0.0.1:8088/resource/1.0.0+build

# Note: this workaround fails too:
curl http://127.0.0.1:8088/resource/1.0.0%2Bbuild

Expected behavior

Inside the handler, I expect the plus sign to be preserved in the portion of the URL path that has been captured by match info:

request.url.raw_path == '/resources/1.0.0+build'
request.url.path == '/resources/1.0.0+build'
request.match_info['version'] == '1.0.0+build'

Actual behavior

Plus sign is preserved in the parsed path, but replaced by space in the match info:

request.url.raw_path == '/resources/1.0.0+build'
request.url.path == '/resources/1.0.0+build'
request.match_info['version'] == '1.0.0 build'  # UNEXPECTED

Your environment

Xubuntu 16.04, python3.5, yarl==0.10.0, aiohttp==1.2.0 (but confirmed with aiohttp==2.0.7)

Code inspection / suggested fix

The implementation of DynamicResource._match in aiohttp/web_urldispatcher.py is causing the problem.

return {key: unquote(value) for key, value in

    def _match(self, path):
        match = self._pattern.fullmatch(path)
        if match is None:
            return None
        else:
            return {key: unquote(value) for key, value in
                    match.groupdict().items()}

Similarly to 88799aec in yarl, if the unquote(value) is replaced with unquote(value, unsafe='+'), then the issue I demonstrated above is fixed. I am unclear on whether '+:' should be used here as well. Nor am I confident that the same fix is needed in StaticResource (I believe it is) or other resources.

If someone more familiar with the code validates the approach and helps determine which resources need this, I am more than happy to compose and file a PR.

@fafhrd91
Copy link
Member

I am not sure if unsafe="+" is right solution for url encoding. but '%2B' should be fixed.

@fafhrd91
Copy link
Member

@kxepal ^^^

@fafhrd91
Copy link
Member

I think current behavior is right. we can add some settings, but this would amount of parameters we already have

@iillyyaa
Copy link
Author

@fafhrd91

Nikolay, thank you for responding.

The issue is that with the current behavior there is no way to distinguish between pluses and spaces in any of the matches. The path matching behavior has been provided as a helpful tool; but this would limit its use to only those cases where plusses may not occur.

I believe keeping both '+' and '%2B' as '+' in the match would be ideal, because this behavior would be stable and least surprising, and it would make the behavior identical regardless of whether or not the aiohttp server is sitting behind a proxy (that may do the mapping of '%2B' to '+' before forwarding the request.)

If you insist on unescaping '+' to space, at the very least, I would then expect '%2B' to be unescaped to '+'. However, this would be harder to maintain, as it is requires that escape/unescape (or quote/unquote, as it's called in the code) be called only a fixed number of times.

I agree that adding new settings is rarely the right solution.

Sincerely, Ilya

@fafhrd91
Copy link
Member

you are right.

@fafhrd91
Copy link
Member

added "unsafe='+'"

@iillyyaa
Copy link
Author

@fafhrd91 : Thank you! 👍

@BolshakovNO
Copy link

This problem comes again in >=3.5.4

@asvetlov
Copy link
Member

Perhaps you use slow pure-python yarl build; it has a bug with different behavior for Cython optimized version (correct) and Pure Python (incorrect).
The yarl master is fixed but not released yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants