Incorrect quoting path part of URI. #495

Closed
tarhan opened this issue Sep 5, 2015 · 3 comments

tarhan commented Sep 5, 2015

I think there is an error in how the target URL is constructed in the request object.
Lines 177-178 of aiohttp/client_reqrep.py (method update_path in ClientRequest) read:

self.path = urllib.parse.urlunsplit(
            ('', '', urllib.parse.quote(path, safe='/%:'), query, fragment))

This quotes the path part and treats forward slash, percent sign and colon as safe characters that need no quoting.
But according to RFC 3986 section 2.3, the following unreserved characters never need escaping:

unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

Python's urllib.parse.quote does not recognize the tilde as unreserved, so it must be listed in the safe keyword argument.
Also, according to section 2.2 of the same RFC, there are reserved characters that are used as delimiters or sub-delimiters of path subcomponents. Since the request method receives either a full URL or a URL without a query string, you cannot tell whether a given delimiter is acting as a delimiter or is part of a path subcomponent.
So it must be the library user's responsibility to quote or exclude reserved characters from path subcomponents.
Reserved characters (RFC 3986 section 2.2):

reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Again, of all these characters urllib.parse.quote leaves only the forward slash unquoted by default. Since so many reserved characters end up quoted, I think it is a mistake on your side to escape UTF-8 characters with urllib.parse.quote.

Where is this behavior a problem? Akamai CDNs make heavy use of the comma as a delimiter within the path part of URLs, and there is no way to stop your library from quoting commas in the path.
Example Akamai URL:

http://o2-f.akamaihd.net/z/onthewings/20150903/20150903-onthewings-hd-,150000,300000,500000,800000,1000000,1300000,1500000,2500000,.mp4.csmil

Your library quotes the commas, and the Akamai CDN responds with 404 for the quoted URL.
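
The effect can be reproduced with the standard library alone. This is a minimal sketch that applies the same `quote(path, safe='/%:')` call quoted above to the Akamai path; it does not import aiohttp, and the variable names are only illustrative.

```python
from urllib.parse import quote, urlsplit

# The Akamai URL from the report, split across lines for readability.
url = ("http://o2-f.akamaihd.net/z/onthewings/20150903/"
       "20150903-onthewings-hd-,150000,300000,500000,800000,"
       "1000000,1300000,1500000,2500000,.mp4.csmil")

# Extract only the path component, as update_path does before quoting.
path = urlsplit(url).path

# Same call as in the reported lines: only '/', '%' and ':' are kept as-is.
quoted = quote(path, safe="/%:")

# Every comma in the path is now percent-encoded as %2C.
print(quoted)
```

The comma is a sub-delimiter in RFC 3986, so a server may treat `,` and `%2C` as different paths, which matches the 404 described above.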

My suggestions: either include all reserved and unreserved characters in the safe keyword argument, or provide an additional keyword argument that disables quoting of the path part of the URI in the request method.
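
Both suggestions can be sketched in a few lines. The helper `quote_path`, its `requote` flag and the constant names below are hypothetical and exist only for illustration; they are not part of aiohttp's API. The safe set is the RFC 3986 reserved characters plus the tilde and the percent sign.

```python
from urllib.parse import quote

# RFC 3986 section 2.2 reserved characters.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="

# Reserved characters, plus '~' (unreserved, but historically quoted by
# urllib) and '%' so that existing percent-escapes are not double-encoded.
SAFE_PATH_CHARS = GEN_DELIMS + SUB_DELIMS + "~%"


def quote_path(path, requote=True):
    """Quote a URL path, leaving reserved characters untouched.

    Hypothetical helper: requote=False returns the path unchanged,
    which models the proposed opt-out keyword argument.
    """
    if not requote:
        return path
    return quote(path, safe=SAFE_PATH_CHARS)


akamai_path = ("/z/onthewings/20150903/20150903-onthewings-hd-"
               ",150000,300000,500000,800000,.mp4.csmil")
print(quote_path(akamai_path))   # commas survive
print(quote_path("/caf\u00e9"))  # non-ASCII is still percent-encoded
```

With this safe set the caller stays responsible for escaping any reserved character that is meant literally, which is the trade-off argued for above.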

Member

asvetlov commented Sep 5, 2015

A couple of days ago I committed a new version of URL requoting; the code is borrowed from the requests library.

Please take a look at https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/helpers.py#L356 and https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client_reqrep.py#L183

Does the fix satisfy your requirements?
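
For readers without the linked code at hand, the general requoting idea can be sketched as follows. This is a simplified illustration in the spirit of the requests approach, not the actual aiohttp helper; the function names and the exact safe set are assumptions. It first decodes percent-escapes of unreserved characters, then quotes only what is illegal in a URI.

```python
import re
from urllib.parse import quote

# RFC 3986 section 2.3 unreserved characters.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unquote_unreserved(uri):
    """Decode escapes such as %7E that stand for unreserved characters."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)
    return _PERCENT_ESCAPE.sub(replace, uri)


def requote(uri):
    """Quote illegal characters only; reserved characters and '%' stay as-is."""
    safe = "!#$%&'()*+,/:;=?@[]~"
    return quote(unquote_unreserved(uri), safe=safe)


print(requote("/hd-,150000,300000,.mp4.csmil"))  # commas are preserved
print(requote("/a b/c%7Ed"))                     # space quoted, %7E decoded
```

Under these assumptions an escaped reserved character such as `%2C` is left escaped, so the caller's choice between `,` and `%2C` is respected.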

@asvetlov asvetlov added this to the 0.18 milestone Sep 5, 2015
Author

tarhan commented Sep 5, 2015

Thanks.
For the time being I will use your git master version.

@tarhan tarhan closed this as completed Sep 5, 2015