Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urllib.parse.urlunsplit makes relative path to absolute (http:g -> http:///g) #85110

Closed
openandclose mannequin opened this issue Jun 10, 2020 · 8 comments
Closed

urllib.parse.urlunsplit makes relative path to absolute (http:g -> http:///g) #85110

openandclose mannequin opened this issue Jun 10, 2020 · 8 comments
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@openandclose
Copy link
Mannequin

openandclose mannequin commented Jun 10, 2020

BPO 40938
Nosy @orsenthil, @openandclose, @jaswdr

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-06-10.11:18:48.328>
labels = ['3.7', '3.8', 'type-bug', 'library', '3.9']
title = 'urllib.parse.urlunsplit makes relative path to absolute (http:g -> http:///g)'
updated_at = <Date 2021-05-13.14:11:56.189>
user = 'https://github.com/openandclose'

bugs.python.org fields:

activity = <Date 2021-05-13.14:11:56.189>
actor = 'op368'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2020-06-10.11:18:48.328>
creator = 'op368'
dependencies = []
files = []
hgrepos = []
issue_num = 40938
keywords = []
message_count = 7.0
messages = ['371179', '393544', '393574', '393576', '393577', '393578', '393583']
nosy_count = 3.0
nosy_names = ['orsenthil', 'op368', 'jaswdr']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue40938'
versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

Linked PRs

@openandclose
Copy link
Mannequin Author

openandclose mannequin commented Jun 10, 2020

path 'g' in 'http:g' becomes '/g'.

    >>> urlsplit('http:g')
    SplitResult(scheme='http', netloc='', path='g', query='', fragment='')
    >>> urlunsplit(urlsplit('http:g'))
    'http:///g'
    >>> urlsplit('http:///g')
    SplitResult(scheme='http', netloc='', path='/g', query='', fragment='')

    >>> urljoin('http://a/b/c/d', 'http:g')
    'http://a/b/c/g'
    >>> urljoin('http://a/b/c/d', 'http:///g')
    'http://a/g'

The problematic part of the code is:

    def urlunsplit(components):
        [...]
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
--->        if url and url[:1] != '/': url = '/' + url
            url = '//' + (netloc or '') + url

Note also that urllib has decided on the interpretation of 'http:g' (in test).

    def test_RFC3986(self):
        [...]
        #self.checkJoin(RFC3986_BASE, 'http:g','http:g') # strict parser
        self.checkJoin(RFC3986_BASE, 'http:g','http://a/b/c/g') #relaxed parser

@openandclose openandclose mannequin added 3.8 (EOL) end of life type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Jun 10, 2020
@wyz23x2 wyz23x2 mannequin added 3.7 (EOL) end of life 3.9 only security fixes labels Aug 11, 2020
@jaswdr
Copy link
Mannequin

jaswdr mannequin commented May 12, 2021

@op368 I don't think that this is a bug, [1] literally uses this exact example and shows the expected behaviour.

[1] https://datatracker.ietf.org/doc/html/rfc3986#section-5.4.2

@openandclose
Copy link
Mannequin Author

openandclose mannequin commented May 13, 2021

hello, @jaswdr, but I can't understand what's wrong with my point.
What is 'the expected behaviour'?

@jaswdr
Copy link
Mannequin

jaswdr mannequin commented May 13, 2021

@op368 as far as I can see, regarding of any miss interpretation, yes, the RFC has this section:

  "http:g"        =  "http:g"         ; for strict parsers
                  /  "http://a/b/c/g" ; for backward compatibility

What I can understand is that for "http:g" it will be translated to "http:///g" because of backward compatibility, this seems to be an edge case for the parser, since the RFC text also mention that this should be avoided.

@openandclose
Copy link
Mannequin Author

openandclose mannequin commented May 13, 2021

'http:///g' has absolute path '/g',
and as urljoin shows:

    >>> urljoin('http://a/b/c/d', 'http:///g')
    'http://a/g'  # 'a' is netloc

So you are proposing third interpretation.

  "http:g"        =  "http:g"         ; for strict parsers
                  /  "http://a/b/c/g" ; for backward compatibility
                  /  "http://a/g"     ; (yours)

@jaswdr
Copy link
Mannequin

jaswdr mannequin commented May 13, 2021

Not exactly, in the RFC example they use a/b/c for the path, but when using http:g there is no nested path, so it should be http:///g, no?

@openandclose
Copy link
Mannequin Author

openandclose mannequin commented May 13, 2021

I tried hard (even read RFC1630),
but I think no.

@serhiy-storchaka
Copy link
Member

The problem is that urljoin(base, urlunsplit(urlsplit(relurl))) is not equivalent to urljoin(base, relurl). I think that the problem cannot be solved without large breaking changes (like #67041), but this particular case can be mitigated. urlunsplit) can be made preserving non-empty relative path with non-empty scheme and empty netloc.

An exception for empty path was added to not break an existing test for #104139. All other existing tests are passed with this change.

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Aug 20, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Aug 20, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 21, 2024
…b.parse.urlunsplit() (pythonGH-123179)

(cherry picked from commit 90c892e)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 21, 2024
…b.parse.urlunsplit() (pythonGH-123179)

(cherry picked from commit 90c892e)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
blhsing pushed a commit to blhsing/cpython that referenced this issue Aug 22, 2024
Yhg1s pushed a commit that referenced this issue Sep 2, 2024
…ib.parse.urlunsplit() (GH-123179) (#123187)

gh-85110: Preserve relative path in URL without netloc in urllib.parse.urlunsplit() (GH-123179)
(cherry picked from commit 90c892e)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this issue Sep 6, 2024
…ib.parse.urlunsplit() (GH-123179) (#123188)

(cherry picked from commit 90c892e)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@ambv ambv closed this as completed Sep 6, 2024
jodal added a commit to jodal/mopidy that referenced this issue Sep 15, 2024
marmarek added a commit to marmarek/salt that referenced this issue Oct 4, 2024
Python 3.13 fixed handling relative paths in urllib.parse module.
Specifically, relative file URL is now constructed as file:path instead
of converting it to absolute file:///path. This breaks
salt.utils.url.create which expects file:/// specifically. The mismatch
results in for example changing salt://top.sls into salt://.sls and thus
not finding the top file.

Fix this by handling both prefixes.

Relevant python change: python/cpython#85110
Fixes: saltstack#66898
marmarek added a commit to marmarek/salt that referenced this issue Oct 4, 2024
Python 3.13 fixed handling relative paths in urllib.parse module.
Specifically, relative file URL is now constructed as file:path instead
of converting it to absolute file:///path. This breaks
salt.utils.url.create which expects file:/// specifically. The mismatch
results in for example changing salt://top.sls into salt://.sls and thus
not finding the top file.

Fix this by handling both prefixes.

Relevant python change: python/cpython#85110
Fixes: saltstack#66898
marmarek added a commit to marmarek/salt that referenced this issue Oct 7, 2024
Python 3.13 fixed handling relative paths in urllib.parse module.
Specifically, relative file URL is now constructed as file:path instead
of converting it to absolute file:///path. This breaks
salt.utils.url.create which expects file:/// specifically. The mismatch
results in for example changing salt://top.sls into salt://.sls and thus
not finding the top file.

Fix this by handling both prefixes.

Relevant python change: python/cpython#85110
Fixes: saltstack#66898
marmarek added a commit to marmarek/salt that referenced this issue Oct 12, 2024
Python 3.13 fixed handling relative paths in urllib.parse module.
Specifically, relative file URL is now constructed as file:path instead
of converting it to absolute file:///path. This breaks
salt.utils.url.create which expects file:/// specifically. The mismatch
results in for example changing salt://top.sls into salt://.sls and thus
not finding the top file.

Fix this by handling both prefixes.

Relevant python change: python/cpython#85110
Fixes: saltstack#66898
marmarek added a commit to marmarek/salt that referenced this issue Oct 12, 2024
Python 3.13 fixed handling relative paths in urllib.parse module.
Specifically, relative file URL is now constructed as file:path instead
of converting it to absolute file:///path. This breaks
salt.utils.url.create which expects file:/// specifically. The mismatch
results in for example changing salt://top.sls into salt://.sls and thus
not finding the top file.

Fix this by handling both prefixes.

Relevant python change: python/cpython#85110
Fixes: saltstack#66898
marmarek added a commit to marmarek/salt that referenced this issue Oct 12, 2024
Python 3.13 fixed handling relative paths in urllib.parse module.
Specifically, relative file URL is now constructed as file:path instead
of converting it to absolute file:///path. This breaks
salt.utils.url.create which expects file:/// specifically. The mismatch
results in for example changing salt://top.sls into salt://.sls and thus
not finding the top file.

Fix this by handling both prefixes.

Relevant python change: python/cpython#85110
Fixes: saltstack#66898
marmarek added a commit to marmarek/salt that referenced this issue Oct 13, 2024
Python 3.13 fixed handling relative paths in urllib.parse module.
Specifically, relative file URL is now constructed as file:path instead
of converting it to absolute file:///path. This breaks
salt.utils.url.create which expects file:/// specifically. The mismatch
results in for example changing salt://top.sls into salt://.sls and thus
not finding the top file.

Fix this by handling both prefixes.

Relevant python change: python/cpython#85110
Fixes: saltstack#66898
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants