Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leading slash in uri followed by column fails #6643

Closed
pha6d opened this issue Feb 21, 2024 · 5 comments · Fixed by #6644
Closed

Leading slash in uri followed by column fails #6643

pha6d opened this issue Feb 21, 2024 · 5 comments · Fixed by #6644

Comments

@pha6d
Copy link

pha6d commented Feb 21, 2024

Leading slash in uri followed by column fails.

Expected Result

requests.get('http://127.0.0.1:10000//v:h')
<Response [200]>

Actual Result

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/url.py", line 425, in parse_url
    host, port = _HOST_PORT_RE.match(host_port).groups()  # type: ignore[union-attr]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'groups'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py", line 711, in urlopen
    parsed_url = parse_url(url)
                 ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/urllib3/util/url.py", line 451, in parse_url
    raise LocationParseError(source_url) from e
urllib3.exceptions.LocationParseError: Failed to parse: //v:h

Reproduction Steps

import requests
requests.get('http://127.0.0.1:10000//v:h')

System Information

$ python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "3.3.2"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.8"
  },
  "platform": {
    "release": "5.10.209-198.812.amzn2.x86_64",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.31.0"
  },
  "system_ssl": {
    "version": "300000b0"
  },
  "urllib3": {
    "version": "2.2.1"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}
@sigmavirus24
Copy link
Contributor

I can reproduce this with

import urllib3

urllib3.PoolManager().urlopen(method="GET", url="http://127.0.0.1:10000//v:h")

cc @sethmlarson @pquentin

@sigmavirus24
Copy link
Contributor

Ah I see the problem, both the PoolManager and requests are sending the path (//v:h) to urlopen: https://github.com/urllib3/urllib3/blob/d4ffa29ee1862b3d1afe584efb57d489a7659dac/src/urllib3/poolmanager.py#L444

and urlopen is now probably taking on too many responsibilities: https://github.com/urllib3/urllib3/blob/d4ffa29ee1862b3d1afe584efb57d489a7659dac/src/urllib3/connectionpool.py#L711-L712

@sigmavirus24
Copy link
Contributor

Also, yes, I verified that RFC3986 allows : as a non-percent-encoded character in the path: https://datatracker.ietf.org/doc/html/rfc3986.html#section-3.3

@sigmavirus24
Copy link
Contributor

And I think the problem is the // in the path which is tripping up the parsing as // is the delimiter for what should be host and port after that. So in reality nothing is wrong here. I wonder if we need some kind of pre-parsing of the URL before sending it to urlopen to trim this down. Something like re.sub('^/+', '/') would likely fix this in both Requests and urllib3.

sigmavirus24 added a commit to sigmavirus24/requests that referenced this issue Feb 22, 2024
A URL with excess leading / (path-separator)s would cause urllib3 to
attempt to reparse the request-uri as a full URI with a host and port.
This bypasses that logic in ConnectionPool.urlopen by replacing these
leading /s with just a single /.

Closes psf#6643
sigmavirus24 added a commit to sigmavirus24/requests that referenced this issue Feb 22, 2024
A URL with excess leading / (path-separator)s would cause urllib3 to
attempt to reparse the request-uri as a full URI with a host and port.
This bypasses that logic in ConnectionPool.urlopen by replacing these
leading /s with just a single /.

Closes psf#6643
@sigmavirus24
Copy link
Contributor

Also reported this up to urllib3 urllib3/urllib3#3352

sigmavirus24 added a commit to sigmavirus24/requests that referenced this issue Feb 23, 2024
A URL with excess leading / (path-separator)s would cause urllib3 to
attempt to reparse the request-uri as a full URI with a host and port.
This bypasses that logic in ConnectionPool.urlopen by replacing these
leading /s with just a single /.

Closes psf#6643
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants