-
Notifications
You must be signed in to change notification settings - Fork 251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parsing of URL requirements with git+file scheme #264
Conversation
Gentle nudge. See also pypa/pip#7650 (comment) for context. I believe the test error is a transient. |
On a quick glance the test error seems relevant? The failing test is the one you've added in this PR, though not sure why it's only failing for PyPy. |
@di thanks for looking into this PR. Right, transient is probably not the right word to qualify the test failure. Indeed the failing test is one that I added and I can reproduce by running |
@sbidoul Consider adding |
079fd6c
to
8f124fc
Compare
Okay this was actually fixed in #267, so I rebased and it's green now. |
It seems like the actual issue here is the difference in behavior between how >>> urlparse.urlparse('git+file:///data/repo')
ParseResult(scheme='git+file', netloc='', path='/data/repo', params='', query='', fragment='')
>>> urlparse.urlparse('git+file://localhost/data/repo')
ParseResult(scheme='git+file', netloc='localhost', path='/data/repo', params='', query='', fragment='') And how PEP 440 interprets an omitted
Which means that using
I'm not sure what the right thing to do here is. Should we always treat an omitted Should we just consider it Or should we update the PEP to just not make this assumption? |
I think it'd be useful to figure out why the PEP makes this assumption. :) |
@di @pradyunsg I think there are two separate questions here:
Question 1 PEP440's assumption that an omitted host ( RFC 8089 says this:
One subtle issue w/ PEP440's phrasing is that is says omitted is assumed to be localhost, when it might be more precise to say that "omitted and localhost are assumed to be equivalent" when talking about file URIs. Question 2 I would argue that urllib's behavior is correct here. Furthermore, absent a file URI specific parser, I would think that it is the responsibility of the consumer of the urls (in this case Note 1 - RFC8089 was first drafted in 2015 and finalized in 2017 so both PEP440 and PEP508 predate this RFC. However, RFC8089 is backwards compatible and the assumption about empty hostnames is sound going back to RFC1738. Note 2 - RFC 8089 Appendix A lists the differences between 8089 and previous specs and Appendix B lists some examples. Cross-linking #120 since it's related. |
Nice analysis @sgg! I think I agree. Looks like we could fix this just with: if parsed_url.scheme == "file":
if urlparse.urlunparse(parsed_url) != req.url:
raise InvalidRequirement("Invalid URL given")
- elif not (parsed_url.scheme and parsed_url.netloc) or (
- not parsed_url.scheme and not parsed_url.netloc
- ):
+ elif not parsed_url.scheme:
raise InvalidRequirement("Invalid URL: {0}".format(req.url)) However this causes this test to fail: packaging/tests/test_requirements.py Lines 105 to 109 in 61672bf
And based on my understanding, this URL is just as valid here, since there's no restriction on what schemes are valid:
So we should probably remove that test as well, do you agree? |
@di Acknowledging that there are some known issues/bugs w/ If the intent is for |
I also agree any valid URL should be accepted. I updated the PR according to @di's suggestion above. |
@di good point, I added a test with a couple of exotic URLs. I also added more tests for absolute and relative paths... and I found a potential issue: it now accepts |
Also, it seems the PEP 508 grammar does allow paths without scheme?
The more I look into this the more confused I get.. |
self._assert_requirement(req, "name", "git+file:///data/repo") | ||
|
||
@pytest.mark.parametrize( | ||
"url", ["gopher:/foo/com", "mailto:me@example.com", "c:/foo/bar"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that here, c
is an URL scheme, not a drive letter.
Rebased and squashed. |
I’m wondering, what is a relative ref supposed to mean? In normal contexts, it means using the current entity’s scheme and replace the domain and path etc., but where should such a URL in a requirement resolve against? It’s likely possible to come up with a reasonable answer for every possible way to install a distribution (wheel, sdist, local VCS, remote VCS, local VCS clone from a remote, an on-disk source tree, etc.), but there would be way too much subtlty I doubt the end result would be useful for many, and can easily see it be a minefield that every tutorial on the internet gets wrong and causes endless pain for users and packaging tools alike. TBH I’d be much happier if we could just pretend the relative ref thing doesn’t exist until someone actually comes up with a coherent plan (and probably an Informational PEP) on how it can be used. |
I don't remember the rationale per se, but my assumption would be that we were just ensuring that file URLs were processed the same way browsers, curl, etc process them: three forward slashes implies "localhost" for "file" URLs, since it's an intrinsically local protocol. |
For the relative reference support in the grammar, re-reading PEP 508 suggests to me that that is a setuptools feature that inadvertently slipped into the spec when the grammar was derived from the pkg_resources one. PEP 440 doesn't allow relative URLs, and PEP 508 doesn't define semantics for them either. |
Can we say this PR is an improvement it itself, and refinements to URI parsing / validation could be deferred ? |
I think we can say that. :) |
I'm only willing to consider this is a net improvement if the "exotic URLs" continue to be disallowed. Otherwise we are making the parsing logic from being too restrictive to too permissive, which is at best a wash (and arguably worse because it'd be a pain to remove support for those later on if people start to depend on those "features"). |
What do we want to do with this PR? There's no attached issue and I'm not enough of an expert when it comes to the URL schemes to know what to do. |
In #120 (comment) @uranusjr and @brettcannon seemed to be happy with relaxing URL validation in In its current state, this PR, which I just rebased,
This may let invalid URls that were caught before go through in downstream tools but I personally don't think this can create much more trouble than harder to understand downstream errors when passed exotic URLs. |
Closing in favor of #684 |
Issue this PR fixes:
Requirement("name @ git+file:///data/repo")
fails when it should succeed, much likeRequirement("name @ git+https://g.c/u/r.git")
.Such requirements are allowed by PEP 440.