Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict #egg= fragments to valid PEP 508 names #11617

Merged
merged 12 commits into from
Dec 28, 2022

Conversation

woodruffw
Copy link
Member

@woodruffw woodruffw commented Nov 22, 2022

For users landing here from the error message: The egg fragment should be a bare PEP 508 project name. Anything else is not guaranteed to work and that is what the error is telling you.

If you have something like:

path/to/package#egg=package_name>=1.0

You can change it to something like:

path/to/package#egg=package_name

Or, even completely drop the #egg fragment in many situations.


This should help reduce user confusion about what can go in a URI's egg fragment.

I haven't added a NEWS entry yet since this is a small correctness change, but if it meets the notability bar I'm happy to add one!

Fixes #11567.

Signed-off-by: William Woodruff william@trailofbits.com

This should help reduce user confusion about what can go in a URI's
egg fragment.

Fixes pypa#11567.

Signed-off-by: William Woodruff <william@trailofbits.com>
pfmoore
pfmoore previously approved these changes Nov 22, 2022
@woodruffw
Copy link
Member Author

Should have run tests locally first 😅

Looks like there are a couple of unit tests that assert that #egg= fragments are more than just bare project names (i.e. they can contain extras like foo[dev]). I can loosen the check here to allow for extras as well, if that sounds reasonable to you @pfmoore.

@pfmoore pfmoore dismissed their stale review November 22, 2022 20:15

Test failures need further investigation.

@pfmoore
Copy link
Member

pfmoore commented Nov 22, 2022

IMO this does need a news fragment, as it's a behaviour change.

I can loosen the check here to allow for extras as well, if that sounds reasonable to you @pfmoore.

Lol, I approved too fast, as well 🙂

Given that the documentation says that only a project name is allowed, here, I'm not sure what those tests are doing. I don't think we shoud just adjust the check to match the tests, I think we should establish properly what is a valid #egg fragment, and what it means, and then adjust the check, the tests and the documentation to reflect that reality.

Sorry, I appreciate that's more work than you'd originally expected...

@woodruffw
Copy link
Member Author

IMO this does need a news fragment, as it's a behaviour change.

Sounds good! I'll add one once we work out exactly what's changing here 🙂

I think we should establish properly what is a valid #egg fragment, and what it means, and then adjust the check, the tests and the documentation to reflect that reality.

Makes sense to me. Thinking about it more, it makes (some) sense to me that the #egg fragment can contain extras. For example, I'd expect this requirement line:

git+https://github.com/foo/bar.git@some-rev#egg=bar[dev]

to install package bar with the dev extra. That being said, I'm a little hazy on the difference between these two lines (if there is one?):

git+https://github.com/foo/bar.git@some-rev#egg=bar[dev]
bar[dev] @ git+https://github.com/foo/bar.git@some-rev

...or worse, those lines and this one:

bar[dev] @ git+https://github.com/foo/bar.git@some-rev#egg=bar[dev]

(pip doesn't seem to parse the egg fragment at all in that last case, meaning that my code here doesn't even get triggered).

This should now be consistent with existing tests (without establishing
that those tests are actually well-specified).

Signed-off-by: William Woodruff <william@trailofbits.com>
@woodruffw
Copy link
Member Author

woodruffw commented Nov 22, 2022

It looks like support for extras in egg fragments has been around for a long time, since at least 2012: #7

Edit: and another example of someone intentionally using extras in the egg fragment, and pip honoring them: https://stackoverflow.com/questions/70772938/multiple-eggs-in-pips-requirements-file

@woodruffw
Copy link
Member Author

Based on the above, I think the right approach here is:

  1. Update the docs to clarify that an egg fragment is a PEP 508 package name, with an optional PEP 508 extras specifier
  2. Keep the existing unit tests as they are (since they correspond to actual uses of pip with reasonable behavior)
  3. Make egg fragment parsing eager rather than lazy (as it currently is), to produce errors on invalid egg fragments even when the foo @ ... syntax is used
  4. Round out the unit tests to handle all of the above

Does this seem reasonable to you @pfmoore? If so, I can do it in this same PR (or break the docs out into a separate one; whatever's easier).

@pfmoore
Copy link
Member

pfmoore commented Nov 22, 2022

OK. I'm still not clear what git+https://github.com/foo/bar.git@some-rev#egg=bar[dev] means. From what you linked to, I assume it's somehow saying "install the package at URL git+https://github.com/foo/bar.git@some-rev, which will create a package called bar, and install the dev extra when installing that package". Is that correct?

If that's the interpretation, then the extra seems to me to not be part of the #egg fragment, but rather an extra appended to the package specification git+https://github.com/foo/bar.git@some-rev#egg=bar. Maybe as a matter of expediency in the code, we carry the extra in the fragment, but logically it doesn't seem to be part of the fragment. Of course, there's then the further question of where an extra should go in vcs+protocol://repo_url/#egg=pkg&subdirectory=pkg_dir. As part of the #egg fragment, or at the end? I suspect the answer is again that logically it should be at the end but in reality it gets attached to the #egg fragment...

Whatever the answer, I think the docs need to be clear on what is going on here, so we need some documentation change. The simplest change would be to just say that the egg fragment allows extras in addition to the project name. Personally, I think that's ducking the issue somewhat, in that we should really document the semantics, not just the syntax, but I'm not going to block this PR just to be that much of a perfectionist 🙂

I don't think I've mentioned it for a while, but I really, really hate extras 🙁

@pfmoore
Copy link
Member

pfmoore commented Nov 22, 2022

Based on the above, I think the right approach here is:

lol, yet again we're both typing at the same time 🙂

I'm not quite sure what you mean in item (3), but you clearly understand what you want to achieve there, so that's fine. And I think your overall approach is correct, yes.

@woodruffw
Copy link
Member Author

Is that correct?

That's my understanding of it, at least 😅 -- I agree that the extra doesn't actually make sense in that position, but it looks like the rest of the requirements handling code assumes that it can be there and respects it (probably for expediency, like you said).

I'm not quite sure what you mean in item (3), but you clearly understand what you want to achieve there, so that's fine. And I think your overall approach is correct, yes.

Sorry for the confusion. What I mean is that this currently (even with my changes) doesn't produce an error:

bar[dev] @ git+https://github.com/foo/bar.git@some-rev#egg=bar[dev]==1.2.3

...since the egg fragment is never actually evaluated in the presence of the bar[dev] @ ... prefix. My suggestion was to make the egg fragment parsing "eager" so that we turn that requirements line into an error, since it isn't well-formed.

This doesn't actually address the semantics of extras in the
egg fragment.

Signed-off-by: William Woodruff <william@trailofbits.com>
This should prevent us from accepting malformed egg fragments
that are shadowed by other parts of the requirement specifier.

Signed-off-by: William Woodruff <william@trailofbits.com>
This exercises our expectation that egg fragments don't include version
specifiers and are evaluated eagerly.

Signed-off-by: William Woodruff <william@trailofbits.com>
@woodruffw
Copy link
Member Author

Okay, all of the behavioral changes should be there. If this looks reasonable I'll do the NEWS entry next 🙂

@pradyunsg
Copy link
Member

pradyunsg commented Nov 24, 2022

Here's an alternative suggestion: Let's warn when we detect a non-identifier #egg, with a deprecation warning for 2 releases (so, aiming for removal in 23.2) and just present that. Starting 23.2, if #egg is non-identifier, we can error out.

It doesn't "solve" the issue "immediately", but it does present the warning to users for a 6 month period allowing them to adapt per our deprecation policy. And, we'll have a fuller fix on the other side of that deprecation period.

(yes, this means also erroring out on foobar[extra] in the future)

@woodruffw
Copy link
Member Author

woodruffw commented Nov 24, 2022

That sounds pretty good to me -- IIUC, nobody should ever need to use extras in an egg fragment: foo[extra] @ ... should always be strictly equivalent, and is the suggested form anyways.

In that case, I can rewrite this PR to spit out a warning instead, but leave the docs tweaked (maybe with an additional warning that the behavior is changing).

@woodruffw
Copy link
Member Author

Turkey time is over, so I'll resume on this this week...

@woodruffw
Copy link
Member Author

I've been thinking about this a little bit more, and...maybe it makes sense to deprecate egg= fragments entirely? My understanding is that the req @ url spec is equivalent and preferred in all cases, meaning that the presence of egg= at all is entirely avoidable and quickly fixable (even automatically!).

I can still leave this PR to just the subject of restricting egg fragments to PEP 508 names, but IMO that's another long-term option worth thinking about.

woodruffw and others added 3 commits December 2, 2022 21:09
Signed-off-by: William Woodruff <william@trailofbits.com>
This turns invalid egg fragments into a soft error, with a scheduled
deprecation period of two releases.

Signed-off-by: William Woodruff <william@trailofbits.com>
@woodruffw
Copy link
Member Author

Rewrote this into a deprecation instead. I've kept the unit test around and made it into an xfail, so that it can be quickly toggled when this behavior does actually switch over.

Signed-off-by: William Woodruff <william@trailofbits.com>
Signed-off-by: William Woodruff <william@trailofbits.com>
Signed-off-by: William Woodruff <william@trailofbits.com>
@woodruffw
Copy link
Member Author

CI failures look unrelated:

assert 'User for localhost:44517' in 'Looking in indexes: https://localhost:44517/simple\nCould not fetch URL https://localhost:44517/simple/simple/: There was a problem confirming the ssl certificate: [SSL: KRB5_S_TKT_NYV] unexpected eof while reading (_ssl.c:2570) - skipping\n'
 +  where 'Looking in indexes: https://localhost:44517/simple\nCould not fetch URL https://localhost:44517/simple/simple/: There was a problem confirming the ssl certificate: [SSL: KRB5_S_TKT_NYV] unexpected eof while reading (_ssl.c:2570) - skipping\n' = <tests.lib.TestPipResult object at 0x7f8d13203290>.stdout
FAILED tests/functional/test_install_config.py::test_do_not_prompt_for_authentication - AssertionError: assert 'ERROR: HTTP error 401' in 'ERROR: Could not find a version that satisfies the requirement simple (from versions: none)\nERROR: No matching distribution found for simple\n'
 +  where 'ERROR: Could not find a version that satisfies the requirement simple (from versions: none)\nERROR: No matching distribution found for simple\n' = <tests.lib.TestPipResult object at 0x7f8d130b6fd0>.stderr
FAILED tests/functional/test_install_config.py::test_prompt_for_keyring_if_needed[True] - AssertionError: Script returned code: 1
FAILED tests/functional/test_install_config.py::test_prompt_for_keyring_if_needed[False] - AssertionError: Script returned code: 1

Copy link
Member

@sbidoul sbidoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Can you rebase to see if we get a green CI?

Related: in #11676 I update the docs to clarify that the egg fragment is only necessary for editable VCS URLs.

@woodruffw
Copy link
Member Author

Merged; let me know if you're prefer a full rebase instead 🙂

@pradyunsg pradyunsg merged commit dca39dd into pypa:main Dec 28, 2022
@woodruffw woodruffw deleted the ww/restrict-egg-fragement branch December 28, 2022 17:00
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 13, 2023
@pradyunsg
Copy link
Member

I updated the description here, since users are ending up on this page from the link in the error message added in this PR. :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restrict unused version pins in URL/VCS requirements?
4 participants