Improve handling of constraints on requirements with extras #12095

Merged
merged 47 commits into from
Oct 5, 2023
5bebe85
take non-extra requirements into account for extra installs
sanderr Jun 20, 2023
937d8f0
partial improvement
sanderr Jun 21, 2023
5f8f40e
refinements
sanderr Jun 22, 2023
d09431f
fixes
sanderr Jun 22, 2023
49027d7
cleanup
sanderr Jun 22, 2023
cb0f97f
reverted troublesome changes
sanderr Jun 22, 2023
3160293
improvement
sanderr Jun 22, 2023
1038f15
stray todo
sanderr Jun 22, 2023
8aa1758
dropped unused attribute
sanderr Jul 13, 2023
faa3289
use regex for requirement update
sanderr Jul 13, 2023
7e8da61
clarification
sanderr Jul 13, 2023
ff9aeae
added resolver test case
sanderr Jul 25, 2023
3fa373c
added test for comes-from reporting
sanderr Jul 25, 2023
e569017
added test case for report bugfix
sanderr Jul 25, 2023
cc6a2bd
added second report test case
sanderr Jul 25, 2023
fc86308
Merge branch 'main' into issue/11924-requirements-on-extras
sanderr Jul 25, 2023
4ae829c
news entries
sanderr Jul 25, 2023
dc01a40
py38 compatibility
sanderr Jul 25, 2023
292387f
py37 compatibility
sanderr Jul 25, 2023
39e1102
fixed minor type errors
sanderr Jul 25, 2023
e6333bb
linting
sanderr Jul 26, 2023
1207389
made primary news fragment of type feature
sanderr Jul 26, 2023
6663b89
added final bugfix news entry
sanderr Jul 26, 2023
314d7c1
simplified regex
sanderr Jul 26, 2023
cc909e8
reverted unnecessary changes
sanderr Jul 26, 2023
6ed231a
added unit tests for install req manipulation
sanderr Jul 26, 2023
55e9762
windows compatibility
sanderr Jul 26, 2023
504485c
lint
sanderr Jul 26, 2023
f4a7c0c
cleaned up windows fix
sanderr Jul 26, 2023
32e95be
exclude brackets
sanderr Jul 26, 2023
ba761cd
Merge branch 'main' into issue/11924-requirements-on-extras
sanderr Jul 26, 2023
3f3ae6f
Merge branch 'main' into issue/11924-requirements-on-extras
sanderr Sep 6, 2023
21bfe40
use more stable sort key
sanderr Sep 6, 2023
0de374e
review comment: return iterator instead of list
sanderr Sep 6, 2023
5a01679
Update src/pip/_internal/req/constructors.py
sanderr Sep 6, 2023
9041602
Merge branch 'issue/11924-requirements-on-extras' of github.com:sande…
sanderr Sep 6, 2023
4e73e3e
review comment: subclass instead of constructor flag
sanderr Sep 6, 2023
50cd318
review comment: renamed and moved up ExtrasCandidate._ireq
sanderr Sep 6, 2023
ff9e15d
Merge branch 'main' into issue/11924-requirements-on-extras
sanderr Sep 6, 2023
f5602fa
added message to invariant assertions
sanderr Sep 6, 2023
449522a
minor fixes and linting
sanderr Sep 6, 2023
952ab6d
Update src/pip/_internal/resolution/resolvelib/factory.py
sanderr Sep 7, 2023
fbda0a2
Update tests/unit/resolution_resolvelib/test_requirement.py
sanderr Sep 8, 2023
ce94946
fixed argument name in docstring
sanderr Sep 13, 2023
46707a4
Merge branch 'issue/11924-requirements-on-extras' of github.com:sande…
sanderr Sep 13, 2023
89b68c6
Merge branch 'main' into issue/11924-requirements-on-extras
sanderr Sep 13, 2023
0f543e3
made assertions more robust
sanderr Sep 13, 2023
47 changes: 45 additions & 2 deletions src/pip/_internal/req/constructors.py
@@ -8,14 +8,15 @@
InstallRequirement.
"""

import copy
import logging
import os
import re
from typing import Dict, List, Optional, Set, Tuple, Union
from typing import Collection, Dict, List, Optional, Set, Tuple, Union

from pip._vendor.packaging.markers import Marker
from pip._vendor.packaging.requirements import InvalidRequirement, Requirement
from pip._vendor.packaging.specifiers import Specifier
from pip._vendor.packaging.specifiers import Specifier, SpecifierSet

from pip._internal.exceptions import InstallationError
from pip._internal.models.index import PyPI, TestPyPI
@@ -504,3 +505,45 @@ def install_req_from_link_and_ireq(
config_settings=ireq.config_settings,
user_supplied=ireq.user_supplied,
)


def install_req_drop_extras(ireq: InstallRequirement) -> InstallRequirement:
"""
Creates a new InstallRequirement using the given template but without
any extras. Sets the original requirement as the new one's parent
(comes_from).
"""
req = Requirement(str(ireq.req))
req.extras = {}
Member

@uranusjr Jul 6, 2023

Instead of trying to modify a Requirement in-place (which IIRC is not documented to be an officially supported operation), it may be better to strip the extras in the requirement string instead. This should not be too difficult with some regex since both the project and extra names are quite predictable.

(I’m not committed to this though and wouldn’t object if packaging maintainers are OK with this cc @pradyunsg)
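
A minimal sketch of the string-based approach suggested above, under stated assumptions. The helper name `strip_extras` and the pattern are hypothetical and written for illustration only; they are not the regex the PR ended up with, and the pattern covers only the common `name[extras]rest` shape rather than the full PEP 508 grammar.

```python
import re

# Hypothetical pattern for illustration: a project name, an optional
# "[...]" extras block, and whatever follows (specifier, marker, URL).
_NAME_AND_EXTRAS = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)\s*"
    r"(?P<extras>\[[^\]]*\])?"
    r"(?P<rest>.*)$",
    re.DOTALL,
)


def strip_extras(requirement: str) -> str:
    """Return the requirement string with its extras block removed."""
    match = _NAME_AND_EXTRAS.match(requirement)
    if match is None:
        raise ValueError(f"not a recognizable requirement: {requirement!r}")
    # Re-join the name with everything after the extras block.
    return match.group("name") + match.group("rest")


print(strip_extras("requests[security,socks]>=2.0"))
```

Because the result is a new string, the original parsed `Requirement` is never mutated; a fresh object can be built from the stripped string instead.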

Contributor Author

I made the change, though I don't like that it feels a bit more brittle. On the other hand, I do agree with your sentiment that I shouldn't rely on something that isn't part of packaging.Requirement's contract. @pradyunsg do you have an opinion on this (faa3289)?

Member

I'd be happy if packaging made it a supported operation, but the existing docs state that the Requirement class is for parsing the string form of a requirement, which to me doesn't imply the result is mutable. You could read it as saying that it parses the string into a structured form, which you are allowed to then mutate if you want, but I don't think that automatically follows.

So as things stand, I agree with @uranusjr.

As @pradyunsg is a packaging maintainer, if he says it's OK, then I'm fine with that (although I'd ask that the docs be changed to explicitly state what's supported).

Member

Yea, don't modify this in place. It doesn't break things as implemented right now but making this a frozen dataclass is something I want to do in the future.

Contributor Author

Thanks for your input. A frozen dataclass would be nice, that would then allow for dataclasses.replace, which I'd always prefer over in-place modifications.
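
To illustrate the `dataclasses.replace` pattern mentioned here, the following sketch uses a hypothetical `FrozenRequirement` class. It is an assumption made for illustration: packaging's `Requirement` is not a dataclass in this PR, and the field names below are invented.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import FrozenSet


# Hypothetical stand-in for a frozen requirement type (not packaging's API).
@dataclass(frozen=True)
class FrozenRequirement:
    name: str
    extras: FrozenSet[str] = frozenset()
    specifier: str = ""


original = FrozenRequirement("requests", frozenset({"socks"}), ">=2.0")

# replace() builds a new instance; the original is left untouched.
without_extras = replace(original, extras=frozenset())

try:
    original.extras = frozenset()  # type: ignore[misc]
except FrozenInstanceError:
    print("in-place modification is rejected")
```

With a frozen type, dropping extras becomes a copy-with-changes operation, so the question of whether in-place mutation is supported no longer arises.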

return InstallRequirement(
req=req,
comes_from=ireq,
editable=ireq.editable,
link=ireq.link,
markers=ireq.markers,
use_pep517=ireq.use_pep517,
isolated=ireq.isolated,
global_options=ireq.global_options,
hash_options=ireq.hash_options,
constraint=ireq.constraint,
extras=[],
config_settings=ireq.config_settings,
user_supplied=ireq.user_supplied,
permit_editable_wheels=ireq.permit_editable_wheels,
)


def install_req_extend_extras(
ireq: InstallRequirement,
extras: Collection[str],
) -> InstallRequirement:
"""
Returns a copy of an installation requirement with some additional extras.
Makes a shallow copy of the ireq object.
"""
result = copy.copy(ireq)
req = Requirement(str(ireq.req))
req.extras.update(extras)
result.req = req
result.extras = {*ireq.extras, *extras}
return result
22 changes: 16 additions & 6 deletions src/pip/_internal/resolution/resolvelib/candidates.py
@@ -240,7 +240,7 @@ def _prepare(self) -> BaseDistribution:
def iter_dependencies(self, with_requires: bool) -> Iterable[Optional[Requirement]]:
requires = self.dist.iter_dependencies() if with_requires else ()
for r in requires:
yield self._factory.make_requirement_from_spec(str(r), self._ireq)
yield from self._factory.make_requirements_from_spec(str(r), self._ireq)
yield self._factory.make_requires_python_requirement(self.dist.requires_python)

def get_install_requirement(self) -> Optional[InstallRequirement]:
@@ -392,7 +392,7 @@ def iter_dependencies(self, with_requires: bool) -> Iterable[Optional[Requiremen
if not with_requires:
return
for r in self.dist.iter_dependencies():
yield self._factory.make_requirement_from_spec(str(r), self._ireq)
yield from self._factory.make_requirements_from_spec(str(r), self._ireq)

def get_install_requirement(self) -> Optional[InstallRequirement]:
return None
@@ -427,9 +427,19 @@ def __init__(
self,
base: BaseCandidate,
extras: FrozenSet[str],
ireq: Optional[InstallRequirement] = None,
) -> None:
"""
:param ireq: the InstallRequirement that led to this candidate, if it
differs from the base's InstallRequirement. This will often be the
Member

I feel maybe we should remove the None case and make this always an ireq instance (i.e. if we didn’t do extra manipulation this would just be base.ireq) to avoid an unnecessary branching.

It might be useful for this argument (and attribute) to have a different name to indicate it’s only for reporting.

Contributor Author

Fine by me. I personally prefer to be explicit when reverting to defaults but it's a thin line in this case.

I moved it up so that the attribute is not optional (50cd318), but I think it would be cleaner to leave the constructor argument optional for two reasons:

  1. caller interface remains simple for the common case
  2. _ireq is a "private" attribute on BaseCandidate. Accessing it from within the same module is one thing, doing so from factory feels to me like a violation of the interface. We could of course rename the attribute if we want.

With this context, how would you like me to proceed?

case in the sense that this candidate's requirement has the extras
while the base's does not. Unlike the InstallRequirement backed
candidates, this requirement is used solely for reporting purposes,
it does not do any leg work.
pradyunsg marked this conversation as resolved.
"""
self.base = base
self.extras = extras
self._ireq = ireq

def __str__(self) -> str:
name, rest = str(self.base).split(" ", 1)
@@ -502,11 +512,11 @@ def iter_dependencies(self, with_requires: bool) -> Iterable[Optional[Requiremen
)

for r in self.base.dist.iter_dependencies(valid_extras):
requirement = factory.make_requirement_from_spec(
str(r), self.base._ireq, valid_extras
yield from factory.make_requirements_from_spec(
str(r),
self._ireq if self._ireq is not None else self.base._ireq,
valid_extras,
)
if requirement:
yield requirement

def get_install_requirement(self) -> Optional[InstallRequirement]:
# We don't return anything here, because we always
95 changes: 69 additions & 26 deletions src/pip/_internal/resolution/resolvelib/factory.py
@@ -138,13 +138,16 @@ def _fail_if_link_is_unsupported_wheel(self, link: Link) -> None:
raise UnsupportedWheel(msg)

def _make_extras_candidate(
self, base: BaseCandidate, extras: FrozenSet[str]
self,
base: BaseCandidate,
extras: FrozenSet[str],
ireq: Optional[InstallRequirement] = None,
) -> ExtrasCandidate:
cache_key = (id(base), extras)
try:
candidate = self._extras_candidate_cache[cache_key]
except KeyError:
candidate = ExtrasCandidate(base, extras)
candidate = ExtrasCandidate(base, extras, ireq=ireq)
self._extras_candidate_cache[cache_key] = candidate
return candidate

@@ -161,7 +164,7 @@ def _make_candidate_from_dist(
self._installed_candidate_cache[dist.canonical_name] = base
if not extras:
return base
return self._make_extras_candidate(base, extras)
return self._make_extras_candidate(base, extras, ireq=template)

def _make_candidate_from_link(
self,
@@ -223,7 +226,7 @@ def _make_candidate_from_link(

if not extras:
return base
return self._make_extras_candidate(base, extras)
return self._make_extras_candidate(base, extras, ireq=template)

def _iter_found_candidates(
self,
@@ -385,16 +388,21 @@ def find_candidates(
if ireq is not None:
ireqs.append(ireq)

# If the current identifier contains extras, add explicit candidates
# from entries from extra-less identifier.
# If the current identifier contains extras, add requires and explicit
# candidates from entries from extra-less identifier.
with contextlib.suppress(InvalidRequirement):
parsed_requirement = get_requirement(identifier)
explicit_candidates.update(
self._iter_explicit_candidates_from_base(
requirements.get(parsed_requirement.name, ()),
frozenset(parsed_requirement.extras),
),
)
if parsed_requirement.name != identifier:
explicit_candidates.update(
self._iter_explicit_candidates_from_base(
requirements.get(parsed_requirement.name, ()),
frozenset(parsed_requirement.extras),
),
)
for req in requirements.get(parsed_requirement.name, []):
_, ireq = req.get_candidate_lookup()
if ireq is not None:
ireqs.append(ireq)

# Add explicit candidates from constraints. We only do this if there are
# known ireqs, which represent requirements not already explicit. If
@@ -437,18 +445,33 @@ def find_candidates(
and all(req.is_satisfied_by(c) for req in requirements[identifier])
)

def _make_requirement_from_install_req(
def _make_requirements_from_install_req(
self, ireq: InstallRequirement, requested_extras: Iterable[str]
) -> Optional[Requirement]:
) -> list[Requirement]:
"""
Returns requirement objects associated with the given InstallRequirement. In
most cases this will be a single object but the following special cases exist:
- the InstallRequirement has markers that do not apply -> result is empty
- the InstallRequirement has both a constraint and extras -> result is split
in two requirement objects: one with the constraint and one with the
extra. This allows centralized constraint handling for the base,
resulting in fewer candidate rejections.
"""
if not ireq.match_markers(requested_extras):
logger.info(
"Ignoring %s: markers '%s' don't match your environment",
ireq.name,
ireq.markers,
)
return None
return []
if not ireq.link:
return SpecifierRequirement(ireq)
if ireq.extras and ireq.req.specifier:
return [
SpecifierRequirement(ireq, drop_extras=True),
SpecifierRequirement(ireq),
]
else:
return [SpecifierRequirement(ireq)]
self._fail_if_link_is_unsupported_wheel(ireq.link)
cand = self._make_candidate_from_link(
ireq.link,
@@ -466,8 +489,8 @@ def _make_requirement_from_install_req(
# ResolutionImpossible eventually.
if not ireq.name:
raise self._build_failures[ireq.link]
return UnsatisfiableRequirement(canonicalize_name(ireq.name))
return self.make_requirement_from_candidate(cand)
return [UnsatisfiableRequirement(canonicalize_name(ireq.name))]
return [self.make_requirement_from_candidate(cand)]

def collect_root_requirements(
self, root_ireqs: List[InstallRequirement]
@@ -488,30 +511,50 @@ def collect_root_requirements(
else:
collected.constraints[name] = Constraint.from_ireq(ireq)
else:
req = self._make_requirement_from_install_req(
reqs = self._make_requirements_from_install_req(
ireq,
requested_extras=(),
)
if req is None:
if not reqs:
continue
if ireq.user_supplied and req.name not in collected.user_requested:
collected.user_requested[req.name] = i
collected.requirements.append(req)

template = reqs[0]
if ireq.user_supplied and template.name not in collected.user_requested:
collected.user_requested[template.name] = i
collected.requirements.extend(reqs)
# Put requirements with extras at the end of the root requires. This does not
# affect resolvelib's picking preference but it does affect its initial criteria
# population: by putting extras at the end we enable the candidate finder to
# present resolvelib with a smaller set of candidates, already
# taking into account any non-transient constraints on the associated base. This
# means resolvelib will have fewer candidates to visit and reject.
# Python's list sort is stable, meaning relative order is kept for objects with
# the same key.
collected.requirements.sort(key=lambda r: r.name != r.project_name)
return collected

def make_requirement_from_candidate(
self, candidate: Candidate
) -> ExplicitRequirement:
return ExplicitRequirement(candidate)

def make_requirement_from_spec(
def make_requirements_from_spec(
self,
specifier: str,
comes_from: Optional[InstallRequirement],
requested_extras: Iterable[str] = (),
) -> Optional[Requirement]:
) -> list[Requirement]:
"""
Returns requirement objects associated with the given specifier. In most cases
this will be a single object but the following special cases exist:
- the specifier has markers that do not apply -> result is empty
- the specifier has both a constraint and extras -> result is split
in two requirement objects: one with the constraint and one with the
extra. This allows centralized constraint handling for the base,
resulting in fewer candidate rejections.
"""
ireq = self._make_install_req_from_spec(specifier, comes_from)
return self._make_requirement_from_install_req(ireq, requested_extras)
return self._make_requirements_from_install_req(ireq, requested_extras)

def make_requires_python_requirement(
self,
24 changes: 19 additions & 5 deletions src/pip/_internal/resolution/resolvelib/requirements.py
@@ -2,6 +2,7 @@
from pip._vendor.packaging.utils import NormalizedName, canonicalize_name

from pip._internal.req.req_install import InstallRequirement
from pip._internal.req.constructors import install_req_drop_extras

from .base import Candidate, CandidateLookup, Requirement, format_name

@@ -40,13 +41,23 @@ def is_satisfied_by(self, candidate: Candidate) -> bool:


class SpecifierRequirement(Requirement):
def __init__(self, ireq: InstallRequirement) -> None:
def __init__(
self,
ireq: InstallRequirement,
*,
drop_extras: bool = False,
uranusjr marked this conversation as resolved.
) -> None:
"""
:param drop_extras: Ignore any extras that are part of the install requirement,
making this a requirement on the base only.
"""
assert ireq.link is None, "This is a link, not a specifier"
self._ireq = ireq
self._extras = frozenset(ireq.extras)
self._drop_extras: bool = drop_extras
self._ireq = ireq if not drop_extras else install_req_drop_extras(ireq)
uranusjr marked this conversation as resolved.
self._extras = frozenset(self._ireq.extras)

def __str__(self) -> str:
return str(self._ireq.req)
return str(self._ireq)

def __repr__(self) -> str:
return "{class_name}({requirement!r})".format(
@@ -61,7 +72,10 @@ def project_name(self) -> NormalizedName:

@property
def name(self) -> str:
return format_name(self.project_name, self._extras)
return format_name(
self.project_name,
self._extras,
)

def format_for_error(self) -> str:
# Convert comma-separated specifiers into "A, B, ..., F and G"
15 changes: 14 additions & 1 deletion src/pip/_internal/resolution/resolvelib/resolver.py
@@ -1,3 +1,4 @@
import contextlib
import functools
import logging
import os
@@ -11,6 +12,7 @@
from pip._internal.cache import WheelCache
from pip._internal.index.package_finder import PackageFinder
from pip._internal.operations.prepare import RequirementPreparer
from pip._internal.req.constructors import install_req_extend_extras
from pip._internal.req.req_install import InstallRequirement
from pip._internal.req.req_set import RequirementSet
from pip._internal.resolution.base import BaseResolver, InstallRequirementProvider
@@ -19,6 +21,7 @@
PipDebuggingReporter,
PipReporter,
)
from pip._internal.utils.packaging import get_requirement

from .base import Candidate, Requirement
from .factory import Factory
@@ -101,9 +104,19 @@ def resolve(
raise error from e

req_set = RequirementSet(check_supported_wheels=check_supported_wheels)
for candidate in result.mapping.values():
# sort to ensure base candidates come before candidates with extras
for candidate in sorted(result.mapping.values(), key=lambda c: c.name):
Member

Hmm, I’m not sure if sorting the entire list would break some things (we are generally quite deliberate about keeping the input order in most cases). But I guess if it’s otherwise too cumbersome to sort a base candidate before extras we can just do this for now until (if) anyone complains.

Contributor Author

You make a good point. I don't think the order is very important at this late stage in the process, but best not to mess with it too much just in case. I changed this to a more conservative sort key in 21bfe40. I think this could be a nice middle ground: it only moves the extras to the rear, which should hardly affect anything. What do you think?
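
The conservative key itself is not shown in this excerpt, but the same pattern appears in factory.py as `r.name != r.project_name`. A minimal sketch of the idea, assuming plain name strings where an extras entry looks like `pkg[extra]` (the `has_extras` helper is hypothetical):

```python
from typing import List


def has_extras(name: str) -> bool:
    # Assumption for this sketch: extras entries are spelled "pkg[extra]".
    return "[" in name


def extras_last(names: List[str]) -> List[str]:
    # sorted() is stable and False sorts before True, so entries without
    # extras keep their relative order and entries with extras move to the
    # rear, also in their original relative order.
    return sorted(names, key=has_extras)


print(extras_last(["b[x]", "c", "a", "b", "a[y]"]))
# → ['c', 'a', 'b', 'b[x]', 'a[y]']
```

Compared with sorting on the full name, this leaves the input order intact within each of the two groups, which is why it reorders far less.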

Member

Sounds like a good middle ground if it does not affect performance too much. (Another approach I was considering is to loop through the list once to process the non-extras ones, while collecting extras to be processed in a second loop.)

Contributor Author

I was going to argue with respect to readability but now that I look it over once more I actually prefer the two-pass approach you suggest. Since we already branch on the presence of the extra, there's really no reason to keep it in the same loop. So I'll make the change.

Out of curiosity, I did run a quick test on the performance difference. I would have expected it to be negligible but this snippet seems to indicate that there is a slight performance hit for the sorting approach, though it's difficult to make it completely representative. It's still in the same order of magnitude though, which is all that matters for many practical cases.

import random
import timeit


nb_elements: int = 10_000
nb_iterations: int = 10_000

l = random.choices([0, 1], weights=[9, 1], k=nb_elements)


def with_sort():
    for i in sorted(l, key=lambda i: i):
        pass


def pass_twice():
    for i in l:
        if i == 0:
            pass
    for i in l:
        if i == 1:
            pass


def pass_twice_cache():
    second_pass: list[int] = []
    for i in l:
        if i == 0:
            pass
        else:
            second_pass.append(i)
    for i in second_pass:
        pass


print("with_sort", timeit.timeit(with_sort, number=nb_iterations))
print("pass_twice", timeit.timeit(pass_twice, number=nb_iterations))
print("pass_twice_cache", timeit.timeit(pass_twice_cache, number=nb_iterations))

result:

with_sort 7.437978116000522
pass_twice 4.571586305999517
pass_twice_cache 2.8610327940004936  # difference becomes less pronounced as the weights near 50-50

Contributor Author

> Since we already branch on the presence of the extra

This is not right, sorry about that. We branch on the presence of the extra iff there is no ireq. Extras candidates with an ireq require the same logic as base candidates with an ireq, so unless you think the performance difference is sufficiently significant to warrant the change I'm inclined to leave it as is after all.

This also means that the "middle ground" still reorders a bit more than I initially thought (because not all extras candidates have a corresponding base, meaning their processing may shift to the rear). Let me know if you think this is unacceptable, then I'll make an effort to implement smarter (but more complex) sorting.

Member

I don’t think performance is the issue here; the thing I’m worried about is mostly that doing too much reordering may affect semantics, as mentioned above. I’m willing to opt for the easiest logic possible and see if it really affects anyone in practice.

ireq = candidate.get_install_requirement()
if ireq is None:
if candidate.name != candidate.project_name:
# extend existing req's extras
with contextlib.suppress(KeyError):
req = req_set.get_requirement(candidate.project_name)
req_set.add_named_requirement(
install_req_extend_extras(
req, get_requirement(candidate.name).extras
)
)
continue

# Check if there is already an installation under the same name,