Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method to simplify a SpecifierSet #776

Open
pfmoore opened this issue Jan 18, 2024 · 14 comments
Open

Add a method to simplify a SpecifierSet #776

pfmoore opened this issue Jan 18, 2024 · 14 comments

Comments

@pfmoore
Copy link
Member

pfmoore commented Jan 18, 2024

It would be useful when reasoning about SpecifierSet values to have a means of simplifying them. For example, SpecifierSet("<1.0, <2.0") is equivalent to SpecifierSet("<1.0"). For example, knowing that a complex specifier set is guaranteed to be empty can be useful when resolving requirements.

I have a prototype implementation which currently only merges multiple "<", "<=", ">" and ">=" specifiers, but which could be extended to simplify uses of other specifiers. I'd be willing to develop this into a full implementation and contribute it if there was interest.

One technical issue which would need discussion is around pre-release specifiers. The specifier set >1.0a1, >2.0 is equivalent to >2.0. However, the former "explicitly mentions" a prerelease version, whereas the latter does not. I'm not 100% sure what the packaging library's policy is on whether this matters when checking if a version satisfies a specifier. There are a number of options for this case - simplify anyway, refuse to simplify, or simplify and return a flag saying that an explicit pre-release version was eliminated in the simplification.

@notatallshaw
Copy link
Member

notatallshaw commented Jan 18, 2024

FYI, I also have a use case for this.

I have an idea to inspect specifiers to help optimize resolving requirements. Simplifying specifiers first would reduce complexity, and having unambiguous answers to the questions raised here would save having to make guesses on how they should behave.

@pfmoore
Copy link
Member Author

pfmoore commented Jan 18, 2024

Yes, this was prompted by #760, because a specifier like >1.0, <1.0 is empty, but the check suggested in that issue wouldn't identify that. Simplifying and then checking would.

At the moment, I'm assuming a separate method, simplified = spec.simplify(), but maybe there's a case for maintaining specifier sets in simplified form at all times? The only methods that create specifier sets are the constructor and __and__, and it would be easy enough to simplify as a final step in each of those.

@pfmoore
Copy link
Member Author

pfmoore commented Jan 18, 2024

I'm not 100% sure what the packaging library's policy is on whether this matters when checking if a version satisfies a specifier.

OK, so I tested:

>>> SpecifierSet(">=1.0.0a1,>1.1.0").contains("1.3.0a1")
True
>>> SpecifierSet(">1.1.0").contains("1.3.0a1")
False
>>> SpecifierSet(">1.1.0",prereleases=True).contains("1.3.0a1")
True

To be correct, it looks like SpecifierSet(">=1.0.0a1,>1.1.0") would have to simplify to SpecifierSet(">1.1.0", prereleases=True). But I'm a little concerned that the default handling of prereleases (when it's not specified, or None, in the constructor arguments) doesn't behave as I expected:

>>> SpecifierSet(">1.1.0a1").prereleases
False

I'd have been perfectly happy with either True or None here - but False just seems wrong to me. Can one of the maintainers explain the logic here?

@pfmoore
Copy link
Member Author

pfmoore commented Jan 19, 2024

Can one of the maintainers explain the logic here?

It appears that the logic is here - only inclusive operators count as "explicitly requesting" a pre-release, so >1.1.0a1 does not match pre-releases. I can't say that I particularly agree with this interpretation, but it's deliberate and explicit, so I guess I'll need to follow it when simplifying.

@pfmoore
Copy link
Member Author

pfmoore commented Jan 19, 2024

Yes, this was prompted by #760, because a specifier like >1.0, <1.0 is empty, but the check suggested in that issue wouldn't identify that. Simplifying and then checking would.

Wait. It appears that I misunderstood what "empty" meant in that issue. SpecifierSet("") is empty in the sense that its length is 0. But this means that it contains every version (it has no specifiers that limit the allowed versions). What I was thinking was that it was empty because it contained no versions.

I can't actually think of a way to write a specifier that is empty in the sense that it contains no versions, that is simpler than <v,>v (for any version v). Is that correct?

@notatallshaw
Copy link
Member

notatallshaw commented Jan 19, 2024

I can't actually think of a way to write a specifier that is empty in the sense that it contains no versions, that is simpler than <v,>v (for any version v). Is that correct?

Correct, I had a think about that in this issue: #762.

You probably got this misunderstanding from me, as I had that misunderstanding when I wrote #760. A “simplify” would still be very useful, i.e. ">2,>1" gets simplified to ">2".

But what is also needed, at least for my use case, is the ability to ask "for this Specifier Set is it completely contradictory? I.e. there are no possible versions that could match it", this would have to be a different method than "simplify" as the result is not expressible itself as a Specifier Set. Also this method may need to be careful with its return values, it probably needs to be explicit in having a "can not determine" value.

Apologies if this is outside the scope of what you were thinking.

@dstufft
Copy link
Member

dstufft commented Jan 19, 2024

I'd have been perfectly happy with either True or None here - but False just seems wrong to me. Can one of the maintainers explain the logic here?

SpecifierSet.prereleases encapsulates the logic to handle both explicitly specified via kwarg and determined via inspecting the specifier.

I'm pretty sure the answer is just that when I wrote the code 10 years ago, I was trying to prevent !=1.0a1 from triggering the "should include the prerelease`, and so I included a check not just on what versions were mentioned in the specifier, but whether that specifier logically was "including" that version or whether it was "excluding" that version.

I think this makes general sense, !=1.0a1 is explicitly saying you don't want that version, so it not requesting prereleases, while ==1.0a1 is explicitly saying you do want that version, so it is requesting prereleases.

The > and < operators kind of sit in a weird place though, >1.0a1 is not requesting 1.0a1, it's requesting anything of a higher version than that, so it currently does not trigger the "user has requested prereleases" logic, but that creates a weird asymmetry with >=1.0a1. To make matters worse, It's not possible to get a prerelease version that is less than 1.0a0.dev0, so does <1.0a0.dev0 include or exclude prereleases?

I don't remember thinking to hard about it, and just did the ones that explicitly included the pre-release that was being mentioned, which meant that > and < were excluded from being considered triggering the "user has requested a prerelease" logic... but I dont' feel really strongly about that one way or another.

@pfmoore
Copy link
Member Author

pfmoore commented Jan 19, 2024

No, it's precisely in scope of what I'm thinking about. And the result is expressible as a specifier set (after all, ">0,<0" is precisely such a set) - it's just that there's no immediately obvious "canonical" answer. A simplifier could certainly just pick a value and consider it canonical (and reduce any other provably-unsatisfiable specifier sets to that value).

What's actually far more of an issue is the handling of prerelease versions, which I'm starting to think may actually make this impossible. Consider the following example:

>>> Version("0.dev0") in Specifier("<0")
False
>>> Version("0.dev0") in Specifier("<=0.dev0")
True
>>> Version("0.dev0") in SpecifierSet("<0,<=0.dev0")
False

Pretending for a moment that we don't realise that things are going to get messy because "0.dev0" is a pre-release, the above is perfectly reasonable. However:

>>> Version("0.dev0") < Version("0")
True

So, of the two upper bounds, "<=0.dev0" is the tighter. However:

>>> Version("0.dev0") in SpecifierSet("<=0.dev0")
True

The problem here is that "<0" is actually a tighter bound in the sense that it excludes pre-release versions even if they are <0. And setting the prereleases flag on the specifier set doesn't address this (or at least, I can't prove to my satisfaction that it does) because the set would then behave differently (you can't combine it using & with a set that says prereleases=True for example).

@pfmoore
Copy link
Member Author

pfmoore commented Jan 19, 2024

I don't remember thinking to hard about it, and just did the ones that explicitly included the pre-release that was being mentioned, which meant that > and < were excluded from being considered triggering the "user has requested a prerelease" logic... but I dont' feel really strongly about that one way or another.

Yeah, I'm pretty sure the whole "prereleases" logic (here, in the spec, and in pip) was all based on an instinctive desire to have things "do what I mean", but failed to create something that was mathematically or logically consistent. Regrettably, though, I think we're probably stuck with what we have, short of a "Version specifiers 2.0" like you mentioned on Discourse...

@dstufft
Copy link
Member

dstufft commented Jan 19, 2024

Yea, in hindsight the logic probably should have been either simplified or pushed to the edges (e.g. in pip/poetry/etc), and left the core library and spec alone to implement the logically consistent thing.

@pfmoore
Copy link
Member Author

pfmoore commented Jan 19, 2024

I guess a fair question for the packaging maintainers is whether they (you) would support a breaking change that did just that - made prereleases just like any other version, and left it to tools to implement the "do what I mean" semantics.

To be clear, I'm not even sure that's possible, but unless the maintainers have the stomach for that level of breakage, it's not worth putting effort into trying to find a workable approach. (Personally, I expect the answer to be "no, this is too much for too little benefit" - I know I wouldn't be happy trying to deal with the knock-on effects of such a change in pip).

@dstufft
Copy link
Member

dstufft commented Jan 19, 2024

Brett and Pradyun's opinions should weigh much higher than mine, since they've been the primary maintainers for awhile now.

Personally I think it's something we should do, but I have a bit of a preference for coupling it to the hypothetical PEP 440v2, but if someone felt strongly about getting that particular change out and figured out a good way to manage the breakage, I wouldn't (personally) be super upset about that.

But again, I'd consider my opinion to be less of a maintainer of packaging at this point and more just a guy whose been around for awhile :)

@notatallshaw
Copy link
Member

notatallshaw commented Jan 19, 2024

The problem here is that "<0" is actually a tighter bound in the sense that it excludes pre-release versions even if they are <0. And setting the prereleases flag on the specifier set doesn't address this (or at least, I can't prove to my satisfaction that it does) because the set would then behave differently (you can't combine it using & with a set that says prereleases=True for example).

Yes, I arrived at similar concerns through different means. For example, I was worried that for something like ">0,<0," there might actually be versions that would pass this Specifier Set if one of the constructing Specifiers was prereleases=True (it seems from this discussion that's not the case in this example).

I still think it's possible to determine in many cases if a Specifier Set is contradictory, e.g., ">2,<1", in this case, it doesn't matter if one of them is a pre-release. But, yeah, the boundary conditions are particularly difficult, exponentially so because of pre-releases.

I had an idea of writing a testing mechanism that generates all possible versions around a boundary version (e.g., 1.0), up to some topological equivalence. Then, it creates all possible Specifiers around that boundary version and checks what versions are contained in that specifier. This can be used to test various things, like if Specifier Sets are logically consistent with their underlying Specifiers, and to test a function that determines if a Specifier Set is completely exclusionary or not. I have this mechanism mostly working, but there were some nuances I hadn't ironed out which this thread, I think, should immensely help with when I have a moment to work on it again.

@notatallshaw
Copy link
Member

FYI I've made a PR (#794) to follow the spec, which will also fix this imbalance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants