From 12059340def4b6b1b686bfae7bc03c298721982a Mon Sep 17 00:00:00 2001
From: Zanie Blue
Date: Tue, 6 Aug 2024 13:18:23 -0500
Subject: [PATCH] Update resolver reference documentation (#5823)

---
 docs/concepts/resolution.md          |   5 +
 docs/reference/resolver-internals.md | 207 +++++++++++++++------------
 2 files changed, 121 insertions(+), 91 deletions(-)

diff --git a/docs/concepts/resolution.md b/docs/concepts/resolution.md
index 95fbcfd89159..aacffb91bc31 100644
--- a/docs/concepts/resolution.md
+++ b/docs/concepts/resolution.md
@@ -271,3 +271,8 @@ packages.
 To ensure reproducibility, messages for unsatisfiable resolutions will not mention that
 distributions were excluded due to the `--exclude-newer` flag — newer distributions will be treated
 as if they do not exist.
+
+## Learn more
+
+For more details about the internals of the resolver, see the
+[resolver reference](../reference/resolver-internals.md) documentation.
diff --git a/docs/reference/resolver-internals.md b/docs/reference/resolver-internals.md
index 76056d49672b..7ad3f02ed997 100644
--- a/docs/reference/resolver-internals.md
+++ b/docs/reference/resolver-internals.md
@@ -1,111 +1,136 @@
-# Resolution internals
-
-This page explains some of the internal workings of uv, its resolver and the lockfile. For using uv,
-see [Resolution](../concepts/resolution.md).
-
-## Dependency resolution with PubGrub
-
-If you look into a textbook, it will tell you that finding a set of version to install from a given
-set of requirements is equivalent to the
-[SAT problem](https://en.wikipedia.org/wiki/Boolean_satisfiability_problem) and thereby NP-complete,
-i.e., in the worst case you have to try all possible combinations of all versions of all packages
-and there are no general fast algorithms. In practice, this is fairly misleading for a number of
-reasons:
-
-- The slowest part of uv is loading package and version metadata, even if it's cached.
-- Certain solution are more preferable than others, for example we generally want to use latest
-  versions.
-- Requirements follow lots of patterns: We use continuous versions ranges and not arbitrary boolean
-  inclusion/exclusions of versions, adjacent release have the same or similar requirements, etc.
-- For the majority of resolutions, we wouldn't even need to backtrack, just picking versions
-  iteratively is sufficient. If we have preferences from a previous resolution we often barely need
-  to anything at all.
-- We don't just need either a solution or a message that there is no solution (like for SAT), we
-  need an understandable error trace that tell you which packages are involved in away to allows you
-  to remove the conflict.
+# Resolver internals
+
+!!! tip
+
+    This document focuses on the internal workings of uv's resolver. For using uv, see the
+    [resolution concept](../concepts/resolution.md) documentation.
+
+## Resolver
+
+As defined in a textbook, resolution, or finding a set of versions to install from a given set of
+requirements, is equivalent to the [SAT
+problem](https://en.wikipedia.org/wiki/Boolean_satisfiability_problem) and thereby NP-complete: in
+the worst case you have to try all possible combinations of all versions of all packages and there
+are no general, fast algorithms. In practice, this is misleading for a number of reasons:
+
+- The slowest part of resolution in uv is loading package and version metadata, even if it's cached.
+- There are many possible solutions, but some are preferable to others. For example, we generally
+  prefer using the latest version of packages.
+- Package dependencies follow patterns, e.g., version ranges are contiguous rather than arbitrary
+  boolean inclusions/exclusions of versions, adjacent releases often have the same or similar
+  requirements, etc.
+- For most resolutions, the resolver doesn't need to backtrack; picking versions iteratively is
+  sufficient. If there are version preferences from a previous resolution, barely any work needs to
+  be done.
+- When resolution fails, more information is needed than a message that there is no solution (as is
+  seen in SAT solvers). Instead, the resolver should produce an understandable error trace that
+  states which packages are involved in a way that allows a user to remove the conflict.
 
 uv uses [pubgrub-rs](https://github.com/pubgrub-rs/pubgrub), the Rust implementation of
 [PubGrub](https://nex3.medium.com/pubgrub-2fb6470504f), an incremental version solver. PubGrub in uv
 works in the following steps:
 
-- We have a partial solution that tells us for which packages we already picked versions and for
-  which we still need to decide.
-- From the undecided packages we pick the one with the highest priority. Package with URLs
-  (including file, git, etc.) have the highest priority, then those with more exact specifiers (such
-  as `==`), then those with less strict specifiers. Inside each category, we order packages by when
-  we first saw them, making the resolution deterministic.
-- For that package with the highest priority, pick a version that works with all specifiers from the
-  packages with versions in the partial solution and that is not yet marked as incompatible. We
-  prefer versions from a lockfile (`uv.lock` or `-o requirements.txt`) and installed versions, then
-  we go from highest to lowest (unless you changed the resolution mode). You can see this happening
-  by the `Selecting ...` messages in `uv lock -v`.
-- Add all requirements of this version to pubgrub. Start prefetching their metadata in the
-  background.
-- Now we either we repeat this process with the next package or we have a conflict. Let's say we
-  pick picked, among other packages, `a` 2 and then `b` 2, and those have requirements `a 2 -> c 1`
-  and `b 2 -> c 2`. When trying to pick a version for `c`, we see there is no version we can pick.
-  Using its internal incompatibilities store, PubGrub traces this back to `a 2` and `b 2` and adds
-  an incompatibility for `{a 2, b 2}`, meaning when either is picked we can't select the other. We
-  restore the state with `a` 2 before picking `b` 2 with the new learned incompatibility and pick a
-  new version for `b`.
-
-Eventually, we either have picked compatible versions for all packages and get a successful
-resolution, or we get an incompatibility for the virtual root package, that is whatever versions of
-the root dependencies and their transitive dependencies we'd pick, we'll always get a conflict. From
-the incompatibilities in PubGrub, we can trace which packages were involved and format an error
-message. For more details on the PubGrub algorithm, see
-[Internals of the PubGrub algorithm](https://pubgrub-rs-guide.pages.dev/internals/intro).
+- Start with a partial solution that declares which package versions have been selected and
+  which are undecided. Initially, all packages may be undecided.
+- The highest priority package is selected from the undecided packages. Packages with URLs (including
+  file, git, etc.) have the highest priority, then those with more exact specifiers (such as `==`),
+  then those with less strict specifiers. Inside each category, packages are ordered by when they
+  were first seen (i.e. order in a file), making the resolution deterministic.
+- A version is picked for the selected package. The version must work with all specifiers from the
+  requirements in the partial solution and must not be previously marked as incompatible. The
+  resolver prefers versions from a lockfile (`uv.lock` or `-o requirements.txt`) and versions that
+  are installed in the current environment. Versions are checked from highest to lowest (unless
+  using an alternative [resolution strategy](../concepts/resolution.md#resolution-strategy)).
+- All requirements of the selected package version are added to the undecided packages. uv
+  prefetches their metadata in the background to improve performance.
+- The process is then either repeated with the next package or, if a conflict is detected, the
+  resolver backtracks. For example, suppose the partial solution contains, among other packages,
+  `a 2` then `b 2`, with the requirements `a 2 -> c 1` and `b 2 -> c 2`. No compatible version of
+  `c` can be found. PubGrub can determine this was caused by `a 2` and `b 2` and add the
+  incompatibility `{a 2, b 2}`, meaning that when either is picked, the other cannot be selected.
+  The partial solution is restored to `a 2` with the tracked incompatibility and the resolver
+  attempts to pick a new version for `b`.
+
+Eventually, the resolver either picks compatible versions for all packages (a successful resolution)
+or there is an incompatibility including the "root" package which defines the versions requested by
+the user. An incompatibility with the root package indicates that whatever versions of the root
+dependencies and their transitive dependencies are picked, there will always be a conflict. From the
+incompatibilities tracked in PubGrub, an error message is constructed to enumerate the involved
+packages.
+
+
+!!! tip
+
+    For more details on the PubGrub algorithm, see [Internals of the PubGrub
+    algorithm](https://pubgrub-rs-guide.pages.dev/internals/intro).
 
 ## Forking
 
-Python historically didn't have backtracking version resolution, and even with version resolution,
-it was usually limited to single environment, which one specific architecture, operating system,
-python version and python implementation. Some packages use contradictory requirements for different
-environments, something like:
+Python resolvers historically didn't support backtracking, and even with backtracking, resolution
+was usually limited to a single environment with one specific architecture, operating system,
+Python version, and Python implementation. Some packages use contradictory requirements for different
+environments, for example:
 
 ```text
 numpy>=2,<3 ; python_version >= "3.11"
 numpy>=1.16,<2 ; python_version < "3.11"
 ```
 
-Since Python only allows one version package, just version resolution would error here. Inspired by
-[poetry](https://github.com/python-poetry/poetry), we instead use forking: Whenever there are
-multiple requirements with different for one package name in the requirements of a package, we split
-the resolution around these requirements. In this case, we take our partial solution and then once
-solve the rest for `python_version >= "3.11"` and once for `python_version < "3.11"`. If some
-markers overlap or are missing a part of the marker space, we add additional forks. There can be
-more than 2 forks per package and we nest forks. You can see this in the log of `uv lock -v` by
-looking for `Splitting resolution on ...`, `Solving split ... (requires-python: ...)` and
-`Split ... resolution took ...`.
-
-One problem is that where and how we split is dependent on the order we see packages, which is in
-turn dependent on the preference you get e.g. from `uv.lock`. So it can happen that we solve your
-requirements with specific forks, write this to the lockfile, and when you call `uv lock` again,
-we'd do a different resolution even if nothing changed because the preferences cause us to use
-different fork points. To avoid this we write the `environment-markers` of each fork and each
-package that diverges between forks to the lockfile. When doing a new resolution, we start with the
-forks from the lockfile and use fork-dependent preference (from the `environment-markers` on each
-package) to keep the resolution stable. When requirements change, we may introduce new forks from
-the saved forks. We also merge forks with identical packages to keep the number of forks low.
+Since Python only allows one version of each package, a naive resolver would error here. Inspired by
+[Poetry](https://github.com/python-poetry/poetry), uv uses a forking resolver: whenever there are
+multiple requirements for a package with different markers, the resolution is split.
+
+In the above example, the partial solution would be split into two resolutions, one for
+`python_version >= "3.11"` and one for `python_version < "3.11"`.
+
+If markers overlap or are missing a part of the marker space, the resolver splits additional times;
+there can be many forks per package. For example, given:
+
+```text
+flask > 1 ; sys_platform == 'darwin'
+flask > 2 ; sys_platform == 'win32'
+flask
+```
+
+A fork would be created for `sys_platform == 'darwin'`, for `sys_platform == 'win32'`, and for
+`sys_platform != 'darwin' and sys_platform != 'win32'`.
+
+Forks can be nested, i.e., each fork is dependent on any previous forks that occurred. Forks with
+identical packages are merged to keep the number of forks low.
+
+!!! tip
+
+    Forking can be observed in the logs of `uv lock -v` by looking for
+    `Splitting resolution on ...`, `Solving split ... (requires-python: ...)` and `Split ... resolution
+    took ...`.
+
+One difficulty in a forking resolver is that where splits occur is dependent on the order packages
+are seen, which is in turn dependent on the preferences, e.g., from `uv.lock`. So it is possible for
+the resolver to solve the requirements with specific forks, write this to the lockfile, and when the
+resolver is invoked again, a different solution is found because the preferences result in different
+fork points. To avoid this, the `environment-markers` of each fork and each package that diverges
+between forks are written to the lockfile. When performing a new resolution, the forks from the
+lockfile are used to ensure the resolution is stable. When requirements change, new forks may be
+added to the saved forks.
 
 ## Requires-python
 
-To ensure that a resolution with `requires-python = ">=3.9"` can actually be installed for all those
-python versions, uv requires that all dependency support at least that python version. We reject
-package versions that declare e.g. `requires-python = ">=3.10"` because we already know that a
-resolution with that version can't be installed on Python 3.9, while the user explicitly requested
-including 3.9. For simplicity and forward compatibility, we do however only consider lower bounds
-for requires-python. If a dependency declares `requires-python = ">=3.8,<4"`, we don't want to
-propagate that `<4` marker.
+To ensure that a resolution with `requires-python = ">=3.9"` can actually be installed for the
+included Python versions, uv requires that all dependencies support that minimum Python version.
+Package versions that declare a higher minimum Python version, e.g., `requires-python = ">=3.10"`,
+are rejected, because a resolution with that version can't be installed on Python 3.9. For
+simplicity and forward compatibility, only lower bounds in `requires-python` are respected. For
+example, if a package declares `requires-python = ">=3.8,<4"`, the `<4` marker is not propagated
+to the entire resolution.
 
 ## Wheel tags
 
-While our resolution is universal with respect to requirement markers, this doesn't extend to wheel
-tags. Wheel tags can encode Python version, Python interpreter, operating system and architecture,
-e.g. `torch-2.4.0-cp312-cp312-manylinux2014_aarch64.whl` is only compatible with CPython 3.12 on
-arm64 Linux with glibc >= 2.17 (the manylinux2014 policy), while `tqdm-4.66.4-py3-none-any.whl`
-works with all Python 3 versions and interpreters on any operating system and architecture. Most
-projects have a (universally compatible) source distribution we can fall back to when we try to
-install a package version and there is no compatible wheel, but some, such as `torch`, don't have a
-source distribution. In this case an installation on e.g. Python 3.13 or an uncommon operating
-system or architecture will fail with a message about a missing matching wheel.
+While uv's resolution is universal with respect to environment markers, this doesn't extend to wheel
+tags. Wheel tags can encode the Python version, Python implementation, operating system, and
+architecture. For example, `torch-2.4.0-cp312-cp312-manylinux2014_aarch64.whl` is only compatible
+with CPython 3.12 on arm64 Linux with `glibc>=2.17` (per the `manylinux2014` policy), while
+`tqdm-4.66.4-py3-none-any.whl` works with all Python 3 versions and interpreters on any operating
+system and architecture. Most projects have a universally compatible source distribution that can be
+used when attempting to install a package that has no compatible wheel, but some packages, such as
+`torch`, don't publish a source distribution. In this case, an installation on, e.g., Python 3.13 or
+an uncommon operating system or architecture will fail and report that there is no matching wheel.
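
Note on the "Wheel tags" section above: the two filenames it cites can be taken apart with a few lines of Python, which may help readers see where the Python, ABI, and platform tags sit. This is only a minimal sketch based on the standard wheel filename layout (`{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`); it is not how uv parses wheels, and `parse_wheel_tags` and `is_platform_independent` are hypothetical helper names introduced here for illustration.

```python
def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into name, version, and its three tag groups."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    # Five components, or six when the optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        # Each tag may be a "."-separated set, e.g. "py2.py3".
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }


def is_platform_independent(tags: dict) -> bool:
    """True when the wheel declares no ABI and no specific platform."""
    return tags["abi"] == ["none"] and tags["platform"] == ["any"]


torch_tags = parse_wheel_tags("torch-2.4.0-cp312-cp312-manylinux2014_aarch64.whl")
tqdm_tags = parse_wheel_tags("tqdm-4.66.4-py3-none-any.whl")
print(torch_tags["python"], torch_tags["platform"], is_platform_independent(torch_tags))
print(tqdm_tags["python"], tqdm_tags["platform"], is_platform_independent(tqdm_tags))
```

In this sketch the `torch` wheel is reported as tied to `cp312` and `manylinux2014_aarch64`, while the `tqdm` wheel is platform independent, matching the prose in the patch. Deciding whether a tag actually matches the running interpreter is a separate step that this example does not attempt.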