Skip to content

Commit

Permalink
Implement marker trees using algebraic decision diagrams (#5898)
Browse files Browse the repository at this point in the history
This PR rewrites the `MarkerTree` type to use algebraic decision
diagrams (ADD). This has many benefits:
- The diagram is canonical for a given marker function. It is impossible
to create two functionally equivalent marker trees that don't refer to
the same underlying ADD. This also means that any trivially true or
unsatisfiable markers are represented by the same constants.
- The diagram can handle complex operations (conjunction/disjunction) in
polynomial time, as well as constant-time negation.
- The diagram can be converted to a simplified DNF form for user-facing
output.

The new representation gives us a lot more confidence in our marker
operations and simplification, which is proving to be very important
(see #5733 and
#5163).

Unfortunately, it is not easy to split this PR into multiple commits
because it is a large rewrite of the `marker` module. I'd suggest
reading through the `marker/algebra.rs`, `marker/simplify.rs`, and
`marker/tree.rs` files for the new implementation, as well as the
updated snapshots to verify how the new simplification rules work in
practice. However, a few other things were changed:
- [We now use release-only comparisons for `python_full_version`, where
we previously only did for
`python_version`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/algebra.rs#L522).
I'm unsure how marker operations should work in the presence of
pre-release versions if we decide that this is incorrect.
- [Meaningless marker expressions are now
ignored](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/parse.rs#L502).
This means that a marker such as `'x' == 'x'` will always evaluate to
`true` (as if the expression did not exist), whereas we previously
treated this as always `false`. It's negation however, remains `false`.
- [Unsatisfiable markers are written as `python_version <
'0'`](https://github.com/astral-sh/uv/blob/ibraheem/canonical-markers/crates/pep508-rs/src/marker/tree.rs#L1329).
- The `PubGrubSpecifier` type has been moved to the new `uv-pubgrub`
crate, shared by `pep508-rs` and `uv-resolver`. `pep508-rs` also depends
on the `pubgrub` crate for the `Range` type, we probably want to move
`pubgrub::Range` into a separate crate to break this, but I don't think
that should block this PR (cc @konstin).

There is still some remaining work here that I decided to leave for now
for the sake of unblocking some of the related work on the resolver.
- We still use `Option<MarkerTree>` throughout uv, which is unnecessary
now that `MarkerTree::TRUE` is canonical.
- The `MarkerTree` type is now interned globally and can potentially
implement `Copy`. However, it's unclear if we want to add more
information to marker trees that would make it `!Copy`. For example, we
may wish to attach extra and requires-python environment information to
avoid simplifying after construction.
- We don't currently combine `python_full_version` and `python_version`
markers.
- I also have not spent too much time investigating performance and
there is probably some low-hanging fruit. Many of the test cases I did
run actually saw large performance improvements due to the markers being
simplified internally, reducing the stress on the old `normalize`
routine, especially for the extremely large markers seen in
`transformers` and other projects.

Resolves #5660,
#5179.
  • Loading branch information
ibraheemdev authored and zanieb committed Aug 9, 2024
1 parent 73b2d1c commit 07c6e07
Show file tree
Hide file tree
Showing 39 changed files with 3,235 additions and 2,581 deletions.
24 changes: 24 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ uv-options-metadata = { path = "crates/uv-options-metadata" }
uv-python = { path = "crates/uv-python" }
uv-requirements = { path = "crates/uv-requirements" }
uv-resolver = { path = "crates/uv-resolver" }
uv-pubgrub = { path = "crates/uv-pubgrub" }
uv-scripts = { path = "crates/uv-scripts" }
uv-settings = { path = "crates/uv-settings" }
uv-shell = { path = "crates/uv-shell" }
Expand All @@ -66,6 +67,7 @@ async_zip = { git = "https://github.com/charliermarsh/rs-async-zip", rev = "011b
axoupdater = { version = "0.7.0", default-features = false }
backoff = { version = "0.4.0" }
base64 = { version = "0.22.0" }
boxcar = { version = "0.2.5" }
cachedir = { version = "0.3.1" }
cargo-util = { version = "0.2.8" }
chrono = { version = "0.4.31" }
Expand Down Expand Up @@ -130,6 +132,7 @@ seahash = { version = "4.1.0" }
serde = { version = "1.0.197", features = ["derive"] }
serde_json = { version = "1.0.114" }
sha2 = { version = "0.10.8" }
smallvec = { version = "1.13.2" }
syn = { version = "2.0.66" }
sys-info = { version = "0.9.1" }
target-lexicon = {version = "0.12.14" }
Expand Down
30 changes: 30 additions & 0 deletions crates/pep440-rs/src/version_specifier.rs
Original file line number Diff line number Diff line change
Expand Up @@ -405,6 +405,13 @@ impl VersionSpecifier {
version,
}
}
/// `!=<version>`
pub fn not_equals_version(version: Version) -> Self {
Self {
operator: Operator::NotEqual,
version,
}
}

/// `>=<version>`
pub fn greater_than_equal_version(version: Version) -> Self {
Expand All @@ -413,6 +420,29 @@ impl VersionSpecifier {
version,
}
}
/// `><version>`
pub fn greater_than_version(version: Version) -> Self {
Self {
operator: Operator::GreaterThan,
version,
}
}

/// `<=<version>`
pub fn less_than_equal_version(version: Version) -> Self {
Self {
operator: Operator::LessThanEqual,
version,
}
}

/// `<<version>`
pub fn less_than_version(version: Version) -> Self {
Self {
operator: Operator::LessThan,
version,
}
}

/// Get the operator, e.g. `>=` in `>= 2.0.0`
pub fn operator(&self) -> &Operator {
Expand Down
7 changes: 7 additions & 0 deletions crates/pep508-rs/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -20,20 +20,27 @@ crate-type = ["cdylib", "rlib"]
workspace = true

[dependencies]
boxcar = { workspace = true }
derivative = { workspace = true }
itertools = { workspace = true }
indexmap = { workspace = true }
pubgrub = { workspace = true }
rustc-hash = { workspace = true }
pep440_rs = { workspace = true }
pyo3 = { workspace = true, optional = true, features = ["abi3", "extension-module"] }
pyo3-log = { workspace = true, optional = true }
regex = { workspace = true }
schemars = { workspace = true, optional = true }
serde = { workspace = true, features = ["derive", "rc"] }
serde_json = { workspace = true, optional = true }
smallvec = { workspace = true }
thiserror = { workspace = true }
tracing = { workspace = true, optional = true }
unicode-width = { workspace = true }
url = { workspace = true, features = ["serde"] }
uv-fs = { workspace = true }
uv-normalize = { workspace = true }
uv-pubgrub = { workspace = true }

[dev-dependencies]
insta = { version = "1.36.1" }
Expand Down
99 changes: 53 additions & 46 deletions crates/pep508-rs/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,9 +39,10 @@ use url::Url;

use cursor::Cursor;
pub use marker::{
ExtraOperator, MarkerEnvironment, MarkerEnvironmentBuilder, MarkerExpression, MarkerOperator,
MarkerTree, MarkerValue, MarkerValueString, MarkerValueVersion, MarkerWarningKind,
StringVersion,
ContainsMarkerTree, ExtraMarkerTree, ExtraOperator, InMarkerTree, MarkerEnvironment,
MarkerEnvironmentBuilder, MarkerExpression, MarkerOperator, MarkerTree, MarkerTreeContents,
MarkerTreeKind, MarkerValue, MarkerValueString, MarkerValueVersion, MarkerWarningKind,
StringMarkerTree, StringVersion, VersionMarkerTree,
};
pub use origin::RequirementOrigin;
#[cfg(feature = "pyo3")]
Expand Down Expand Up @@ -189,7 +190,7 @@ impl<T: Pep508Url + Display> Display for Requirement<T> {
}
}
}
if let Some(marker) = &self.marker {
if let Some(marker) = self.marker.as_ref().and_then(MarkerTree::contents) {
write!(f, " ; {marker}")?;
}
Ok(())
Expand Down Expand Up @@ -255,7 +256,10 @@ impl PyRequirement {
/// `requests [security,tests] >= 2.8.1, == 2.8.* ; python_version > "3.8"`
#[getter]
pub fn marker(&self) -> Option<String> {
self.marker.as_ref().map(ToString::to_string)
self.marker
.as_ref()
.and_then(MarkerTree::contents)
.map(|marker| marker.to_string())
}

/// Parses a PEP 440 string
Expand Down Expand Up @@ -416,18 +420,20 @@ impl<T: Pep508Url> Requirement<T> {
#[must_use]
pub fn with_extra_marker(self, extra: &ExtraName) -> Self {
let marker = match self.marker {
Some(expression) => MarkerTree::And(vec![
expression,
MarkerTree::Expression(MarkerExpression::Extra {
Some(mut marker) => {
let extra = MarkerTree::expression(MarkerExpression::Extra {
operator: ExtraOperator::Equal,
name: extra.clone(),
}),
]),
None => MarkerTree::Expression(MarkerExpression::Extra {
});
marker.and(extra);
marker
}
None => MarkerTree::expression(MarkerExpression::Extra {
operator: ExtraOperator::Equal,
name: extra.clone(),
}),
};

Self {
marker: Some(marker),
..self
Expand Down Expand Up @@ -1043,7 +1049,7 @@ fn parse_pep508_requirement<T: Pep508Url>(
let marker = if cursor.peek_char() == Some(';') {
// Skip past the semicolon
cursor.next();
Some(marker::parse::parse_markers_cursor(cursor, reporter)?)
marker::parse::parse_markers_cursor(cursor, reporter)?
} else {
None
};
Expand Down Expand Up @@ -1123,10 +1129,10 @@ mod tests {
use uv_normalize::{ExtraName, InvalidNameError, PackageName};

use crate::cursor::Cursor;
use crate::marker::{
parse, MarkerExpression, MarkerOperator, MarkerTree, MarkerValueString, MarkerValueVersion,
use crate::marker::{parse, MarkerExpression, MarkerTree, MarkerValueVersion};
use crate::{
MarkerOperator, MarkerValueString, Requirement, TracingReporter, VerbatimUrl, VersionOrUrl,
};
use crate::{Requirement, TracingReporter, VerbatimUrl, VersionOrUrl};

fn parse_pep508_err(input: &str) -> String {
Requirement::<VerbatimUrl>::from_str(input)
Expand Down Expand Up @@ -1216,7 +1222,7 @@ mod tests {
.into_iter()
.collect(),
)),
marker: Some(MarkerTree::Expression(MarkerExpression::Version {
marker: Some(MarkerTree::expression(MarkerExpression::Version {
key: MarkerValueVersion::PythonVersion,
specifier: VersionSpecifier::from_pattern(
pep440_rs::Operator::LessThan,
Expand Down Expand Up @@ -1463,37 +1469,38 @@ mod tests {
&mut Cursor::new(marker),
&mut TracingReporter,
)
.unwrap()
.unwrap();
let expected = MarkerTree::And(vec![
MarkerTree::Expression(MarkerExpression::Version {
key: MarkerValueVersion::PythonVersion,
specifier: VersionSpecifier::from_pattern(
pep440_rs::Operator::Equal,
"2.7".parse().unwrap(),
)
.unwrap(),
}),
MarkerTree::Or(vec![
MarkerTree::Expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "win32".to_string(),
}),
MarkerTree::And(vec![
MarkerTree::Expression(MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "linux".to_string(),
}),
MarkerTree::Expression(MarkerExpression::String {
key: MarkerValueString::ImplementationName,
operator: MarkerOperator::Equal,
value: "cpython".to_string(),
}),
]),
]),
]);
assert_eq!(expected, actual);

let mut a = MarkerTree::expression(MarkerExpression::Version {
key: MarkerValueVersion::PythonVersion,
specifier: VersionSpecifier::from_pattern(
pep440_rs::Operator::Equal,
"2.7".parse().unwrap(),
)
.unwrap(),
});
let mut b = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "win32".to_string(),
});
let mut c = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::OsName,
operator: MarkerOperator::Equal,
value: "linux".to_string(),
});
let d = MarkerTree::expression(MarkerExpression::String {
key: MarkerValueString::ImplementationName,
operator: MarkerOperator::Equal,
value: "cpython".to_string(),
});

c.and(d);
b.or(c);
a.and(b);

assert_eq!(a, actual);
}

#[test]
Expand Down
Loading

0 comments on commit 07c6e07

Please sign in to comment.