Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fork when wheels are missing platform support #9928

Closed
wants to merge 3 commits into from

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Dec 16, 2024

Summary

See #9711 for an extensive explanation of the problem at play here.

This PR attempts to improve the resolver for cases in which a source distribution isn't available. In such cases, we assume that the available wheels cover all necessary platforms... So, taking PyTorch as an example, we always pick 2.5.1+cpu, even though it has no macOS wheels.

The approach taken here is: we require at least one wheel for each "platform" (macOS, Windows, and Linux). If a distribution is missing any of those platforms, then we fork and backtrack to a version that does have the wheels we need (or a source distribution).

The impact of this change is a little nuanced, so I'll talk through each example issue:

  • This approach does fix PyTorch, which is the most common failure case here (i.e., you can remove all the sys_platform != 'darwin' gating that we have in the docs and that we recommend to users).

  • It also fixes this pyqt5 case, which was reported twice (uv lock/add selects package versions not available for current platform #8603).

  • It helps a little with this markupsafe case... But doesn't completely solve it. If you're on a Linux platform that doesn't match the wheel in the PyTorch index, you'll still only get that wheel. (This case is maybe best solved with an explicit index.)

  • It does not fix this odrive issue. In that case, the user wants a wheel that supports an old version of macOS. I don't know how to solve this one without imposing much stricter requirements on the resolver (i.e., we'd have to require that the resolver always find a wheel to support all versions of macOS, which would fail in many cases, e.g., PyTorch).

Ref: #9711.

Closes: #7005.

Closes: #9646.

@charliermarsh charliermarsh added the bug Something isn't working label Dec 16, 2024
@@ -1152,6 +1158,56 @@ impl<InstalledPackages: InstalledPackagesProvider> ResolverState<InstalledPackag
)));
}

// Check whether this version covers all supported platforms.
if !dist.implied_markers().is_true() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only go down this path for distributions that lack a source distribution (which is comparatively quite rare).

_ => {}
}
}
marker
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We basically have control over how granular we want to make this (and the forking logic). For example... we could easily include architecture here too, so that we ensure we always get an x86 and an ARM wheel for macOS. However, we'd then need to require (in the resolver) that we always find an x86 wheel somewhere.

So the tradeoff here is: we can get more complete resolutions (making it less likely that we fail at install time due to a missing wheel), but we make the resolver stricter (making it more likely that resolution fails entirely).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to think of this: we can make sure that there's always an x86 wheel available for macOS, but that requires that we can always find an x86 wheel for macOS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal if we could make those very specific, that is include the arch, but give the user control over what they need. For example, if the user only want macos (any), than both the x86_64 and the aarch64 wheel should satisfy that since they are more specific than macos.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think I understand. Is this different then what’s in the follow-up PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is covered by #10017

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, makes sense.

key: MarkerValueString::SysPlatform,
operator: MarkerOperator::Equal,
value: "darwin".to_string(),
}),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we don't enforce these if the current environment is disjoint. So, like, if a user needs to resolve a requirement that only has Windows wheels, they can set environments = ["sys_platform == 'win32'"].

@zanieb
Copy link
Member

zanieb commented Dec 16, 2024

Casually posting a solution to this very hard problem on a Sunday! Can't wait to give it a look.

Ok(Some(ResolverVersion::Unavailable(
candidate.version().clone(),
UnavailableVersion::IncompatibleDist(IncompatibleDist::Wheel(
IncompatibleWheel::NoBinary,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, this is a TODO -- I'll fix it up tomorrow (going to bed soon). I need to do proper error messages for this.

@charliermarsh
Copy link
Member Author

Hah :) I think this pretty much a strict improvement, and it fixes PyTorch which is the most common case. But it won't fix every case, and I think there's an inherent tension between fixing the install-time failures vs. forcing more resolve-time failures :(

@charliermarsh
Copy link
Member Author

Ok, I think what I have here could potentially work...

The basic idea is that while solving, we track the superset of markers for which a dependency was included. And then we avoid applying this logic if a dependency has only been requested only on disjoint platforms.

This is basically a heuristic to continue delaying aggressive forking. So if a dependency is requested with python-magic-bin>=0.4.14,<0.5.0 ; sys_platform == 'win32', we note python-magic-bin: sys_platform == 'win32' globally (in ForkState); then when we look at the new forking logic in this PR, we skip platforms that are disjoint with the platforms on which the dependency has been requested.

These "known markers" will be wrong when we backtrack, but they'll always be a superset.

@charliermarsh charliermarsh force-pushed the charlie/incomplete branch 2 times, most recently from 2d52e11 to 5299027 Compare December 16, 2024 23:17
Copy link

codspeed-hq bot commented Dec 16, 2024

CodSpeed Performance Report

Merging #9928 will not alter performance

Comparing charlie/incomplete (056b61c) with main (e65a273)

Summary

✅ 14 untouched benchmarks

@charliermarsh charliermarsh force-pushed the charlie/incomplete branch 2 times, most recently from 9b7a177 to 1f2f366 Compare December 16, 2024 23:37
{ index = "pytorch-cu124", extra = "cu124" },
{ index = "pytorch-cpu", extra = "cpu" },
{ index = "pytorch-cpu", extra = "cu124", marker = "platform_system == 'Darwin'" },
{ index = "pytorch-cu124", extra = "cu124", marker = "platform_system != 'Darwin'" },
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the previous version doesn't work anymore... So, is this breaking?

@charliermarsh
Copy link
Member Author

If anyone else is planning to review, this is now ready.

@samypr100 samypr100 added the resolver Related to the package resolver label Dec 17, 2024
@konstin
Copy link
Member

konstin commented Dec 17, 2024

For transformers ecosystem check, we get:

DEBUG Forking platform for tensorflow==2.15.0.post1 (split `python_full_version >= '3.13' and sys_platform == 'win32'`, split `python_full_version >= '3.13' and sys_platform != 'win32'`)

and several other forks for tensorflow, but only tensorflow. As diff we get:

 wheels = [
     { url = "https://files.pythonhosted.org/packages/9c/d3/904d5bf64305218ce19f81ff3b2cb872cf434a558443b4a9a5357924637a/tensorflow-2.15.1-cp310-cp310-macosx_10_15_x86_64.whl", hash = "sha256:91b51a507007d63a70b65be307d701088d15042a6399c0e2312b53072226e909", size = 236439313 },
     { url = "https://files.pythonhosted.org/packages/54/38/2be65dc6f47e6aa0fb0494877676774f8faa685c08a5cecf0c0040afccbc/tensorflow-2.15.1-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:10132acc072d59696c71ce7221d2d8e0e3ff1e6bc8688dbac6d7aed8e675b710", size = 205693732 },
     { url = "https://files.pythonhosted.org/packages/51/1b/1f6eb37c97d9998010751511308058800fc3736092aac64c3fee23cf0b35/tensorflow-2.15.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:30c5ef9c758ec9ff7ce2aff76b71c980bc5119b879071c2cc623b1591a497a1a", size = 2121 },
     { url = "https://files.pythonhosted.org/packages/4f/42/433c0c64c5d3b8bee696cde2006d15f03f0504c2f746d49f38e32e52e239/tensorflow-2.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ea290e435464cf0794f657b48786e5fa413362abe55ed771c172c25980d070ce", size = 475215357 },
-      url = "https://files.pythonhosted.org/packages/1c/b7/604ed5e5507e3dd34b14295d5e4a762d47cc2e8cf29a23b4c20575461445/tensorflow-2.15.1-cp310-cp310-win_amd64.whl", hash = "sha256:8e5431d45ceb416c2b1b6de87378054fbac7d2ed35d45b102d89a786613fffdc", size = 2098 },
     { url = "https://files.pythonhosted.org/packages/25/72/2ede9c4b9b96650a8a7b909abf4733adf110c5907425ee252f8095385b11/tensorflow-2.15.1-cp311-cp311-macosx_10_15_x86_64.whl", hash = "sha256:6761efe511e6ee0f893f60738fefbcc51d6dc386eeaaafea59d21899ef369ffd", size = 236482723 },
     { url = "https://files.pythonhosted.org/packages/f1/31/3191cd83da2f213a3c4af5e40597a98996e9c84b56666f9595dad8a6e780/tensorflow-2.15.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:aa926114d1e13ffe5b2ea59c3f195216f26646d7fe36e9e5207b291e4b7902ff", size = 205736374 },
     { url = "https://files.pythonhosted.org/packages/81/40/31b27ab3f46de305475ef5160acc8a2addb8fa9ea2179777c4e9bcc5baaf/tensorflow-2.15.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e73d43dbc68d8c711e70edecc4ac70472799a25ec4ec18a84d479ee18033d3c5", size = 2121 },
     { url = "https://files.pythonhosted.org/packages/c1/2d/636471492d93b6c1bf5375b81be7105a313a8c91a07c37e43625b00ca5c3/tensorflow-2.15.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bb0edd69103c154245c5f209f0507355cc68ba7e4de350084bc31edc562478e4", size = 475258663 },
-      url = "https://files.pythonhosted.org/packages/cb/c5/f5b31ee348459d6f6c54d762488aa0a8e42ff11a7395f4d158915ad43000/tensorflow-2.15.1-cp311-cp311-win_amd64.whl", hash = "sha256:a49f8755c74a89553294a99ab25aa87ab1cddbfa40fe58387e09f64f0578cedc", size = 2096 },
     { url = "https://files.pythonhosted.org/packages/62/e7/b8db69612f401f52b510163d5a04b783c3f3440e8f86e299480095e5e76f/tensorflow-2.15.1-cp39-cp39-macosx_10_15_x86_64.whl", hash = "sha256:f8e85821317c9c0fbf1256e9f721cfb1400ba1e09becb844b3ddd91f744805fc", size = 236437643 },
     { url = "https://files.pythonhosted.org/packages/91/ee/1c4db1bb82d1158fdbfe024bc2331b06272cb8f4abaf7d4e2127b21fc428/tensorflow-2.15.1-cp39-cp39-macosx_12_0_arm64.whl", hash = "sha256:b75815b6a601edad52b4181e9805c8fcd04813a6ab1d5cd8127188dfd2788e20", size = 205692241 },
     { url = "https://files.pythonhosted.org/packages/16/85/6b758898a4342b50add537e2dd50f36fe3b16451bd3a41c2dec0a1ac7548/tensorflow-2.15.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:432788ac5d1234b9e9b7c7f73603a5655271a28c293329c52c7c0b9434a1184e", size = 2120 },
     { url = "https://files.pythonhosted.org/packages/38/03/d509cd280c07ba34f1690b5f094cd715bec3c69a08c15a8a61cf82833cf6/tensorflow-2.15.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:89b5aa1022dec47e567512eaf4e1271b8e6c1ff1984e30d0d9127bd1093ed4c5", size = 475214135 },
-      url = "https://files.pythonhosted.org/packages/a5/ef/a9fe22fabd5e11bda4e322daec40d8798a504fd2ee5725a56ba786503452/tensorflow-2.15.1-cp39-cp39-win_amd64.whl", hash = "sha256:aaf3cfa290597ebbdf19d1a78729e3f555e459506cd58f8d7399359ac5e02a05", size = 2095 },
 ]

As replacement, we're adding this older tensorflow version:

[[package]]
name = "tensorflow"
version = "2.10.1"
source = { registry = "https://pypi.org/simple" }
resolution-markers = [
    "python_full_version < '3.10' and sys_platform == 'win32'",
    "python_full_version == '3.10.*' and sys_platform == 'win32'",
    "python_full_version == '3.11.*' and sys_platform == 'win32'",
    "python_full_version == '3.12.*' and sys_platform == 'win32'",
    "python_full_version >= '3.13' and sys_platform == 'win32'",
]
dependencies = [
    # ...
]
wheels = [
    { url = "https://files.pythonhosted.org/packages/ad/87/f484e0b86687c97d2dfb081e03e948b796561fc8608b409a9366e3b4a663/tensorflow-2.10.1-cp310-cp310-win_amd64.whl", hash = "sha256:a6049664f9a0d14b0a4a7e6f058be87b2d8c27be826d7dd9a870ff03683fbc0b", size = 455948187 },
    { url = "https://files.pythonhosted.org/packages/fe/7d/9114d4d155b4414578dbb30e4b61a33dee4437d1c303b73445d79891ca54/tensorflow-2.10.1-cp39-cp39-win_amd64.whl", hash = "sha256:153111af1d773033264f8591f5deffece180a1f16935b579f43edd83acb17584", size = 455882176 },
]

This version does not have more files, this looks like a regression to me.

@charliermarsh
Copy link
Member Author

Thanks @konstin, will review.

@charliermarsh
Copy link
Member Author

I understand why we're forking on tensorflow==2.15.0.post1 and it roughly looks correct to do so (it only has Linux wheels), but I don't understand why it's then leading us to choose older Tensorflow versions.

@charliermarsh
Copy link
Member Author

On inspection, the transformers resolution is generally much worse.

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am definitely a little uneasy about this change. Just from more of a "theory" perspective, it does feel a like adding explicit knowledge of known platforms into the resolver logic is a bit worrisome. On the other hand, the way some packages are oriented does seem to demand this if we want a universal resolution, so I'm not sure I have much better ideas. (Other than the one you mentioned, which was always resolving for Windows/macOS/Linux.)

envs.push(env.narrow_environment(complement));

envs
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests for this would be great. (Although they sadly might be annoying to write given the ResolverEnvironment input.)

impl KnownMarkers {
/// Inserts the given [`MarkerTree`] for the given package name.
pub(crate) fn insert(&mut self, package_name: PackageName, marker_tree: MarkerTree) {
Arc::make_mut(&mut self.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bit of a footgun? If you have two different KnownMarkers values and call insert on one of them, it will only mutate that one and not the other. It feels a little odd because they start by referencing the same object, and by virtue of mutation, they end up referring to different objects.

I haven't read on yet to see how this is used, but I'm guessing this is intended behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm it almost seems like you might not be benefiting much from the Arc here?

Feel free to ignore me if you've already considered all of this. :)

@charliermarsh
Copy link
Member Author

We definitely can't merge this as-is. Transformers goes from Tried 12643 versions to Tried 80879 versions.

@charliermarsh
Copy link
Member Author

It turns out that the changes in Transformers are actually correct, and our existing resolution was worse. Specifically, tensorflow-text dropped Windows support in 2.10.0: https://github.com/tensorflow/text#a-note-about-different-operating-system-packages. Our previous version used 2.15.0 everywhere, which would fail to install on Windows due to a missing wheel...

@charliermarsh
Copy link
Member Author

charliermarsh commented Dec 17, 2024

We definitely can't merge this as-is. Transformers goes from Tried 12643 versions to Tried 80879 versions.

Ok, this is wrong. Refining that estimate, the total visited versions goes from 89,171 to 166,359 (or 129,212 if you omit Windows).

@charliermarsh
Copy link
Member Author

Okay, this solves a lot of user issues, so... I do think it's worth doing. At the same time, I don't love that it's going to lead to larger markers and more "expensive" resolutions in some cases.

I've also considered making this a setting (enabled by default) to allow for opt-out. It's a bit strange, though, since opt-out already exists in a sense: you can mark a platform as not necessary by excluding it from environments.

@charliermarsh
Copy link
Member Author

I was able to get rid of KnownMarkers entirely which is nice.

charliermarsh added a commit that referenced this pull request Dec 20, 2024
…ort (#10046)

## Summary

This is yet another variation on
#9928, with a few minor changes:

1. It only applies to local versions (e.g., `2.5.1+cpu`).
2. It only _considers_ the non-local version as an alternative (e.g.,
`2.5.1`).
3. It only _considers_ the non-local alternative if it _does_ support
the unsupported platform.
4. Instead of failing, it falls back to using the local version.

So, this is far less strict, and is effectively designed to solve
PyTorch but nothing else. It's also not user-configurable, except by way
of using `environments` to exclude platforms.
@charliermarsh
Copy link
Member Author

We merged #10046 instead for now. We may return to this later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working resolver Related to the package resolver
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't install torch_scatter wheel from index on OSX uv add pyqt5 error
5 participants