Abort crate resolution if too many candidates have been tried #4066

alexcrichton · 2017-05-17T15:07:31Z

We should have a hard limit for the number of steps the resolution algorithm can take before it just hard aborts and exits saying "this probably isn't gonna work".

Right now the crate resolution algorithm will only return an error if it proves that literally every possible crate resolution graph is not workable. This can take quite a long time for some larger graphs! Ideally we'd also abort with nice error messages "these tended to conflict a lot" etc, but at the very least Cargo shouldn't look like it's infinite looping.
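
To make the proposal concrete, here is a minimal sketch of a backtracking search with a step budget. It is not Cargo's resolver: `Outcome`, `search`, `resolve`, and `max_steps` are hypothetical names, and each dependency is reduced to a list of candidate numbers checked by a caller-supplied predicate. The point is only that the search can report three distinct results: a solution, a proof that none exists, or "gave up".

```rust
/// Result of a budgeted search (hypothetical type, not part of Cargo).
#[derive(Debug, PartialEq)]
enum Outcome {
    Resolved(Vec<u32>),
    Unresolvable,
    GaveUp { steps: u64 },
}

/// Depth-first search over one candidate list per dependency slot.
/// Returns Some(true) when solved, Some(false) when this subtree is
/// exhausted, and None when the step budget has been exceeded.
fn search(
    slots: &[Vec<u32>],
    chosen: &mut Vec<u32>,
    ok: &dyn Fn(&[u32]) -> bool,
    steps: &mut u64,
    max_steps: u64,
) -> Option<bool> {
    if chosen.len() == slots.len() {
        return Some(true);
    }
    for &cand in &slots[chosen.len()] {
        // Every candidate tried counts as one step against the budget.
        *steps += 1;
        if *steps > max_steps {
            return None;
        }
        chosen.push(cand);
        if ok(chosen) {
            match search(slots, chosen, ok, steps, max_steps) {
                Some(true) => return Some(true),
                None => return None,
                Some(false) => {}
            }
        }
        chosen.pop();
    }
    Some(false)
}

fn resolve(slots: &[Vec<u32>], ok: &dyn Fn(&[u32]) -> bool, max_steps: u64) -> Outcome {
    let mut chosen = Vec::new();
    let mut steps = 0;
    match search(slots, &mut chosen, ok, &mut steps, max_steps) {
        Some(true) => Outcome::Resolved(chosen),
        Some(false) => Outcome::Unresolvable,
        None => Outcome::GaveUp { steps },
    }
}

fn main() {
    // Six slots with five candidates each; no complete assignment is accepted,
    // so an unbounded search must visit every node before it can say "no".
    let slots: Vec<Vec<u32>> = vec![vec![1, 2, 3, 4, 5]; 6];
    let never_complete = |a: &[u32]| a.len() < 6;
    println!("{:?}", resolve(&slots, &never_complete, 1_000));
    println!("{:?}", resolve(&slots, &never_complete, 1_000_000));
}
```

With the small budget the search gives up early; with the large one it exhausts the tree and can honestly report that nothing works. A real implementation would also want to carry along which candidates conflicted most often, for the error message.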

brson · 2017-05-22T21:04:35Z

I also see this with https://github.com/koute/cargo-web

This is a new development that I have not seen before and it has caused cargobomb to stop working because it hangs when building the lockfile for that crate.

brson · 2017-05-22T21:09:34Z

cc @koute

brson · 2017-05-22T21:10:55Z

I'm not sure on how this behavior would have begun to appear on stable between the last cargobomb and this cargobomb run. Both should have run the generate-lockfiles phase using the same stable compiler.

Hm...

brson · 2017-05-22T21:14:18Z

Oh cargobomb should kill this process after a lengthy timeout. I probably just need to wait 10 minutes or so. So maybe cargobomb is still working and this is not a new behavior for cargo.

brson · 2017-05-22T22:28:21Z

cargobomb doesn't seem to be timing out. Just hanging forever.

alexcrichton · 2017-05-23T14:27:35Z

@brson does this reproduce on cargo-web? Or was this related to something else?

koute · 2017-05-23T16:05:26Z

@alexcrichton If you try to do cargo build in cargo-web it just hangs indefinitely.

It looks like upgrading hyper-rustls from 0.3 to at least 0.5 fixes the issue. (It starts to compile properly again.) Also, if in my Cargo.toml I change:

hyper-rustls = "0.3"

into:

hyper-rustls = "0.3.3"

it also fixes the hanging and instead spews out this error:

error: no matching version `^0.10` found for package `webpki` (required by `rustls`)
location searched: registry https://github.com/rust-lang/crates.io-index
versions found: 0.12.1

I've already pushed a fix to master, but in case you'd want to reproduce this issue again you can use commit 3b17eff93402bfd160ee944932aedb45298a0421.

alexcrichton · 2017-05-25T03:28:51Z

@koute I think that's an example of an un-resolvable resolution graph again as well (what this issue is). All versions of webpki but the most recent are yanked, so if you need to update a dependency on webpki there's nothing to choose from.
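
A rough sketch of why `^0.10` finds nothing here, assuming a simplified caret rule over full `(major, minor, patch)` triples (so `^0.10` is written as `(0, 10, 0)`). The function names and the index contents below are made up for illustration; only the "everything but the newest release is yanked" shape comes from this thread.

```rust
type Version = (u64, u64, u64);

/// Simplified caret rule: the candidate must be at least the requirement and
/// must not change the requirement's leftmost non-zero component.
fn caret_matches(req: Version, v: Version) -> bool {
    if v < req {
        return false;
    }
    match req {
        (0, 0, _) => v == req,
        (0, minor, _) => v.0 == 0 && v.1 == minor,
        (major, _, _) => v.0 == major,
    }
}

/// Pick the newest non-yanked version that satisfies the requirement.
fn pick(req: Version, index: &[(Version, bool)]) -> Option<Version> {
    index
        .iter()
        .filter(|&&(v, yanked)| !yanked && caret_matches(req, v))
        .map(|&(v, _)| v)
        .max()
}

fn main() {
    // Hypothetical index: (version, yanked). Only the newest release is usable.
    let index: [(Version, bool); 3] =
        [((0, 10, 0), true), ((0, 10, 3), true), ((0, 12, 1), false)];
    println!("^0.10.0 -> {:?}", pick((0, 10, 0), &index));
    println!("^0.12.0 -> {:?}", pick((0, 12, 0), &index));
}
```

The first lookup has no candidate at all, which is the situation the resolver should be able to report right away instead of backtracking through the rest of the graph.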

drandreaskrueger · 2017-06-07T09:03:01Z

Is this related? cargo install parity needs 4 hours ... to find out it can't.

alexcrichton · 2017-06-07T15:04:17Z

@drandreaskrueger ah yeah unfortunately that's this issue :(

drandreaskrueger · 2017-06-07T16:12:58Z

thx, @alexcrichton ... 6 hours now ;-)

sunshowers · 2017-09-08T17:59:14Z

Just chiming in here a little bit -- FB has been hitting this problem for a couple of months now. This prevents us from updating our pseudocrate which contains all our third-party dependencies.

Has there been any discussion around using a SAT or SMT solver like Z3 for dependency resolution? I found #2064 which has some brief discussion, but the only substantive thing there appears to be Graydon's test which shows how to offload semver optimization to Z3.

(I generally believe that people should be reaching for SAT solvers far more often than they do today.)

alexcrichton · 2017-09-08T18:40:35Z

@Sid0 there's been discussion of "we should probably do it" but no serious design of how it would actually be done yet.

Also to be clear, this issue is primarily about "this crate graph is not resolvable, and Cargo isn't quickly telling you that". In more cases than not we've found today that if there's a resolvable graph Cargo will reach it quickly-ish. Not true for all cases, but true for most. We likely want a different issue for "this crate graph can be resolved but Cargo can take forever to conclude that".

sunshowers · 2017-09-08T18:44:16Z

Hmm, interesting. Is it possible for a Cargo.toml that just has a bunch of * dependencies like in https://gist.github.com/sid0/064738981955029054d74de50c90c49f to lead to an unresolvable graph?

alexcrichton · 2017-09-08T20:17:33Z

IIRC last I investigated an apparent deadlock w/ y'alls Cargo.toml it was a case of "cargo takes too long to reach the right solution" rather than "there is no solution", so you'd fall in the category of "we'd benefit from a SAT solver" for sure! (my memory may be hazy though)

sunshowers · 2017-09-08T20:19:54Z

Yeah -- that matches my understanding as well.

FWIW I've had a cargo update running on our Cargo.toml for the last 90 minutes or so. Still hasn't completed :)

alexcrichton · 2017-09-08T20:22:21Z

Heh you may have more luck with ctrl-c :) (in that each time Cargo resolves it may take different paths, so new resolutions may complete quicker).

Note though that the crate graph may still be unresolvable for whatever reason, it's not guaranteed to be resolvable even with * dependencies.

Summary: Finally got an update working by removing the `mysql_async` crate.

Some notes:
* The `mysql_async` crate was responsible in this case: see rust-lang/cargo#4066 (comment) for why.
* tokio/futures deprecated a bunch of stuff. I've filed a TODO for now.
* We finally pulled in error-chain 0.11, which has a bunch of nice improvements.

Reviewed By: kulshrax
Differential Revision: D5798282
fbshipit-source-id: a38a7b17ee0205428e2ea63334722aa408582493

alexcrichton · 2017-09-09T19:07:00Z

why do we not pull in multiple semver-compatible versions of a crate if necessary? I think that if it's a private dependency (as defined in #2064) it should be safe.

You're 100% correct! Right now the thinking is that we're "somewhat in the middle" between the npm strategy of "duplicate everything" and the rubygems strategy of "only one version of anything". That is, we were initially wary to go the npm route of allowing duplicates everywhere because of binary size blowup concerns, but we wanted to be more flexible than rubygems by allowing multiple versions to coexist.

Additionally, though, up to this point we haven't actually had the knowledge of public/private dependencies. My hope is that as soon as we have public/private dependencies we can lift the restriction here and allow semver-compatible duplicates in a dependency graph, so long as the public/shared types all have the same version number.
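
As an illustration of the current restriction, here is a small sketch of a "one version per semver-compatible range" check. `Compat`, `compat_key`, and `first_compatible_duplicate` are hypothetical names for this example, and the rule is simplified to full `(major, minor, patch)` triples.

```rust
use std::collections::HashSet;

type Version = (u64, u64, u64);

/// The component that decides semver compatibility: the leftmost non-zero one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Compat {
    Major(u64),
    Minor(u64),
    Patch(u64),
}

fn compat_key((major, minor, patch): Version) -> Compat {
    if major > 0 {
        Compat::Major(major)
    } else if minor > 0 {
        Compat::Minor(minor)
    } else {
        Compat::Patch(patch)
    }
}

/// Returns the first package that repeats an already-seen (name, compat key)
/// pair, i.e. a second version in the same semver-compatible range.
fn first_compatible_duplicate<'a>(
    activated: &[(&'a str, Version)],
) -> Option<(&'a str, Version)> {
    let mut seen = HashSet::new();
    for &(name, v) in activated {
        if !seen.insert((name, compat_key(v))) {
            return Some((name, v));
        }
    }
    None
}

fn main() {
    // 0.3.x and 0.4.x are semver-incompatible, so they may coexist.
    let fine = [("log", (0, 3, 9)), ("log", (0, 4, 1))];
    // 1.0.2 and 1.0.8 are semver-compatible, so the second one is rejected.
    let clash = [("serde", (1, 0, 2)), ("serde", (1, 0, 8))];
    println!("{:?}", first_compatible_duplicate(&fine));
    println!("{:?}", first_compatible_duplicate(&clash));
}
```

Lifting the restriction for private dependencies would amount to skipping this check for packages whose types never cross a public API boundary.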

This is an ecosystem change that would need to be carefully communicated and managed, but hopefully it will make upstreams more careful and the crates.io ecosystem healthier overall.

Yeah I don't think we've done a great job discouraging usage of the ~ operator and encouraging "semver compatible", I think we can definitely do better here!

jacwah · 2017-10-21T12:45:16Z

Natalie Weizenbaum talked about how they intend to solve this same problem in the Dart package manager on a recent episode of The Manifest. I'll drop a link if anyone is interested: https://manifest.fm/5. It sounds like it is in the early research stages, but it boils down to using a version of DPLL.
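
For readers who have not met DPLL, a minimal textbook-style sketch follows (unit propagation plus branching, without pure-literal elimination). It is not what the Dart team built and not a proposal for Cargo's data structures; the encoding of package versions as numbered boolean variables is an assumption made purely for the example.

```rust
/// Literals are DIMACS-style: `3` means variable 3 is true, `-3` means false.
type Clause = Vec<i32>;

/// Simplify the formula assuming `lit` is true.
/// Returns None if some clause becomes empty (a conflict).
fn assign(clauses: &[Clause], lit: i32) -> Option<Vec<Clause>> {
    let mut out = Vec::new();
    for c in clauses {
        if c.contains(&lit) {
            continue; // clause already satisfied
        }
        let reduced: Clause = c.iter().copied().filter(|&l| l != -lit).collect();
        if reduced.is_empty() {
            return None;
        }
        out.push(reduced);
    }
    Some(out)
}

/// Returns true if satisfiable; `model` then holds the literals chosen.
fn dpll(clauses: Vec<Clause>, model: &mut Vec<i32>) -> bool {
    if clauses.is_empty() {
        return true;
    }
    if clauses.iter().any(|c| c.is_empty()) {
        return false;
    }
    // Unit propagation: a one-literal clause forces its literal.
    let unit = clauses.iter().find(|c| c.len() == 1).map(|c| c[0]);
    let choices = match unit {
        Some(l) => vec![l],
        // Otherwise branch on the first literal, then on its negation.
        None => vec![clauses[0][0], -clauses[0][0]],
    };
    for choice in choices {
        if let Some(next) = assign(&clauses, choice) {
            let mark = model.len();
            model.push(choice);
            if dpll(next, model) {
                return true;
            }
            model.truncate(mark); // undo this choice before trying the next
        }
    }
    false
}

fn main() {
    // Hypothetical encoding: 1 = "a 1.0", 2 = "b 1.0", 3 = "b 2.0".
    // Root needs a; a 1.0 needs b 2.0; at most one version of b.
    let graph = vec![vec![1], vec![-1, 3], vec![-2, -3]];
    let mut model = Vec::new();
    println!("{} {:?}", dpll(graph.clone(), &mut model), model);

    // Additionally requiring b 1.0 makes the graph unresolvable.
    let mut broken = graph;
    broken.push(vec![2]);
    let mut model = Vec::new();
    println!("{}", dpll(broken, &mut model));
}
```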

fenollp · 2017-12-06T01:29:59Z

Hopefully the solution to this issue will be concurrent. It can take ages and also uses only one of my many cores! Is it taunting me? :)

EDIT: happening while building git@github.com:gnunicorn/clippy-service.git with cargo 0.24.0-nightly (5bb478a51 2017-11-29).
EDIT: no progress after 59 hours (haha)

Eh2406 · 2018-02-26T16:41:27Z

This I think would just involve changing:

cargo/src/cargo/core/resolver/mod.rs

Line 684 in 0be926d

if config.shell().is_err_tty() && !printed && ticks % 1000 == 0

to bail if ticks gets too big. Extra credit for making "too big" an argument up to the user.

I am still learning cargo internals, but I would be willing to help someone who wanted to take this on!

@alexcrichton

Faster resolver: Cache past conflicting_activations, prevent doing the same work repeatedly.

This work is inspired by @alexcrichton's [comment](#4066 (comment)) that a slow resolver can be caused by all versions of a dependency being yanked, which stuck in my brain as I did not understand why it would happen. If a dependency has no candidates then it will be the most constrained and will trigger backtracking in the next tick. Eventually I found a reproducible test case. If the bad dependency is deep in the tree of dependencies then we activate and backtrack `O(versions^depth)` times. Even though it is fast to identify the problem, that is a lot of work.

**The set up:**
1. Every time we backtrack, cache the (dep, `conflicting_activations`).
2. Build on the work in #5000: fail to activate if any of its dependencies will just backtrack to this frame. I.e. for each dependency, check if any of its cached `conflicting_activations` are already all activated. If so we can just skip to the next candidate. We also add that bad `conflicting_activations` to our set of `conflicting_activations`, so that we can...

**The pay off:**
If we fail to find any candidates that we can activate in light of 2, then we cannot be activated in this context, so we add our (dep, `conflicting_activations`) to the cache so that next time our parent will not bother trying us.

I hear you saying "but the error messages, what about the error messages?" So if we are at the end (`!has_another`) then we disable this optimization. After we mark our dep as being not activatable, we activate anyway. It won't resolve but it will have the same error message as before this PR. If we have been activated for the error messages then we skip straight to the last candidate, as that is the only backtrack that will end with the user.

I added a test in the vein of #4834. With the old code the time to run was `O(BRANCHING_FACTOR ^ DEPTH)` and took ~3min with DEPTH = 10; BRANCHING_FACTOR = 5; with the new code it runs almost instantly with 200 and 100.
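
The caching idea above can be sketched roughly as follows. This is an illustration, not the code from the PR: `ConflictMemo`, `record`, and `known_failure` are hypothetical names, and package ids are plain strings. For scale, with BRANCHING_FACTOR = 5 and DEPTH = 10 the uncached search has 5^10 = 9,765,625 leaf paths to try.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a resolved package id such as "foo 1.0.0" (assumption).
type PackageId = &'static str;

/// For each dependency name, the sets of activated packages that are
/// already known to make that dependency impossible to activate.
#[derive(Default)]
struct ConflictMemo {
    known: HashMap<&'static str, Vec<HashSet<PackageId>>>,
}

impl ConflictMemo {
    /// Step 1: remember a conflict set every time we backtrack.
    fn record(&mut self, dep: &'static str, conflicting: HashSet<PackageId>) {
        let sets = self.known.entry(dep).or_default();
        if !sets.contains(&conflicting) {
            sets.push(conflicting);
        }
    }

    /// Step 2: if every package of some cached conflict set is currently
    /// activated, trying `dep` again can only fail the same way.
    fn known_failure(
        &self,
        dep: &str,
        activated: &HashSet<PackageId>,
    ) -> Option<&HashSet<PackageId>> {
        self.known
            .get(dep)?
            .iter()
            .find(|set| set.is_subset(activated))
    }
}

fn main() {
    let mut memo = ConflictMemo::default();
    memo.record("bad", ["a 1.0.0", "b 2.0.0"].iter().copied().collect());

    let activated: HashSet<PackageId> =
        ["a 1.0.0", "b 2.0.0", "c 0.3.1"].iter().copied().collect();
    match memo.known_failure("bad", &activated) {
        Some(set) => println!("skip `bad`, it conflicts with {:?}", set),
        None => println!("try activating `bad`"),
    }
}
```

The subset test is what turns repeated exponential work into a lookup: once a conflict set has been recorded, any later context that contains it can skip the candidate without descending into it.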

Eh2406 · 2018-07-26T17:05:51Z

@dwijnand want to pick this one off next :-)

dwijnand · 2018-07-26T18:40:44Z

Alright, I'll have a go.

dwijnand · 2018-07-27T08:09:06Z

Ok, I've tried about 4-5 different reproductions in or linked to here and everything's either succeeded or failed fast. 😄 Probably due to the related fixes that have landed.

Does anyone have a reproduction that fails with recent cargo?

Eh2406 · 2018-07-27T13:56:22Z

Sorry, I should have clarified that. I don't know of any that are currently slow. I think I have gone through every bug report to look for them, and for each one I was either unable to reproduce it or have fixed the resolver. (Mostly by reading up on suggestions made in here.) Then I spent several days fuzzing to find them: nothing over 40 seconds, and then only because I disabled one of the fixes by accident.

So at this point anything that takes more than say a minute, and more than 5_000_000 ticks, is something I would like to see! So I'd like cargo to stop and report "It is always possible that this is valid but taking a long time, but it is probably a bug so please report it".

The reason for the message is that I legitimately want to know about, and fix, any new bad case that anyone hits in use.
The reason to stop is that it will probably come up in a CI/build server type context and I want people to see it. That includes the tests for the fixes I have made: currently if you comment out any of the fixes then the test suite hangs; it would be better for it to error.
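
A rough sketch of the combined limit described above, assuming a hypothetical `Budget` helper that the resolver loop would call once per tick. The thresholds mirror the numbers in this comment; nothing here is existing Cargo API.

```rust
use std::time::{Duration, Instant};

/// Hypothetical guard: errors only once BOTH limits have been passed, so a
/// slow machine or a merely large graph does not trip it on its own.
struct Budget {
    start: Instant,
    ticks: u64,
    max_ticks: u64,
    max_time: Duration,
}

impl Budget {
    fn new(max_ticks: u64, max_time: Duration) -> Self {
        Budget { start: Instant::now(), ticks: 0, max_ticks, max_time }
    }

    /// Call once per resolver step.
    fn tick(&mut self) -> Result<(), String> {
        self.ticks += 1;
        if self.ticks > self.max_ticks && self.start.elapsed() >= self.max_time {
            return Err(format!(
                "dependency resolution stopped after {} ticks: it is always \
                 possible that this is valid but taking a long time, but it \
                 is probably a bug so please report it",
                self.ticks
            ));
        }
        Ok(())
    }
}

fn main() {
    // The limits suggested above: 5_000_000 ticks and one minute.
    let mut budget = Budget::new(5_000_000, Duration::from_secs(60));
    for _ in 0..1_000 {
        if let Err(msg) = budget.tick() {
            eprintln!("{}", msg);
            return;
        }
    }
    println!("finished within budget");
}
```

Returning an error rather than panicking is what lets a test suite or CI job fail visibly instead of hanging.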

dwijnand · 2018-07-27T14:07:52Z

Ah, right, I see! I'll have a look at those tests and try and work off those. Thanks for the info.

Eh2406 · 2018-10-02T21:56:17Z

#5921 added something like this with hard-coded stops that use debug_asserts. So this would now involve making them configurable and returning an error.

This was referenced May 17, 2017

Unify tools building rust-lang/rust#41639

Merged

cargo update hangs forever on stdx, using explicit equality dependencies #4080

Closed

This was referenced May 19, 2017

Compatibility policy vs crate graph resolution in Cargo clap-rs/clap#964

Closed

cargo update hangs in cargo::core::resolver::activate_deps_loop #4108

Closed

tomprince mentioned this issue Jun 3, 2017

Move all cargo invocations into docker. rust-lang/crater#76

Merged

drandreaskrueger mentioned this issue Jun 7, 2017

no matching package named ring found (required by multihash) version required: ~0.6.2 versions found: 0.9.7, 0.9.6, 0.9.5, ... openethereum/parity-ethereum#5783

Closed

tomprince mentioned this issue Jun 8, 2017

generate-lockfiles without updating registry #3479

Closed

This was referenced Jul 18, 2017

Occasional hangs at updating registry #2090

Closed

Unresolvable graphs don't give clear error messages #4322

Closed

alexcrichton mentioned this issue Aug 1, 2017

cargo build hangs forever on updating registry. #4346

Closed

alexcrichton mentioned this issue Aug 21, 2017

cargo update silently fails to update dependencies #4420

Closed

alexcrichton mentioned this issue Sep 5, 2017

Panic - Attempt to subtract with overflow #4460

Closed

pgerber mentioned this issue Sep 6, 2017

Cargo update hangs with a specific configuration of dependency crates #4474

Closed

This was referenced Sep 13, 2017

Arbitrarily long time spent in resolution #4488

Closed

cargo build stuck on "Updating registry" in a repository with old Cargo.lock #4502

Closed

alexcrichton assigned aturon Sep 18, 2017

carols10cents mentioned this issue Sep 26, 2017

cargo update hangs #2975

Closed

carols10cents added A-dependency-resolution Area: dependency resolution and the resolver A-diagnostics Area: Error and warning messages generated by Cargo itself. C-bug Category: bug labels Sep 26, 2017

This was referenced Oct 10, 2017

Cargo stuck for hours, taking 100% of the CPU #4604

Closed

cargo build loops for ever #4612

Closed

alexcrichton mentioned this issue Dec 13, 2017

Infinite cycle in depgraph resolution #4810

Closed

alexcrichton mentioned this issue Jan 29, 2018

Only allow one of each links attribute in resolver #4978

Merged

Eh2406 mentioned this issue Feb 5, 2018

backtrack if can not activate #5000

Merged

Eh2406 mentioned this issue Mar 12, 2018

Faster resolver: Cache past conflicting_activations, prevent doing the same work repeatedly. #5168

Merged

alexcrichton unassigned aturon Mar 17, 2020

pradyunsg mentioned this issue Dec 5, 2020

New resolver takes a very long time to complete pypa/pip#9187

Closed

DavePearce mentioned this issue Jan 10, 2022

Improved Package Resolution Algorithm Whiley/WhileyBuildTool#16

Open

ehuss mentioned this issue Dec 10, 2022

Cargo hangs at "Resolving dependency graph..." #11454

Open

epage added the S-triage Status: This issue is waiting on initial triage. label Oct 12, 2023
