Abort crate resolution if too many candidates have been tried #4066
Comments
I also see this with https://github.com/koute/cargo-web. This is a new development that I have not seen before, and it has caused cargobomb to stop working because it hangs when building the lockfile for that crate. |
cc @koute |
I'm not sure on how this behavior would have begun to appear on stable between the last cargobomb and this cargobomb run. Both should have run the generate-lockfiles phase using the same stable compiler. Hm... |
Oh cargobomb should kill this process after a lengthy timeout. I probably just need to wait 10 minutes or so. So maybe cargobomb is still working and this is not a new behavior for cargo. |
cargobomb doesn't seem to be timing out. Just hanging forever. |
@brson does this reproduce on cargo-web? Or was this related to something else? |
@alexcrichton If you try to do It looks like upgrading hyper-rustls = "0.3" into: hyper-rustls = "0.3.3" it also fixes the hanging and instead spews out this error:
I've already pushed a fix to master, but in case you'd want to reproduce this issue again you can use commit |
@koute I think that's an example of an un-resolvable resolution graph again as well (what this issue is). All versions of |
Is this related? |
@drandreaskrueger ah yeah unfortunately that's this issue :( |
thx, @alexcrichton ... 6 hours now ;-) |
Just chiming in here a little bit -- FB has been hitting this problem for a couple of months now. This prevents us from updating our pseudocrate which contains all our third-party dependencies. Has there been any discussion around using a SAT or SMT solver like Z3 for dependency resolution? I found #2064 which has some brief discussion, but the only substantive thing there appears to be Graydon's test which shows how to offload semver optimization to Z3. (I generally believe that people should be reaching for SAT solvers far more often than they do today.) |
@Sid0 there's been discussion of "we should probably do it" but no serious design of how it would actually be done yet. Also to be clear, this issue is primarily about "this crate graph is not resolvable, and Cargo isn't quickly telling you that". In more cases than not we've found today that if there's a resolvable graph Cargo will reach it quickly-ish. Not true for all cases, but true for most. We likely want a different issue for "this crate graph can be resolved but Cargo can take forever to conclude that" |
Hmm, interesting. Is it possible for a Cargo.toml that just has a bunch of |
IIRC last I investigated an apparent deadlock w/ y'alls Cargo.toml it was a case of "cargo takes too long to reach the right solution" rather than "there is no solution", so you'd fall in the category of "we'd benefit from a SAT solver" for sure! (my memory may be hazy though) |
Yeah -- that matches my understanding as well. FWIW I've had a |
Heh you may have more luck with ctrl-c :) (in that each time Cargo resolves it may take different paths, so new resolutions may complete quicker). Note though that the crate graph may still be unresolvable for whatever reason, it's not guaranteed to be resolvable even with |
Summary: Finally got an update working by removing the `mysql_async` crate.

Some notes:
* The `mysql_async` crate was responsible in this case: see rust-lang/cargo#4066 (comment) for why.
* tokio/futures deprecated a bunch of stuff. I've filed a TODO for now.
* We finally pulled in error-chain 0.11, which has a bunch of nice improvements.

Reviewed By: kulshrax
Differential Revision: D5798282
fbshipit-source-id: a38a7b17ee0205428e2ea63334722aa408582493
You're 100% correct! Right now the thinking is that we're "somewhat in the middle" between the npm strategy of duplicating everything and the rubygems strategy of "only one version of anything". That is, we were initially wary to go the npm route of allowing duplicates everywhere because of binary size blowup concerns, but we wanted to be more flexible than rubygems by allowing multiple versions to coexist. Additionally, though, up to this point we haven't actually had the knowledge of public/private dependencies. My hope is that as soon as we have public/private dependencies we can lift the restriction here and allow semver-compatible duplicates in a dependency graph, so long as the public/shared types all have the same version number.
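For readers who want the current restriction spelled out, here is a minimal sketch of the rule, assuming the usual caret convention (two versions are semver-compatible when their left-most non-zero component matches). The names `Version`, `compat_key` and `can_coexist` are hypothetical and are not Cargo's internals; pre-release tags are ignored.

```rust
/// A version as (major, minor, patch). Pre-release tags are ignored in this sketch.
type Version = (u64, u64, u64);

/// Hypothetical helper: collapse a version to the key of its semver-compatible
/// range, using the caret rule (the left-most non-zero component decides).
fn compat_key(v: Version) -> Version {
    match v {
        (0, 0, patch) => (0, 0, patch), // 0.0.x: every patch is its own range
        (0, minor, _) => (0, minor, 0), // 0.y.z: the minor version decides
        (major, _, _) => (major, 0, 0), // x.y.z: the major version decides
    }
}

/// Hypothetical check: two activations of the same crate may coexist only if
/// they are the same version or fall in different compatibility ranges.
fn can_coexist(a: Version, b: Version) -> bool {
    a == b || compat_key(a) != compat_key(b)
}
```

Under this sketch, 0.3.0 and 0.3.3 of one crate cannot both be in the graph, while 0.3.3 and 0.4.0 can.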
Yeah I don't think we've done a great job discouraging usage of the |
Natalie Weizenbaum talked about how they intend to solve this same problem in the Dart package manager on a recent episode of The Manifest. I'll drop a link if anyone is interested: https://manifest.fm/5. It sounds like it is in early research stages, but boils down to using a version of DPLL. |
Hopefully the solution to this issue will be concurrent. It can take ages and also uses only one of my many cores! Is it taunting me? :) EDIT: happening while building |
This I think would just involve changing: cargo/src/cargo/core/resolver/mod.rs Line 684 in 0be926d
to be an argument up to the user.
I am still learning cargo internals, but I would be willing to help someone who wanted to take this on! |
Faster resolver: Cache past conflicting_activations, prevent doing the same work repeatedly.

This work is inspired by @alexcrichton's [comment](#4066 (comment)) that a slow resolver can be caused by all versions of a dependency being yanked, which stuck in my brain as I did not understand why it would happen. If a dependency has no candidates then it will be the most constrained and will trigger backtracking in the next tick. Eventually I found a reproducible test case. If the bad dependency is deep in the tree of dependencies then we activate and backtrack `O(versions^depth)` times. Even though it is fast to identify the problem, that is a lot of work.

**The set up:**
1. Every time we backtrack, cache the (dep, `conflicting_activations`).
2. Build on the work in #5000: fail to activate if any of its dependencies will just backtrack to this frame. I.e., for each dependency, check if any of its cached `conflicting_activations` are already all activated. If so we can just skip to the next candidate. We also add that bad `conflicting_activations` to our set of `conflicting_activations`, so that we can...

**The pay off:** If we fail to find any candidates that we can activate in light of 2, then we cannot be activated in this context, so we add our (dep, `conflicting_activations`) to the cache so that next time our parent will not bother trying us.

I hear you saying "but the error messages, what about the error messages?" So if we are at the end (`!has_another`) then we disable this optimization. After we mark our dep as being not activatable, we activate anyway. It won't resolve, but it will have the same error message as before this PR. If we have been activated for the error messages, then skip straight to the last candidate, as that is the only backtrack that will end with the user.

I added a test in the vein of #4834. With the old code the time to run was `O(BRANCHING_FACTOR ^ DEPTH)` and took ~3min with DEPTH = 10; BRANCHING_FACTOR = 5; with the new code it runs almost instantly with 200 and 100.
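A toy model of the effect described above, as a sketch only and not the resolver's actual code. It assumes a straight chain of packages where every version of package `i` depends on package `i + 1`, and the last package has no candidates at all (for example, everything was yanked). `Chain`, `resolve_naive` and `resolve_cached` are hypothetical names. The real cache is keyed on (dep, `conflicting_activations`) and only applies when those activations are all active; here the conflict set is always empty, so the cache degenerates to a plain set of known-bad levels.

```rust
use std::collections::HashSet;

/// Toy chain: package `i` has `branching` versions, each depending on package
/// `i + 1`. The package at index `depth` has no candidates at all.
struct Chain {
    depth: usize,
    branching: usize,
}

/// Naive backtracking. Counts one tick per candidate tried and rediscovers
/// the same dead end under every candidate.
fn resolve_naive(chain: &Chain, level: usize, ticks: &mut u64) -> bool {
    if level == chain.depth {
        return false; // no candidates: forces a backtrack
    }
    for _version in 0..chain.branching {
        *ticks += 1;
        if resolve_naive(chain, level + 1, ticks) {
            return true;
        }
    }
    false
}

/// Same search, but remembers which levels are already known to fail so that
/// later candidates can skip them instead of recursing again.
fn resolve_cached(
    chain: &Chain,
    level: usize,
    known_bad: &mut HashSet<usize>,
    ticks: &mut u64,
) -> bool {
    if level == chain.depth {
        known_bad.insert(level);
        return false;
    }
    for _version in 0..chain.branching {
        *ticks += 1;
        if known_bad.contains(&(level + 1)) {
            continue; // the dependency would only backtrack to this frame
        }
        if resolve_cached(chain, level + 1, known_bad, ticks) {
            return true;
        }
    }
    known_bad.insert(level); // the parent need not try this level again
    false
}
```

In this model the naive search tries `branching + branching^2 + ... + branching^depth` candidates, while the cached search tries `branching * depth`.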
@dwijnand want to pick this one off next :-) |
Alright, I'll have a go. |
Ok, I've tried about 4-5 different reproductions in or linked to here and everything's either succeeded or failed fast. 😄 Probably due to the related fixes that have landed. Does anyone have a reproduction that fails with recent cargo? |
Sorry, I should have clarified that. I don't know of any that are currently slow. I think I have gone through every bug report to look for them; for each, I was either unable to reproduce it or have fixed the resolver. (Mostly by reading up on suggestions made in here.) Then I spent several days fuzzing to find them: nothing took over 40 seconds, and then only because I had disabled one of the fixes by accident. So at this point anything that takes more than say a minute, and more than 5_000_000 ticks, is something I would like to see! So I'd like cargo to stop and report "It is always possible that this is valid but taking a long time, but it is probably a bug so please report it". The reason for the message is that I legitimately want to know and fix it if anyone hits a new bad case in use. |
Ah, right, I see! I'll have a look at those tests and try and work off those. Thanks for the info. |
#5921 added something like this with hard-coded stops that use debug_asserts. So this would now involve making them configurable and returning an error. |
We should have a hard limit for the number of steps the resolution algorithm can take before it just hard aborts and exits saying "this probably isn't gonna work".
Right now the crate resolution algorithm will only return an error if it proves that literally every possible crate resolution graph is not workable. This can take quite a long time for some larger graphs! Ideally we'd also abort with nice error messages "these tended to conflict a lot" etc, but at the very least Cargo shouldn't look like it's infinite looping.
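A minimal sketch of what such a hard limit could look like, assuming a simple tick counter threaded through a backtracking search. The search here is a toy (pick one number per slot so that the picks sum to a target) and is not Cargo's resolver; `Budget`, `ResolveError`, `pick` and `resolve` are hypothetical names chosen for illustration.

```rust
/// Hypothetical error type: distinguishes "proved impossible" from "gave up".
#[derive(Debug, PartialEq)]
enum ResolveError {
    Unresolvable,
    BudgetExceeded { ticks: u64 },
}

/// Hypothetical step budget; `max_ticks` would be the configurable limit.
struct Budget {
    max_ticks: u64,
    used: u64,
}

impl Budget {
    fn new(max_ticks: u64) -> Self {
        Budget { max_ticks, used: 0 }
    }

    /// Count one candidate tried; fail once the limit is passed.
    fn tick(&mut self) -> Result<(), ResolveError> {
        self.used += 1;
        if self.used > self.max_ticks {
            Err(ResolveError::BudgetExceeded { ticks: self.used })
        } else {
            Ok(())
        }
    }
}

/// Toy backtracking search: choose one value from each slot so that the
/// chosen values sum to `target`. Every candidate tried costs one tick.
fn pick(
    slots: &[Vec<u32>],
    target: u32,
    chosen: &mut Vec<u32>,
    budget: &mut Budget,
) -> Result<bool, ResolveError> {
    if slots.is_empty() {
        return Ok(target == 0);
    }
    for &candidate in &slots[0] {
        budget.tick()?; // abort the whole search when the budget runs out
        if candidate > target {
            continue;
        }
        chosen.push(candidate);
        if pick(&slots[1..], target - candidate, chosen, budget)? {
            return Ok(true);
        }
        chosen.pop(); // backtrack
    }
    Ok(false)
}

fn resolve(slots: &[Vec<u32>], target: u32, max_ticks: u64) -> Result<Vec<u32>, ResolveError> {
    let mut budget = Budget::new(max_ticks);
    let mut chosen = Vec::new();
    if pick(slots, target, &mut chosen, &mut budget)? {
        Ok(chosen)
    } else {
        Err(ResolveError::Unresolvable)
    }
}
```

Keeping `BudgetExceeded` separate from `Unresolvable` lets the caller print a "this is probably a bug, please report it" message instead of claiming the graph has no solution.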