Source::{fuzzy_}query{_vec} can say "try again" #8985

Eh2406 · 2020-12-16T21:41:06Z

This adds Poll to the return types of Source's query methods. This is prep work for @jonhoo's #8890, as suggested by @alexcrichton.

There is definitely a lot of polishing work to do before merge, but it is good enough to start a conversation and see what CI has to say.

Some questions as it stands now:

CargoResult<Poll<_>> vs Poll<CargoResult<()>> neither is clearly more ergonomic to work with?
Is there some use case that makes query pull its weight, or should we only have the query_vec version?
What to do when we get a Poll::Pending? Currently we busy-wait, but that is not great. What can we do short of async and await?
There are a lot of places that want to "just wait until it is ready"; should we find a way to have that code in only one or two places? Should we have a query_weight that is defaulted to call query in a loop?
Should we move some of the loops up the stack for better parallelism?
Should we move some of the loops down the stack to do less redundant work?
Can we break some of the caches up, so that there is more sharing between runs? (Yes, but maybe it can be left for later.)
There are 2 places where we want expect for Poll, should we start an extension trait to add a method?

rust-highfive · 2020-12-16T21:41:09Z

r? @alexcrichton

(rust-highfive has picked a reviewer for you, use r? to override)

alexcrichton

Awesome, thanks for this!

FWIW to get the full value out of this change I think we will want to remove Source::update. That should no longer be needed at all, since sources will naturally report that they need to be updated if, during a query, they discover that blocking work needs to be done. This would involve refactoring the registry, for example, to report that an update is needed whenever an unlocked dependency is queried before an update has been done. The thinking then is that the HTTP index source would be more fine-grained about whether a package is ready or not.

As to some of your questions:

CargoResult<Poll<_>> vs Poll<CargoResult<()>> neither is clearly more ergonomic to work with?

I like CargoResult<Poll<_>> because the usage of ? is still "obviously correct"
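
A minimal sketch of why `?` stays "obviously correct" with `CargoResult<Poll<_>>`: the error layer is outermost, so `?` removes it and leaves a plain `Poll` to map or match on. The `CargoResult` alias below is a stand-in (Cargo's real alias wraps its own error type), and `query` and `count_candidates` are hypothetical functions used only for illustration.

```rust
use std::task::Poll;

// Stand-in for Cargo's `CargoResult`; an assumption for this sketch.
type CargoResult<T> = Result<T, String>;

/// Hypothetical query: errors on an empty name, is pending until `loaded`.
fn query(name: &str, loaded: bool) -> CargoResult<Poll<Vec<String>>> {
    if name.is_empty() {
        return Err("empty package name".to_string());
    }
    if !loaded {
        return Ok(Poll::Pending);
    }
    Ok(Poll::Ready(vec![format!("{} 1.0.0", name)]))
}

/// `?` strips the error layer first, leaving a plain `Poll` to work with.
fn count_candidates(name: &str, loaded: bool) -> CargoResult<Poll<usize>> {
    let candidates = query(name, loaded)?;
    Ok(candidates.map(|found| found.len()))
}
```

With the layers the other way around, `?` on a `Poll<Result<_, _>>` also compiles, but what it returns early on is less obvious at a glance.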

Is there some use case that makes query pull its weight, or should we only have the query_vec version?

This was added for performance reasons at some point in the past to avoid allocating lots of intermediate vectors. I'm not sure if this is still the case but it didn't seem like it was too onerous to handle here?

What to do when we get a Poll::Pending? Currently we busy-wait, but that is not great. What can we do short of async and await?

I wrote down a few comments here and there, but the main gist of what I'm thinking is that we go back to the source and say "hey you said pending earlier, please resolve that pending status right now in a blocking fashion".

There are a lot of places that want to "just wait until it is ready"; should we find a way to have that code in only one or two places? Should we have a query_weight that is defaulted to call query in a loop?

I was thinking the same thing about possibly having a helper method on the trait which does the wait for you. I think it'd be fine to add that, although we would want to use it sparingly within Cargo.
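
One way such a helper could look, as a sketch only: a default trait method that owns the "wait until ready" loop, so callers that cannot do anything useful with `Pending` do not each write their own. The trait shape, the `block_until_ready` and `query_blocking` names, and `ToySource` are all assumptions made for illustration, not Cargo's actual `Source` trait.

```rust
use std::task::Poll;

// Stand-in for Cargo's `CargoResult`; an assumption for this sketch.
type CargoResult<T> = Result<T, String>;

trait Source {
    /// Non-blocking: may answer `Poll::Pending` ("try again").
    fn query(&mut self, dep: &str) -> CargoResult<Poll<Vec<String>>>;

    /// Hypothetical: do the blocking work that earlier `Pending` answers implied.
    fn block_until_ready(&mut self) -> CargoResult<()>;

    /// Hypothetical helper: the "wait until ready" loop, written once.
    fn query_blocking(&mut self, dep: &str) -> CargoResult<Vec<String>> {
        loop {
            if let Poll::Ready(found) = self.query(dep)? {
                return Ok(found);
            }
            // Block instead of spinning, then ask again.
            self.block_until_ready()?;
        }
    }
}

/// Toy source that needs one "update" before it can answer.
struct ToySource {
    updated: bool,
    updates: u32,
}

impl Source for ToySource {
    fn query(&mut self, dep: &str) -> CargoResult<Poll<Vec<String>>> {
        if self.updated {
            Ok(Poll::Ready(vec![format!("{} 1.0.0", dep)]))
        } else {
            Ok(Poll::Pending)
        }
    }

    fn block_until_ready(&mut self) -> CargoResult<()> {
        self.updated = true;
        self.updates += 1;
        Ok(())
    }
}
```

Because the helper serializes work, using it sparingly keeps the opportunity to schedule several queries before blocking.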

Should we move some of the loops up the stack for better parallelism?

I only found one location (patches) that I think should get moved up, but other than that the loops seemed reasonable to me (to either add blocking or ignore the pending status).

Should we move some of the loops down the stack to do less redundant work?

I didn't find anything along these lines

Can we break some of the caches up, so that there is more sharing between runs? (Yes, but maybe it can be left for later.)

Yeah I was initially hoping we could preserve the entire resolver and benefit from lots of prepopulated non-pending caches, but that may not end up being the case. I don't think this is a huge worry though given the speed of the resolver and the typical time it takes to fetch something from the network.

There are 2 places where we want expect for Poll, should we start an extension trait to add a method?

Yeah seems fine to me!
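
A sketch of what such an extension trait might look like. The trait name `PollExt` matches the commit title later in this PR, but the body below is an assumption rather than the code that was committed.

```rust
use std::task::Poll;

/// Extension trait adding an `expect`-style accessor to `Poll`.
trait PollExt<T> {
    fn expect(self, msg: &str) -> T;
}

impl<T> PollExt<T> for Poll<T> {
    fn expect(self, msg: &str) -> T {
        match self {
            Poll::Ready(value) => value,
            // Only reachable if the caller's "must be ready" assumption is wrong.
            Poll::Pending => panic!("{}", msg),
        }
    }
}
```

This replaces a two-arm `match` at call sites where `Pending` would indicate a bug.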

alexcrichton · 2020-12-17T16:33:38Z

src/cargo/core/registry.rs

+                    })
+                    .collect::<Vec<_>>();
+            } else {
+                // TODO: dont hot loop for it to be Ready


FWIW I imagine this bottoms out in something like self.wait_for_sources_to_be_ready(). Each source would already be registered in some internal map of PackageRegistry if we're blocked on it, and then we'd ask each source, in sequence, "do your blocking thing now".

alexcrichton · 2020-12-17T16:34:38Z

src/cargo/core/registry.rs

-                                source.query(dep, callback)?;
+                                source.query(dep, callback)?
+                            };
+                            if pend.is_pending() {


This is where I'd imagine that a record of this dependency's source id is recorded in an internal map for us to later iterate over and ask the source to block on things.
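
The bookkeeping described in these two comments could be sketched roughly as follows. `wait_for_sources_to_be_ready` is the name floated above; `ToySource`, the `u32` source id, the `pending` set, and `block_until_ready` are assumptions for illustration and do not reflect the real `PackageRegistry`.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::task::Poll;

// Stand-ins for Cargo's types; assumptions for this sketch.
type CargoResult<T> = Result<T, String>;
type SourceId = u32;

/// Toy source: pending until its blocking step has run once.
struct ToySource {
    ready: bool,
}

impl ToySource {
    fn query(&mut self, dep: &str) -> CargoResult<Poll<Vec<String>>> {
        if self.ready {
            Ok(Poll::Ready(vec![format!("{} 1.0.0", dep)]))
        } else {
            Ok(Poll::Pending)
        }
    }

    fn block_until_ready(&mut self) -> CargoResult<()> {
        self.ready = true;
        Ok(())
    }
}

struct PackageRegistry {
    sources: BTreeMap<SourceId, ToySource>,
    /// Sources that answered `Pending` since the last wait.
    pending: BTreeSet<SourceId>,
}

impl PackageRegistry {
    fn query(&mut self, id: SourceId, dep: &str) -> CargoResult<Poll<Vec<String>>> {
        let source = self
            .sources
            .get_mut(&id)
            .ok_or_else(|| format!("unknown source {}", id))?;
        let answer = source.query(dep)?;
        if answer.is_pending() {
            // Remember who to go back to later.
            self.pending.insert(id);
        }
        Ok(answer)
    }

    /// Ask each recorded source, in sequence, to do its blocking work now.
    fn wait_for_sources_to_be_ready(&mut self) -> CargoResult<()> {
        for id in std::mem::take(&mut self.pending) {
            if let Some(source) = self.sources.get_mut(&id) {
                source.block_until_ready()?;
            }
        }
        Ok(())
    }
}
```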

alexcrichton · 2020-12-17T16:38:20Z

src/cargo/core/resolver/dep_cache.rs

@@ -32,7 +33,7 @@ pub struct RegistryQueryer<'a> {
    /// specify minimum dependency versions to be used.
    minimal_versions: bool,
    /// a cache of `Candidate`s that fulfil a `Dependency`
-    registry_cache: HashMap<Dependency, Rc<Vec<Summary>>>,
+    registry_cache: HashMap<Dependency, Poll<Rc<Vec<Summary>>>>,


I was a bit surprised by this where it's caching not ready signals. I was hoping that we could reuse the resolver across iterations perhaps and benefit from mostly prepopulated caches?
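
A rough sketch of the alternative hinted at here: keep the cache's value type as it was and store only ready answers, so a reused resolver starts each pass with whatever has already arrived. `Dependency` and `Summary` are string stand-ins, and the `loaded` map is a toy replacement for the real registry; none of this is the actual `RegistryQueryer`.

```rust
use std::collections::HashMap;
use std::rc::Rc;
use std::task::Poll;

// String stand-ins for Cargo's `Dependency` and `Summary`; assumptions.
type Dependency = String;
type Summary = String;

struct RegistryQueryer {
    /// Holds only ready answers, so the value type needs no `Poll`.
    registry_cache: HashMap<Dependency, Rc<Vec<Summary>>>,
    /// Toy backing data: dependencies whose index data has "arrived".
    loaded: HashMap<Dependency, Vec<Summary>>,
}

impl RegistryQueryer {
    fn query(&mut self, dep: &str) -> Poll<Rc<Vec<Summary>>> {
        if let Some(hit) = self.registry_cache.get(dep) {
            return Poll::Ready(Rc::clone(hit));
        }
        match self.loaded.get(dep) {
            Some(found) => {
                let found = Rc::new(found.clone());
                self.registry_cache
                    .insert(dep.to_string(), Rc::clone(&found));
                Poll::Ready(found)
            }
            // Nothing is stored for a pending answer, so a later pass asks again.
            None => Poll::Pending,
        }
    }
}
```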

alexcrichton · 2020-12-17T16:41:36Z

src/cargo/core/resolver/mod.rs

+        if registry.all_ready() {
+            break (registry, cx);
+        } else {
+            // TODO: dont hot loop for it to be Ready


FWIW I'm imagining a new method on Registry which is something like "block on things" which is called here.

alexcrichton · 2020-12-17T16:42:47Z

src/cargo/ops/common_for_install_and_uninstall.rs

+                break deps;
+            }
+            Poll::Pending => {
+                // TODO: dont hot loop for it to be Ready


I forget the context in which this function is called, but this could call the hypothetical new Source::block_on_stuff method

jonhoo · 2020-12-17T18:56:14Z

I don't have too much to add to this except that RegistryData::load must also return Poll to enable #8890. I also agree with Alex that we probably want busy waiting to be replaced with a call to a "wait for ready" method on Source.

alexcrichton · 2020-12-18T16:12:33Z

@jonhoo yeah I'm imagining that after this PR we'll want to refactor the internal index implementation of the registry like you mention. We'll want to tweak the current implementations as well to match the new interface, and the http index should then fit quite cleanly into the implementation.

Eh2406 · 2020-12-18T17:52:39Z

I just added a commit to push the Poll all the way to RegistryData::load. It added more ugly loops. I wanted to get all of the ugly on the table before starting to take your suggestions for how to clean them up.

jonhoo · 2020-12-18T18:16:07Z

src/cargo/sources/registry/index.rs

-            .summaries(pkg.name(), &req, load)?
-            .any(|summary| summary.yanked);
-        Ok(found)
+        loop {


Should this just also be modified to return Poll<bool>?

Eh2406 · 2020-12-21

One level is straightforward and worth it. Continuing up the stack to Source::is_yanked, I am not seeing where we would want to call a lot of it in parallel, nor where it ends in an existing loop. It seems to be called where there is no opportunity for parallelism (install code) or where we have already updated that pkg (writing a lockfile). What have I missed?

jonhoo · 2020-12-18T18:25:44Z

I think my general thinking here is that we should just continue to propagate Poll up the stack and deal with it "near the top"
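
As an illustration of propagating `Poll` rather than looping locally, here is a sketch of a yanked check that passes `Pending` through to its caller. `Index`, `Summary`, and both method signatures are simplified stand-ins and are assumptions, not the code in `index.rs`.

```rust
use std::task::Poll;

// Stand-in for Cargo's `CargoResult`; an assumption for this sketch.
type CargoResult<T> = Result<T, String>;

struct Summary {
    version: String,
    yanked: bool,
}

/// Toy index: `None` models index data that has not been loaded yet.
struct Index {
    entries: Option<Vec<Summary>>,
}

impl Index {
    fn summaries(&self) -> CargoResult<Poll<&[Summary]>> {
        Ok(match &self.entries {
            Some(list) => Poll::Ready(list.as_slice()),
            None => Poll::Pending,
        })
    }

    /// Passes `Pending` through instead of looping here.
    fn is_yanked(&self, version: &str) -> CargoResult<Poll<bool>> {
        let summaries = self.summaries()?;
        Ok(summaries.map(|list| list.iter().any(|s| s.version == version && s.yanked)))
    }
}
```

The decision about how to wait then moves to whichever caller sits "near the top".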

alexcrichton · 2020-12-18T21:12:35Z

src/cargo/sources/registry/remote.rs

+                    break;
+                }
+                Poll::Pending => {
+                    // TODO: dont hot loop for it to be Ready


I think this will want the ability to return Poll up the stack.

Eh2406 · 2020-12-21

Happy to do the work; I am just not seeing where up the stack we would want to call this in parallel with other work. What have I missed?

alexcrichton · 2021-01-04

Ah that's true, although I think we'll want special handling for this since I don't think we want to add a round-trip time to get this file all the time in Cargo. We'd presumably want to cache this file separately for a fixed length of time, and within that window Cargo wouldn't re-fetch the file.
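
The "cache for a fixed length of time" idea can be sketched as a small time-to-live check. This is an in-memory toy under stated assumptions: `FileCache`, `CachedFile`, and the `fetch` closure are hypothetical, and a real implementation would presumably persist the timestamp on disk between Cargo invocations.

```rust
use std::time::{Duration, Instant};

/// Hypothetical holder for a downloaded file plus the time it was fetched.
struct CachedFile {
    contents: String,
    fetched_at: Instant,
}

struct FileCache {
    ttl: Duration,
    entry: Option<CachedFile>,
    fetches: u32,
}

impl FileCache {
    /// Returns the cached contents, calling `fetch` only if the entry is
    /// missing or at least `ttl` old.
    fn get(&mut self, now: Instant, fetch: impl FnOnce() -> String) -> &str {
        let stale = match &self.entry {
            Some(entry) => now.saturating_duration_since(entry.fetched_at) >= self.ttl,
            None => true,
        };
        if stale {
            self.fetches += 1;
            self.entry = Some(CachedFile {
                contents: fetch(),
                fetched_at: now,
            });
        }
        // The entry is always `Some` here: it was either fresh or just filled.
        &self.entry.as_ref().expect("entry is present").contents
    }
}
```

Passing `now` in explicitly keeps the freshness check easy to exercise without sleeping.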

bors · 2021-02-03T16:05:27Z

☔ The latest upstream changes (presumably #9125) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2021-02-10T00:21:05Z

☔ The latest upstream changes (presumably #9133) made this pull request unmergeable. Please resolve the merge conflicts.

samanpa · 2021-08-11T23:59:42Z

Any updates on this? I keep hoping that this will be unblocked so #8890 will become a reality.

Eh2406 · 2021-08-12T01:36:13Z

That is my hope as well. My plan was to get back to this, but I have been spending my time understanding the design space around the Cargo Auth RFC. When that is done, I will get back to this. (Best laid plans and all, but I will try) The Cargo team may close this PR as stale in the meantime, but it will still be "next" on my todo list.

samanpa · 2021-08-12T10:41:58Z

Thank you.

Registry functions return Poll to enable parallel fetching of index data (#10064)

Adds `Poll` as a return type for several registry functions to enable parallel fetching of crate metadata with a future http-based registry.

Work is scheduled by calling the `query` and related functions, then waited on with `block_until_ready`.

This PR is based on the draft PR started by eh2406 here: #8985.

r? `@Eh2406`
cc `@alexcrichton`
cc `@jonhoo`
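
The schedule-then-wait pattern described above might be used along these lines. `ToyRegistry` and `query_all` are hypothetical, and the batching inside `block_until_ready` is an assumption about how a network-backed source could behave; only the `query` and `block_until_ready` names come from the description.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::task::Poll;

// Stand-in for Cargo's `CargoResult`; an assumption for this sketch.
type CargoResult<T> = Result<T, String>;

/// Toy registry: a query for data it lacks only records a request.
struct ToyRegistry {
    loaded: BTreeMap<String, Vec<String>>,
    requested: BTreeSet<String>,
    round_trips: u32,
}

impl ToyRegistry {
    fn query(&mut self, name: &str) -> CargoResult<Poll<Vec<String>>> {
        match self.loaded.get(name) {
            Some(versions) => Ok(Poll::Ready(versions.clone())),
            None => {
                self.requested.insert(name.to_string());
                Ok(Poll::Pending)
            }
        }
    }

    /// Stand-in for the blocking step: serves every outstanding request in one batch.
    fn block_until_ready(&mut self) -> CargoResult<()> {
        if self.requested.is_empty() {
            return Ok(());
        }
        self.round_trips += 1;
        for name in std::mem::take(&mut self.requested) {
            let versions = vec![format!("{} 1.0.0", name)];
            self.loaded.insert(name, versions);
        }
        Ok(())
    }
}

/// Schedule every query first, wait once, then collect the answers.
fn query_all(registry: &mut ToyRegistry, names: &[&str]) -> CargoResult<Vec<Vec<String>>> {
    loop {
        let mut results = Vec::new();
        for name in names {
            if let Poll::Ready(found) = registry.query(name)? {
                results.push(found);
            }
        }
        if results.len() == names.len() {
            return Ok(results);
        }
        registry.block_until_ready()?;
    }
}
```

In this toy, two crates cost one blocking round instead of two, which is the kind of saving the change is meant to allow.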

arlosi · 2022-03-09T22:53:44Z

This can be closed since it landed via #10064!

rust-highfive assigned alexcrichton Dec 16, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 16, 2020

Eh2406 force-pushed the resolve_repeatedly_poll branch from d2d905c to b9e814d Compare December 16, 2020 22:04

alexcrichton reviewed Dec 17, 2020

jonhoo reviewed Dec 18, 2020

jonhoo reviewed Dec 18, 2020

alexcrichton reviewed Dec 18, 2020

alexcrichton reviewed Dec 18, 2020

Eh2406 force-pushed the resolve_repeatedly_poll branch from 405d92b to 4c13c26 Compare December 21, 2020 16:30

jonhoo mentioned this pull request Jan 6, 2021

Implement experimental registry HTTP API from RFC #8890

Closed

Eh2406 force-pushed the resolve_repeatedly_poll branch from b2d3468 to f6e9f8c Compare January 17, 2021 02:02

Eh2406 added 9 commits February 3, 2021 13:48

Source::{fuzzy_}query{_vec} can say "try again"

466e719

RegistryData::load can say "try again"

c354199

pull the loop up out of summary_for_patch

ea37a74

reuse the resolvers cache

fdf08eb

don't retry just for an error message

240cb5f

move Poll one layer up in is_yanked

08f96cd

just assert when downloading

33190fa

add a PollExt so we can use expect instead of match

1d9e675

I missed one unneeded loop.

44012fd

Eh2406 force-pushed the resolve_repeatedly_poll branch from f6e9f8c to 44012fd Compare February 3, 2021 18:50

jonhoo mentioned this pull request Apr 25, 2021

Add authorization jonhoo/cargo#1

Closed

arlosi mentioned this pull request Nov 9, 2021

Registry functions return Poll to enable parallel fetching of index data #10064

Merged

alexcrichton removed their assignment Mar 8, 2022

Eh2406 closed this Mar 10, 2022
