Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More insight into Kademlia queries. #1567

Merged
merged 9 commits into from
May 16, 2020
Merged

Conversation

romanb
Copy link
Contributor

@romanb romanb commented May 7, 2020

This is an extended proposal for providing more insight into and control over Kademlia queries, based on earlier discussions at #1494.

  • More insight: The API allows iterating over the active queries and inspecting their state and execution statistics (Kademlia::iter_queries via QueryRef and Kademlia::iter_queries_mut via QueryMut). Both QueryRef and QueryMut give access to the QueryInfo and QueryStats. I went a bit back-and-forth about whether to expose QueryInfo or not but in the end decided to do so. Any query state that is purely internal can still kept in the internal QueryInner, which is not exposed. Whenever a query finishes, successfully or with an error, execution statistics are also reported in the KademliaEvent. The QueryStats include: Number of requests initiated, number of successful requests, number of failed requests, as well as query duration.

  • More control: The API allows aborting queries prematurely at any time: Kademlia::query_mut(id) yields a QueryMut that provides QueryMut::finish(). This functionality existed before internally, but was not exposed.

To that end, API operations that initiate new queries return the query ID and multi-phase queries such as put_record retain the query ID across all phases, each phase being executed by a new (internal) query.

Tangentially, the behaviour of Kademlia::bootstrap has changed (for the better): Previously, once the initial self-lookup finished, queries to refresh all buckets beyond the first non-empty bucket would be initiated all at the same time, now they are initiated one after the other, which is more sane. The bootstrap events reported via a KademliaEvent also report the remaining number of bootstrap queries. When that counter reaches 0, bootstrapping is complete. This was not really detectable before.

Roman S. Borschel added 3 commits May 7, 2020 16:11
More insight: The API allows iterating over the active queries and
inspecting their state and execution statistics.

More control: The API allows aborting queries prematurely
at any time.

To that end, API operations that initiate new queries return the query ID
and multi-phase queries such as `put_record` retain the query ID across all
phases, each phase being executed by a new (internal) query.
@romanb romanb requested review from mxinden and tomaka May 7, 2020 16:08
protocols/kad/src/behaviour.rs Outdated Show resolved Hide resolved
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a bunch for this patch set! Among other things this will help a lot understanding the impact of #1473. I will deploy this pull request to kademlia-exporter.max-inden.de so we can test it out.

Woud you mind mentioning the changes of this pull request in the changelog?

protocols/kad/src/behaviour.rs Outdated Show resolved Hide resolved
protocols/kad/src/behaviour.rs Outdated Show resolved Hide resolved
@@ -188,7 +199,7 @@ fn handle_input_line(kademlia: &mut Kademlia<MemoryStore>, line: String) {
publisher: None,
expires: None,
};
kademlia.put_record(record, Quorum::One);
kademlia.put_record(record, Quorum::One).expect("Failed to store record locally.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of leaving the interpretation of the error up to the user (in this case the expect string, what do you think of having RecordStore::put return a proper error? Not saying this should happen within this pull request. I am happy to do that as a follow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interpretation isn't up to the user, the error type is a store::Error.

protocols/kad/src/behaviour.rs Show resolved Hide resolved
protocols/kad/src/behaviour.rs Outdated Show resolved Hide resolved
protocols/kad/src/behaviour.rs Show resolved Hide resolved
@@ -732,72 +770,114 @@ where
target = kbucket::Key::new(PeerId::random());
}
target
}).collect::<Vec<_>>();
}).collect::<Vec<_>>().into_iter();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this iterator lazy delaying the heavy computations to the time they are needed.

Copy link
Contributor Author

@romanb romanb May 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you have in mind? Note the following:

  • The absolute maximum size of this vector is 254 elements, but this is extremely unlikely (probabilistically). The realistic sizes are ~1-10 elements (iterations).
  • There are no heavy computations (at least I don't consider hashing, xor and sampling bytes from a PRNG to be heavy computations).
  • We need to know the length of this vector / iterator immediately below, so if we wanted to leave it as an iterator, it must be an ExactSizeIterator.
  • remaining is put into the QueryInfo for which I'd like to avoid adding type parameters.

Nevertheless, please let me know if you have a concrete approach in mind that is not invasive, maybe I'm missing something!

protocols/kad/src/query.rs Show resolved Hide resolved
protocols/kad/src/query.rs Outdated Show resolved Hide resolved
@mxinden
Copy link
Member

mxinden commented May 8, 2020

I have updated the Kademlia exporter with this patch and made it expose the new statistics as metrics. In addition I added a new graph (Number of hops per query) to the two (all-dhts, specific dht) dashboards at kademlia-exporter.max-inden.de.

(It ignores num_pending for now. In addition with the bug above in mind, one should ignore the num_failures stat for now.)

@romanb
Copy link
Contributor Author

romanb commented May 8, 2020

Woud you mind mentioning the changes of this pull request in the changelog?

I will update the changelog once the PR has passed review just before merging. That way I don't need to adjust the changelog together with ongoing changes resulting from reviews. Just a personal preference.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

Docker hub is currently building a new version of my Kademlia exporter with the recent changes of this pull request incorporated. I would suggest holding off merging until it is deployed.

@mxinden
Copy link
Member

mxinden commented May 11, 2020

New version of exporter is deployed on kademlia-exporter.max-inden.de.

@tomaka
Copy link
Member

tomaka commented May 11, 2020

I've been tagged as reviewer, but I don't really have the time/motivation to dig into the Kademlia code, and I'd trust Max's review here.
The PR corresponds to what I think we need, so +1

@mxinden
Copy link
Member

mxinden commented May 12, 2020

This is ready to be merged from my side (just missing the changelog entries).

95% of all get queries on Kusama without disjoint paths involve at most ~180 failed and ~60 succeeded requests.

image

95% of all get queries on Kusama without disjoint paths involve at most ~240 failed and ~60 succeeded requests.

image

(num_{requests, failure, success} don't add up, keep in mind that these graphs show the 95th percentile for each of the values independently.)

@mxinden
Copy link
Member

mxinden commented May 15, 2020

romanb#4 should fix the existing merge conflicts.

Merge branch 'libp2p/master' into kad-query-info
@romanb
Copy link
Contributor Author

romanb commented May 16, 2020

@mxinden Thanks a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants