More insight into Kademlia queries. #1567

romanb · 2020-05-07T16:08:43Z

This is an extended proposal for providing more insight into and control over Kademlia queries, based on earlier discussions at #1494.

More insight: The API allows iterating over the active queries and inspecting their state and execution statistics (Kademlia::iter_queries via QueryRef and Kademlia::iter_queries_mut via QueryMut). Both QueryRef and QueryMut give access to the QueryInfo and QueryStats. I went back and forth on whether to expose QueryInfo, but in the end decided to do so. Any query state that is purely internal can still be kept in the internal QueryInner, which is not exposed. Whenever a query finishes, successfully or with an error, execution statistics are also reported in the KademliaEvent. The QueryStats include the number of requests initiated, the number of successful requests, the number of failed requests, and the query duration.
More control: The API allows aborting queries prematurely at any time: Kademlia::query_mut(id) yields a QueryMut that provides QueryMut::finish(). This functionality existed before internally, but was not exposed.

To that end, API operations that initiate new queries return the query ID and multi-phase queries such as put_record retain the query ID across all phases, each phase being executed by a new (internal) query.
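To make the shape of this API easier to picture, here is a minimal, self-contained sketch. It does not use libp2p: `Pool`, `Query`, `Stats`, and all of their methods are hypothetical stand-ins for the roles played by Kademlia, QueryMut, and QueryStats in the description above, and the real types may differ in every detail.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for per-query execution statistics.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Stats {
    pub requests: u32,
    pub successes: u32,
    pub failures: u32,
}

impl Stats {
    /// Requests that were sent but have neither succeeded nor failed yet.
    pub fn pending(&self) -> u32 {
        self.requests - (self.successes + self.failures)
    }
}

/// Hypothetical stand-in for one active query.
#[derive(Debug)]
pub struct Query {
    pub stats: Stats,
    started: Instant,
    finished: bool,
}

impl Query {
    /// Mark the query as finished ahead of time.
    pub fn finish(&mut self) {
        self.finished = true;
    }

    pub fn is_finished(&self) -> bool {
        self.finished
    }

    /// Time elapsed since the query was started.
    pub fn elapsed(&self) -> Duration {
        self.started.elapsed()
    }
}

/// Hypothetical owner of all active queries.
#[derive(Default)]
pub struct Pool {
    next_id: u64,
    queries: BTreeMap<u64, Query>,
}

impl Pool {
    /// Start a query and hand its ID back to the caller.
    pub fn start(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let query = Query { stats: Stats::default(), started: Instant::now(), finished: false };
        self.queries.insert(id, query);
        id
    }

    /// Read-only view of all active queries.
    pub fn iter(&self) -> impl Iterator<Item = (&u64, &Query)> + '_ {
        self.queries.iter()
    }

    /// Mutable access to a single query by ID.
    pub fn get_mut(&mut self, id: u64) -> Option<&mut Query> {
        self.queries.get_mut(&id)
    }
}

fn main() {
    let mut pool = Pool::default();
    let id = pool.start();
    if let Some(query) = pool.get_mut(id) {
        query.stats = Stats { requests: 3, successes: 1, failures: 1 };
    }
    for (id, query) in pool.iter() {
        println!("query {}: {:?}, pending = {}", id, query.stats, query.stats.pending());
    }
    // Abort the query prematurely, using the ID returned when it was started.
    if let Some(query) = pool.get_mut(id) {
        query.finish();
    }
}
```

The point of the sketch is only that returning the ID from the initiating call is what makes later inspection and early termination possible.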

Tangentially, the behaviour of Kademlia::bootstrap has changed (for the better). Previously, once the initial self-lookup finished, the queries to refresh all buckets beyond the first non-empty bucket were initiated all at the same time. Now they are initiated one after the other. The bootstrap events reported via a KademliaEvent also report the remaining number of bootstrap queries. When that counter reaches 0, bootstrapping is complete, which was not really detectable before.
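The sequential scheme can be sketched roughly as follows. This is a toy model, not the code in this pull request: `Bootstrap`, `Progress`, `on_query_finished`, and the use of plain bucket indices as refresh targets are all assumptions made for illustration.

```rust
/// Hypothetical report emitted each time a bootstrap query finishes.
#[derive(Debug, PartialEq)]
pub struct Progress {
    /// Bootstrap queries still outstanding; 0 means bootstrapping is complete.
    pub num_remaining: usize,
    /// The bucket whose refresh query is started next, if any.
    pub next: Option<u8>,
}

/// Hypothetical driver that refreshes buckets one at a time.
pub struct Bootstrap {
    /// Buckets that still need a refresh query.
    targets: std::vec::IntoIter<u8>,
}

impl Bootstrap {
    pub fn new(buckets: Vec<u8>) -> Self {
        Bootstrap { targets: buckets.into_iter() }
    }

    /// Call when the previous query (initially the self-lookup) has finished.
    /// Reports how many queries remain and starts at most one new query.
    pub fn on_query_finished(&mut self) -> Progress {
        let num_remaining = self.targets.len();
        let next = self.targets.next();
        Progress { num_remaining, next }
    }
}

fn main() {
    let mut bootstrap = Bootstrap::new(vec![3, 7]);
    loop {
        let progress = bootstrap.on_query_finished();
        println!("{:?}", progress);
        if progress.num_remaining == 0 {
            break; // bootstrapping is complete
        }
    }
}
```

Because only one refresh query is in flight at a time, the remaining counter decreases by exactly one per reported event.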

mxinden

Thanks a bunch for this patch set! Among other things this will help a lot understanding the impact of #1473. I will deploy this pull request to kademlia-exporter.max-inden.de so we can test it out.

Would you mind mentioning the changes of this pull request in the changelog?

mxinden · 2020-05-08T07:08:36Z

examples/distributed-key-value-store.rs

@@ -188,7 +199,7 @@ fn handle_input_line(kademlia: &mut Kademlia<MemoryStore>, line: String) {
                publisher: None,
                expires: None,
            };
-            kademlia.put_record(record, Quorum::One);
+            kademlia.put_record(record, Quorum::One).expect("Failed to store record locally.");


Instead of leaving the interpretation of the error up to the user (in this case the expect string), what do you think of having RecordStore::put return a proper error? Not saying this should happen within this pull request. I am happy to do that as a follow-up.

The interpretation isn't up to the user, the error type is a store::Error.

mxinden · 2020-05-08T07:26:39Z

protocols/kad/src/behaviour.rs

@@ -732,72 +770,114 @@ where
                                target = kbucket::Key::new(PeerId::random());
                            }
                            target
-                        }).collect::<Vec<_>>();
+                        }).collect::<Vec<_>>().into_iter();


Why not make this iterator lazy, delaying the heavy computations until they are needed?

What exactly do you have in mind? Note the following:

The absolute maximum size of this vector is 254 elements, but this is extremely unlikely (probabilistically). The realistic sizes are ~1-10 elements (iterations).

There are no heavy computations (at least I don't consider hashing, xor and sampling bytes from a PRNG to be heavy computations).

We need to know the length of this vector / iterator immediately below, so if we wanted to leave it as an iterator, it must be an ExactSizeIterator.

remaining is put into the QueryInfo for which I'd like to avoid adding type parameters.

Nevertheless, please let me know if you have a concrete approach in mind that is not invasive, maybe I'm missing something!
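For illustration, the trade-off can be sketched like this. `Info` and `make_targets` are hypothetical names, not types from this pull request. The only library facts relied on are that `std::vec::IntoIter` is a nameable concrete type and implements `ExactSizeIterator`, whereas a lazy `map` adapter carries an unnameable closure type that would require a type parameter or boxing.

```rust
use std::vec::IntoIter;

/// Hypothetical query metadata. The field has a concrete, nameable type,
/// so the struct needs no type parameter.
pub struct Info {
    pub remaining: IntoIter<[u8; 32]>,
}

/// Hypothetical target generation; the real code derives keys from random peer IDs.
pub fn make_targets(n: usize) -> Vec<[u8; 32]> {
    (0..n).map(|i| [i as u8; 32]).collect()
}

fn main() {
    // Eagerly collect the (small) set of targets, then iterate over the vector.
    let mut info = Info { remaining: make_targets(3).into_iter() };

    // `len()` comes from `ExactSizeIterator` and is known up front.
    println!("remaining before: {}", info.remaining.len());

    let first = info.remaining.next();
    println!("first target starts with byte {:?}", first.map(|t| t[0]));
    println!("remaining after: {}", info.remaining.len());
}
```

With at most a few hundred small elements, collecting eagerly costs little and keeps the type signature simple.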

mxinden · 2020-05-08T12:53:44Z

I have updated the Kademlia exporter with this patch and made it expose the new statistics as metrics. In addition I added a new graph (Number of hops per query) to the two (all-dhts, specific dht) dashboards at kademlia-exporter.max-inden.de.

(It ignores num_pending for now. In addition, with the bug above in mind, one should ignore the num_failures stat for now.)

romanb · 2020-05-08T15:17:41Z

Would you mind mentioning the changes of this pull request in the changelog?

I will update the changelog once the PR has passed review just before merging. That way I don't need to adjust the changelog together with ongoing changes resulting from reviews. Just a personal preference.

mxinden

This looks good.

Docker hub is currently building a new version of my Kademlia exporter with the recent changes of this pull request incorporated. I would suggest holding off merging until it is deployed.

mxinden · 2020-05-11T10:01:01Z

New version of exporter is deployed on kademlia-exporter.max-inden.de.

tomaka · 2020-05-11T14:54:16Z

I've been tagged as reviewer, but I don't really have the time/motivation to dig into the Kademlia code, and I'd trust Max's review here.
The PR corresponds to what I think we need, so +1

mxinden · 2020-05-12T10:02:44Z

This is ready to be merged from my side (just missing the changelog entries).

95% of all get queries on Kusama without disjoint paths involve at most ~180 failed and ~60 succeeded requests.

95% of all get queries on Kusama without disjoint paths involve at most ~240 failed and ~60 succeeded requests.

(The num_{requests, failure, success} values don't add up because these graphs show the 95th percentile of each value independently.)
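A small made-up example shows why per-metric percentiles need not add up even when the counts add up for every individual query. The sample values and the nearest-rank `percentile` helper below are assumptions for illustration only, not the exporter's data or its method.

```rust
/// Nearest-rank percentile (`p` in 1..=100) of a non-empty sample.
fn percentile(values: &[u32], p: usize) -> u32 {
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    // Integer form of ceil(p / 100 * n), clamped to at least rank 1.
    let rank = (p * sorted.len() + 99) / 100;
    sorted[rank.max(1) - 1]
}

fn main() {
    // Three made-up queries; for each one, requests = successes + failures.
    let successes = [10, 0, 5];
    let failures = [0, 10, 5];
    let requests: Vec<u32> = successes.iter().zip(failures.iter()).map(|(s, f)| s + f).collect();

    let p95_requests = percentile(&requests, 95);
    let p95_successes = percentile(&successes, 95);
    let p95_failures = percentile(&failures, 95);

    // The worst case for successes and the worst case for failures come from
    // different queries, so their sum overshoots the percentile of the total.
    println!("p95 requests  = {}", p95_requests);
    println!("p95 successes = {}", p95_successes);
    println!("p95 failures  = {}", p95_failures);
    assert_ne!(p95_requests, p95_successes + p95_failures);
}
```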

mxinden · 2020-05-15T14:02:27Z

romanb#4 should fix the existing merge conflicts.

Merge branch 'libp2p/master' into kad-query-info

romanb · 2020-05-16T07:06:25Z

@mxinden Thanks a lot.

Roman S. Borschel added 3 commits May 7, 2020 16:11

Cleanup

a3ea712

Cleanup

05eb8d3

romanb requested review from mxinden and tomaka May 7, 2020 16:08

Update examples and re-exports.

9aed495

mxinden reviewed May 8, 2020

mxinden mentioned this pull request May 8, 2020

src/exporter: Expose query statistics mxinden/kademlia-exporter#1

Merged

Incorporate review feedback.

c3a8bf8

mxinden approved these changes May 11, 2020

romanb mentioned this pull request May 12, 2020

Add Kademlia::num_active_iterative_queries #1494

Closed

Merge branch 'libp2p/master' into kad-query-info

9377df0

Merge pull request #4 from mxinden/kad-query-info

69fc5c1

Merge branch 'libp2p/master' into kad-query-info

Roman S. Borschel added 2 commits May 16, 2020 10:26

Update CHANGELOG

368dbe1

Update CHANGELOG

c9d67ee

romanb merged commit 3a96ebf into libp2p:master May 16, 2020

romanb deleted the kad-query-info branch May 16, 2020 08:43

mxinden mentioned this pull request Jul 29, 2020

client/authority-discovery: Add option to specify relevant authority set paritytech/substrate#6595

Closed
