This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Multiple improvements to discovery ping handling #8771

Merged

5chdn merged 8 commits into openethereum:master from jimpo's discovery-tracking branch on Jul 11, 2018

Conversation

jimpo
Contributor

@jimpo jimpo commented Jun 3, 2018

This addresses some of the problems listed in #8757. The PR is rather large, so it is best reviewed commit by commit and could be split into multiple PRs if reviewers would like.

There are a few behavior changes:

  • Nodes are only added to the k-buckets after responding to a PING
  • The echo hash on PONG messages is verified against the request packet
  • FIND_NODE request timeouts are handled similarly to PING timeouts
  • Selection of the last seen node in a k-bucket is improved
  • A node is allowed 5 consecutive request timeouts before being evicted from a k-bucket

There are more improvements to come (the k-buckets are still not limited to k entries), but I think this is a step in the right direction.
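As a rough sketch of the first two points above, in-flight pings can be tracked per node, keyed by node ID, storing the hash of the sent packet and a deadline. The names and the timeout value below are illustrative, not necessarily the ones used in this PR:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Illustrative aliases; the real code uses the crate's NodeId and H256 types.
type NodeId = [u8; 64];
type PacketHash = [u8; 32];

// Hypothetical timeout; the actual constant lives in the discovery module.
const PING_TIMEOUT: Duration = Duration::from_millis(300);

struct PendingPing {
    echo_hash: PacketHash, // hash of the PING we sent, expected back in the PONG
    deadline: Instant,     // after this point the request counts as timed out
}

#[derive(Default)]
struct Discovery {
    in_flight_pings: HashMap<NodeId, PendingPing>,
}

impl Discovery {
    /// Record a sent PING; the node is only bucketed once a matching PONG arrives.
    fn on_ping_sent(&mut self, to: NodeId, echo_hash: PacketHash) {
        let pending = PendingPing {
            echo_hash,
            deadline: Instant::now() + PING_TIMEOUT,
        };
        self.in_flight_pings.insert(to, pending);
    }

    /// Check a PONG against the in-flight PING; returns true if the sender
    /// may now be added to its k-bucket.
    fn on_pong(&mut self, from: &NodeId, echoed: &PacketHash) -> bool {
        // A PONG only counts if we have a PING in flight to this node and the
        // echoed hash matches the packet we actually sent.
        let matches = self
            .in_flight_pings
            .get(from)
            .map_or(false, |pending| pending.echo_hash == *echoed);
        if matches {
            self.in_flight_pings.remove(from);
        }
        matches
    }
}
```

A PONG whose echo hash does not match the recorded one is simply ignored, so only nodes that actually answered our PING make it into a k-bucket.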

@parity-cla-bot

It looks like @jimpo signed our Contributor License Agreement. 👍

Many thanks,

Parity Technologies CLA Bot

@debris debris added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. labels Jun 4, 2018
@debris debris requested review from twittner and tomaka June 4, 2018 08:23
@5chdn 5chdn added this to the 1.12 milestone Jun 4, 2018
@jimpo jimpo force-pushed the discovery-tracking branch 2 times, most recently from 1230ef0 to bd208f3 Compare June 5, 2018 16:46
Contributor

@twittner twittner left a comment


Thanks! The changes look good to me.

send_queue: VecDeque<Datagramm>,
check_timestamps: bool,
adding_nodes: Vec<NodeEntry>,
ip_filter: IpFilter,
request_backoff: Vec<Duration>,
Contributor

What is the motivation for NodeBucket::request_backoff? Why not declare REQUEST_BACKOFF as a [Duration; 4]?

Contributor Author

It's on Discovery, not NodeBucket. And the motivation is just injecting dependencies explicitly. It's nice because the backoff can be tweaked in tests and stuff (which I make use of).

Contributor

It's on Discovery, not NodeBucket.

Of course, sorry. Typo.

And the motivation is just injecting dependencies explicitly.

I was just thinking that since Discovery::new refers to const REQUEST_BACKOFF anyway, only to map its integers to Durations, why not declare REQUEST_BACKOFF itself as a &'static [Duration] and simply assign the reference to Discovery::request_backoff. After all, client code cannot change the backoff setting, and in test modules one can still define other backoff sequences and overwrite the default. But it was just a quick thought and certainly not important.

Contributor Author

Added 364c8e1. Let me know if you prefer that or want me to revert. Shouldn't make any difference performance-wise because Discovery is only instantiated once.
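For reference, the shape discussed above would be roughly the following; the concrete durations are placeholders, not the values used in the PR:

```rust
use std::time::Duration;

// Placeholder values; the real schedule lives in the discovery module.
const REQUEST_BACKOFF: &[Duration] = &[
    Duration::from_secs(1),
    Duration::from_secs(4),
    Duration::from_secs(16),
    Duration::from_secs(64),
];

struct Discovery {
    request_backoff: &'static [Duration],
}

impl Discovery {
    fn new() -> Self {
        // Tests can still construct a Discovery with a different schedule.
        Discovery { request_backoff: REQUEST_BACKOFF }
    }
}
```

Since Discovery is only instantiated once, the choice between a slice reference and a Vec is purely about clarity, not performance.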

self.adding_nodes = nodes;
self.update_new_nodes();
pub fn add_node_list(&mut self, mut nodes: Vec<NodeEntry>) {
for node in nodes.drain(..) {
Contributor

No need to drain because nodes is moved here. for node in nodes { .. } is sufficient.

Contributor Author

Will do. I assume the drain can be removed from init_node_list as well? (which is why I did it here).

};
packet[32..(32 + 65)].clone_from_slice(&signature[..]);
let signed_hash = keccak(&packet[32..]);
packet[0..32].clone_from_slice(&signed_hash);
Contributor

I think you can use copy_from_slice.
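A minimal illustration of the suggestion; for a Copy element type such as u8 the two methods behave the same, copy_from_slice just avoids going through Clone:

```rust
fn main() {
    let signature = [0u8; 65]; // placeholder signature bytes
    let mut packet = vec![0u8; 32 + 65 + 1];

    // Both methods panic if the two slices differ in length;
    // copy_from_slice is the usual choice for Copy types like u8.
    packet[32..(32 + 65)].copy_from_slice(&signature[..]);
    assert_eq!(&packet[32..(32 + 65)], &signature[..]);
}
```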

}

/// Add a list of known nodes to the table.
pub fn init_node_list(&mut self, mut nodes: Vec<NodeEntry>) {
for n in nodes.drain(..) {
for n in nodes {
Contributor

The nodes parameter here and in add_node_list does not need to be declared as mut.

@jimpo jimpo force-pushed the discovery-tracking branch 3 times, most recently from d9a4ad9 to b8581b3 Compare June 12, 2018 02:16
@5chdn 5chdn requested a review from twittner July 2, 2018 12:06
@5chdn
Contributor

5chdn commented Jul 2, 2018

@twittner @tomaka please use the files > review > approve functionality to review a PR :)

rlp.begin_list(c.len());
for n in 0 .. c.len() {
rlp.begin_list(4);
c[n].endpoint.to_rlp(&mut rlp);
rlp.append(&c[n].id);
}
append_expiration(&mut rlp);
Contributor

This may be a stupid question but isn't this a modification of the protocol?
(because if so, what about compatibility?)

Contributor Author

@jimpo jimpo Jul 2, 2018

No, this should not affect behavior. The change is that the expiry timestamp used to be appended in send_packet and now it is appended to the RLP payload before calling that method.
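In other words, the payload still carries the same expiry field; only the place where it is appended has moved. A sketch of what appending an expiry to an RlpStream can look like, assuming a hypothetical 20-second validity window rather than whatever constant the code actually uses:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

use rlp::RlpStream;

// Hypothetical validity window for outgoing packets.
const EXPIRY: Duration = Duration::from_secs(20);

/// Append a UNIX-timestamp expiry as the last item of the payload.
fn append_expiration(rlp: &mut RlpStream) {
    let timestamp = (SystemTime::now() + EXPIRY)
        .duration_since(UNIX_EPOCH)
        .expect("system clock after 1970")
        .as_secs();
    rlp.append(&timestamp);
}
```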

fn assemble_packet(packet_id: u8, bytes: &[u8], secret: &Secret) -> Result<Bytes, Error> {
let mut packet = Bytes::with_capacity(bytes.len() + 32 + 65 + 1);
packet.extend_from_slice(&[0; 32 + 65][..]);
packet.extend_from_slice(&[packet_id][..]);
Contributor

No need to create temporaries:

packet.resize(32 + 65, 0);
packet.put_u8(packet_id);
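In context, assuming packet is the Vec<u8>-backed Bytes used above, the buffer set-up without temporary slices might look like the sketch below; push stands in for put_u8 (which would need the bytes::BufMut trait in scope), and the signing and hashing steps of assemble_packet are omitted:

```rust
fn assemble_packet_prefix(packet_id: u8, payload: &[u8]) -> Vec<u8> {
    let mut packet = Vec::with_capacity(payload.len() + 32 + 65 + 1);
    // Reserve room for the 32-byte hash and 65-byte signature up front,
    // then append the packet id and payload; no temporary arrays needed.
    packet.resize(32 + 65, 0);
    packet.push(packet_id);
    packet.extend_from_slice(payload);
    packet
}
```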

} else { false };

if remove_from_bucket {
let id_hash = keccak(&node_id);
Contributor

Why not move this block directly after entry.remove();?

if !is_expected {
debug!(target: "discovery", "Got unexpected Neighbors from {:?}", &from);
return Ok(None);
}
Contributor

I don't quite understand response_count. If we receive two responses with 10 results each, this would only consider the first one and ignore the second response? I am probably thick, but I would be grateful if you could explain the idea a bit more. Also, would datagram duplication not interfere with response_counts? (Not sure how much of a problem this would be in practice.)

Contributor Author

@jimpo jimpo Jul 6, 2018

Yeah, this is not really ideal, but I don't have a better solution. Since the NEIGHBORS packet does not have any ID or hash that can be used to associate it with a particular request, the best we can do is ensure there is at most one FIND_NODE in flight to any node at all times and associate the response based on sender. The sender is not supposed to send more than k=16 results to any query, so if they do, this code will ignore the second 10 result packet in your example. As far as packet duplication, if it is a concern, we could dedup the results based on node ID and track unique results in the response count.

I'm open to suggestions here.
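If duplicated datagrams turned out to matter in practice, the dedup idea mentioned above could be sketched roughly like this (type and field names are illustrative, not the ones in the PR):

```rust
use std::collections::HashSet;

// Illustrative alias; the real code uses the crate's NodeId type.
type NodeId = [u8; 64];

const BUCKET_SIZE: usize = 16; // k: maximum results expected per FIND_NODE query

struct PendingFindNode {
    // Node IDs already seen in NEIGHBORS responses for this request, so that
    // duplicated datagrams do not inflate the response count.
    seen: HashSet<NodeId>,
}

impl PendingFindNode {
    /// Record a batch of results; returns true once k unique nodes have
    /// arrived and the pending request can be cleared.
    fn on_results(&mut self, results: &[NodeId]) -> bool {
        for id in results {
            self.seen.insert(*id);
        }
        self.seen.len() >= BUCKET_SIZE
    }
}
```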

jimpo added 8 commits July 6, 2018 17:52

  • discovery: Refactor packet creation into its own function. This function is useful inside unit tests.
  • discovery: Only add nodes to routing table after receiving pong. Previously the discovery algorithm would add nodes to the routing table before confirming that the endpoint is participating in the protocol. This now tracks in-flight pings and adds to the routing table only after receiving a response.
  • discovery: Track expiration of pings to non-yet-in-bucket nodes. Now that we may ping nodes before adding to a k-bucket, the timeout tracking must be separate from BucketEntry.
  • discovery: Verify echo hash on pong packets. Stores packet hash with in-flight requests and matches with pong response.
  • discovery: Retry failed pings with exponential backoff. UDP packets may get dropped, so instead of immediately booting nodes that fail to respond to a ping, retry 4 times with exponential backoff.
@5chdn 5chdn requested a review from tomaka July 9, 2018 06:52
}
};
if entry.get().response_count == BUCKET_SIZE {
entry.remove();
Contributor

If a node returns fewer than BUCKET_SIZE results, its entry will not be removed here and will instead time out. This will eventually cause the node to be removed altogether, no?

Contributor Author

Yeah, ideally the packet would have an indication that it's the end of a series of NEIGHBORS packets, but since it doesn't I think this is the best strategy. From my reading, Geth has the same behavior (https://github.com/ethereum/go-ethereum/blob/master/p2p/discover/udp.go#L329) of triggering a timeout if it receives fewer than k results (and more than k would be incorrect).

I'm assuming that this won't be too much of a problem in practice because 1) you only return <16 if there's <16 nodes in the whole routing table (which I suppose is possible on a small network) 2) there are 4 retries and if any of them are a PING, it will clear the failure count. In a future PR, it would probably be good to put something in to ensure we PING nodes that have failed a few FIND_NODE requests.

Another alternative would be to keep the PendingRequest alive but not treat it as a timeout if it clears with <16 responses. On the other hand, I think it's more likely in that case that we stay connected to a high latency node that continues to give incomplete NEIGHBORS responses without ever booting them.
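A sketch of the timeout path described above: a failure counter on the bucket entry, a retry delay taken from the backoff schedule, and eviction once the schedule is exhausted (so the fifth consecutive timeout with a four-entry schedule removes the node). Names are illustrative:

```rust
use std::time::Duration;

struct BucketEntry {
    fail_count: usize, // consecutive request timeouts; reset on a valid PONG
}

/// Decide what to do when a request to a bucketed node times out.
/// `backoff` is the retry schedule (e.g. the REQUEST_BACKOFF slice discussed
/// earlier). Returns the delay before re-pinging the node, or None if the
/// node should now be evicted from its k-bucket.
fn on_request_timeout(entry: &mut BucketEntry, backoff: &[Duration]) -> Option<Duration> {
    if entry.fail_count < backoff.len() {
        let delay = backoff[entry.fail_count];
        entry.fail_count += 1;
        Some(delay)
    } else {
        None
    }
}
```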

Contributor

Another alternative would be to keep the PendingRequest alive but not treat it as a timeout if it clears with <16 responses.

This is what I had in mind, but I agree that the approach chosen here is probably better in practice.

@5chdn 5chdn added the A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. label Jul 10, 2018
@5chdn 5chdn modified the milestones: 2.0, 2.1 Jul 10, 2018
@5chdn 5chdn removed the A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. label Jul 11, 2018

@5chdn 5chdn added B0-patchthis A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Jul 11, 2018
@5chdn 5chdn merged commit 01f825b into openethereum:master Jul 11, 2018
5chdn pushed a commit that referenced this pull request Jul 12, 2018
* discovery: Only add nodes to routing table after receiving pong.

Previously the discovery algorithm would add nodes to the routing table
before confirming that the endpoint is participating in the protocol. This
now tracks in-flight pings and adds to the routing table only after receiving
a response.

* discovery: Refactor packet creation into its own function.

This function is useful inside unit tests.

* discovery: Additional testing for new add_node behavior.

* discovery: Track expiration of pings to non-yet-in-bucket nodes.

Now that we may ping nodes before adding to a k-bucket, the timeout tracking
must be separate from BucketEntry.

* discovery: Verify echo hash on pong packets.

Stores packet hash with in-flight requests and matches with pong response.

* discovery: Track timeouts on FIND_NODE requests.

* discovery: Retry failed pings with exponential backoff.

UDP packets may get dropped, so instead of immediately booting nodes that fail
to respond to a ping, retry 4 times with exponential backoff.

* !fixup Use slice instead of Vec for request_backoff.
@5chdn 5chdn mentioned this pull request Jul 12, 2018
5chdn added a commit that referenced this pull request Jul 17, 2018
* parity-version: betalize 2.0

* Multiple improvements to discovery ping handling (#8771)

* discovery: Only add nodes to routing table after receiving pong.

Previously the discovery algorithm would add nodes to the routing table
before confirming that the endpoint is participating in the protocol. This
now tracks in-flight pings and adds to the routing table only after receiving
a response.

* discovery: Refactor packet creation into its own function.

This function is useful inside unit tests.

* discovery: Additional testing for new add_node behavior.

* discovery: Track expiration of pings to non-yet-in-bucket nodes.

Now that we may ping nodes before adding to a k-bucket, the timeout tracking
must be separate from BucketEntry.

* discovery: Verify echo hash on pong packets.

Stores packet hash with in-flight requests and matches with pong response.

* discovery: Track timeouts on FIND_NODE requests.

* discovery: Retry failed pings with exponential backoff.

UDP packets may get dropped, so instead of immediately booting nodes that fail
to respond to a ping, retry 4 times with exponential backoff.

* !fixup Use slice instead of Vec for request_backoff.

* Add separate database directory for light client (#8927) (#9064)

* Add separate default DB path for light client (#8927)

* Improve readability

* Revert "Replace `std::env::home_dir` with `dirs::home_dir` (#9077)" (#9097)

* Revert "Replace `std::env::home_dir` with `dirs::home_dir` (#9077)"

This reverts commit 7e77932.

* Restore some of the changes

* Update parity-common

* Offload cull to IoWorker. (#9099)

* Fix work-notify. (#9104)

* Update hidapi, fixes #7542 (#9108)

* docker: add cmake dependency (#9111)

* Update light client hardcoded headers (#9098)

* Insert Kovan hardcoded headers until #7690241

* Insert Kovan hardcoded headers until block 7690241

* Insert Ropsten hardcoded headers until #3612673

* Insert Mainnet hardcoded headers until block 5941249

* Make sure to produce full blocks. (#9115)

* Insert ETC (classic) hardcoded headers until block #6170625 (#9121)

* fix verification in ethcore-sync collect_blocks (#9135)

* Completely remove all dapps struct from rpc (#9107)

* Completely remove all dapps struct from rpc

* Remove unused pub use

* `evm bench` fix broken dependencies (#9134)

* `evm bench` use valid dependencies

Benchmarks of the `evm` used stale versions of a couple of crates that
this commit fixes!

* fix warnings

* Update snapcraft.yaml (#9132)
@jimpo jimpo deleted the discovery-tracking branch July 18, 2018 16:27
@jimpo jimpo mentioned this pull request Sep 12, 2018