Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protocols/kad: Fix right shift overflow panic in record_received #1492

Merged
merged 7 commits into from
Mar 18, 2020

Conversation

mxinden
Copy link
Member

@mxinden mxinden commented Mar 11, 2020

Within Behaviour::record_received the exponentially decreasing
expiration based on the distance to the target for a record is
calculated as following:

  1. Calculate the amount of nodes between us and the record key beyond
    the k replication constant as n.

  2. Shift the configured record time-to-live n times to the right to
    calculate an exponentially decreasing expiration.

The configured record time-to-live is a u64. If n is larger or equal
to 64 the right shift will lead to an overflow which panics in debug
mode.

This patch uses a checked right shift instead, defaulting to 0 (now + 0) for the expiration on overflow.

Fixes #1491.

Within `Behaviour::record_received` the exponentially decreasing
expiration based on the distance to the target for a record is
calculated as following:

1. Calculate the amount of nodes between us and the record key beyond
the k replication constant as `n`.

2. Shift the configured record time-to-live `n` times to the right to
calculate an exponentially decreasing expiration.

The configured record time-to-live is a u64. If `n` is larger or equal
to 64 the right shift will lead to an overflow which panics in debug
mode.

This patch uses a checked right shift instead, defaulting to 0 (`now +
0`) for the expiration on overflow.
@mxinden mxinden requested review from romanb and arkpar March 11, 2020 12:30
@@ -364,6 +364,15 @@ pub struct KademliaRequestId {
connec_unique_id: UniqueConnecId,
}

#[cfg(test)]
impl Default for KademliaRequestId {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as KademliaRequestId implements Clone, this Default impl might as well be available all the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an opaque implementation detail. Implementing Default on it doesn't make any sense. Maybe it could contain a channel or an Arc in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood KademliaRequestId to be some sort of unique one-time token and Clone defeats that purpose but I did not look much into it so feel free to ignore the criticism.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine either way. I can't come up with a use-case for Default outside of testing though.

Copy link
Contributor

@romanb romanb Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood KademliaRequestId to be some sort of unique one-time token and Clone defeats that purpose but I did not look much into it so feel free to ignore the criticism.

Just to give some context: KademliaRequestId did not use to be Clone but it had to be as a result of #1440 requiring Clone for the TInEvent of any handler in libp2p-swarm, since it is now possible to "broadcast" a single such event to multiple handlers (which in turn was a requirement discovered during the substrate integration in paritytech/substrate#5066) and KademliaRequestId is part of these events in libp2p-kad. It would of course be better to not have that Clone, but I didn't come up with a way to do so.

vec![],
);

kad.record_received(target, connection_id, request_id, record);
Copy link
Member

@tomaka tomaka Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be testing private implementation details. The parameters of this method are totally arbitrary. The fact that it's a separate method at all is also totally arbitrary and subject to change. All we're doing with this test is locking the current implementation as it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your concern. What are you advocating for instead: (1) remove the test or (2) write a larger test that reproduces the same behaviour?

All we're doing with this test is locking the current implementation as it is.

I don't think a test should ever lock an implementation. While I am in favor of having a test for each regression, I am also in favor of removing outdated tests if necessary.

@romanb
Copy link
Contributor

romanb commented Mar 13, 2020

Thanks for fixing this, I wasn't even aware bit shifts could overflow in this way. A regression test is a good idea but I don't see why it needs to be so complicated. Why not extract the code that calculates the expiry into a standalone function and writing a small property and or / regression test for that? Even if you want the function to actually take a KBucketsTable as argument, so that it uses count_nodes_between internally, there shouldn't be a need for such a probabilistic generation of so many peer IDs, since Key::for_distance can be used in combination with the ClosestBucketsIter to deterministically fill the buckets as desired (this test may give inspiration). In other words, when testing a particular edge case requires seemingly too much surrounding boilerplate in order to test what you want, it is often a good idea to partition the code further so that the part you're interested in testing in isolation is easier to test, without so much surrounding ceremony. This seems to be such a case.

Extract right shift into isolated function and replace complex
regression test with small isolated one.
@mxinden
Copy link
Member Author

mxinden commented Mar 16, 2020

@romanb thanks for the input. Can you take another look?

Copy link
Contributor

@romanb romanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few simplifying suggestions, let me know if you disagree with any of them.

let expiration = self.record_ttl.map(|ttl|
now + Duration::from_secs(ttl.as_secs() >> num_beyond_k)
);
let expiration = exp_decr_expiration(self.record_ttl, num_beyond_k);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let expiration = exp_decr_expiration(self.record_ttl, num_beyond_k);
let expiration = self.record_ttl.map(|ttl| now + exp_decrease(ttl, num_beyond_k));

@@ -1030,6 +1026,15 @@ where
}
}

/// Calculate exponentially decreasing expiration from a default time-to-live by a factor.
fn exp_decr_expiration(default_ttl: Option<Duration>, factor: u32) -> Option<Instant> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn exp_decr_expiration(default_ttl: Option<Duration>, factor: u32) -> Option<Instant> {
fn exp_decrease(ttl: Duration, exp: u32) -> Instant {

@@ -1030,6 +1026,15 @@ where
}
}

/// Calculate exponentially decreasing expiration from a default time-to-live by a factor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Calculate exponentially decreasing expiration from a default time-to-live by a factor.
/// Exponentially decrease the given duration (base 2).

@@ -1030,6 +1026,15 @@ where
}
}

/// Calculate exponentially decreasing expiration from a default time-to-live by a factor.
fn exp_decr_expiration(default_ttl: Option<Duration>, factor: u32) -> Option<Instant> {
default_ttl.map(|ttl| Instant::now() + Duration::from_secs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default_ttl.map(|ttl| Instant::now() + Duration::from_secs(
Duration::from_secs(ttl.as_secs().checked_shr(exp).unwrap_or(0))

@@ -946,8 +946,6 @@ where
return
}

let now = Instant::now();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is preferable to leave this here as it can be reused multiple times (cf. #1496). And since Instant::now() is a side-effecting function for obtaining some notion of system time, usually both testability and performance is improved by using it mostly in top-level functions and passing the instant down to others where needed (see also the other comments).

@mxinden
Copy link
Member Author

mxinden commented Mar 17, 2020

Suggested simplifications look good to me. @romanb would you mind taking another look?

@romanb romanb merged commit 2947146 into libp2p:master Mar 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic in libp2p-kad
5 participants