Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/metrics: Track # connected nodes supporting specific protocol #2734

Merged
merged 16 commits into from
Jul 15, 2022

Conversation

mxinden
Copy link
Member

@mxinden mxinden commented Jun 28, 2022

Description

An example metric exposed with this patch:

libp2p_identify_protocols{protocol="/ipfs/ping/1.0.0"} 10

This implies that 10 of the currently connected nodes support the ping protocol.

To achieve the above, this pull request also:

  • protocols/relay: Expose HOP_PROTOCOL_NAME and STOP_PROTOCOL_NAME

  • protocols/ping: Expose PROTOCOL_NAME

  • protocols/identify: Expose PROTOCOL_NAME and PUSH_PROTOCOL_NAME

  • protocols/dcutr: Expose PROTOCOL_NAME

  • misc/metrics: Explicitly delegate event recording to each recorder

    This allows delegating a single event to multiple Recorders. That enables e.g. the
    identify::Metrics Recorder to act both on IdentifyEvent and SwarmEvent. The latter enables
    it to garbage collect per peer data on disconnects.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

This allows delegating a single event to multiple `Recorder`s. That enables e.g. the
`identify::Metrics` `Recorder` to act both on `IdentifyEvent` and `SwarmEvent`. The latter enables
it to garbage collect per peer data on disconnects.
An example metric exposed with this patch:

```
libp2p_identify_protocols{protocol="/ipfs/ping/1.0.0"} 10
```

This implies that 10 of the currently connected nodes support the ping protocol.
misc/metrics/CHANGELOG.md Outdated Show resolved Hide resolved
protocols/dcutr/CHANGELOG.md Outdated Show resolved Hide resolved
protocols/identify/CHANGELOG.md Outdated Show resolved Hide resolved
protocols/ping/CHANGELOG.md Outdated Show resolved Hide resolved
protocols/relay/CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK but I think we shouldn't filter the supported protocols.

misc/metrics/src/identify.rs Show resolved Hide resolved
Comment on lines +153 to +160
// Signal via an additional label value that one or more
// protocols of the remote peer have not been recognized.
if protocols.len() < info.protocols.len() {
protocols.push("unrecognized".to_string());
}

protocols.sort_unstable();
protocols.dedup();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will flatten all unrecognized into 1, regardless of how many unrecognized protocols there are.

Would it make sense to increment a separate counter for the number of unrecognized protocols?

Copy link
Member Author

@mxinden mxinden Jul 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to increment a separate counter for the number of unrecognized protocols?

We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter. I am not saying this is impossible, just adds a bit of complexity which I am not sure is worth the benefit. In other words, is it relevant for a user to know how many, potentially duplicate, unrecognized protocols all connected peers offer as a sum?

I was hoping for the addition to the # HELP text to resolve any confusion:

with "unrecognized" for each peer supporting one or more unrecognized protocols

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to increment a separate counter for the number of unrecognized protocols?

We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter.

I think this would only apply if we would track it as a gauge?

A counter always increases anyway and you need to use functions like rate to see how it changes over time.

A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.

I don't feel strongly about it though, happy to go either way :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to increment a separate counter for the number of unrecognized protocols?

We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter.

I think this would only apply if we would track it as a gauge?

A counter always increases anyway and you need to use functions like rate to see how it changes over time.

The rate on a counter would only tell how often you saw a unrecognized protocol, not how many unrecognized protocols connected peers offer. One could correlate this with the interval libp2p-identify requests a new identification, though that is (a) neither specified nor (b) solid with the push mechanism.

A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.

True, though I think we would want to prevent such attack before it happens. In addition, given that this attack would potentially bring down our Prometheus instance, who is going to alert us of the spam attack?

Copy link
Contributor

@thomaseizinger thomaseizinger Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to increment a separate counter for the number of unrecognized protocols?

We would then need to track the number of unrecognized protocols per peer. On each new identify message, we would have to subtract the previous amount and add the new amount to the counter.

I think this would only apply if we would track it as a gauge?
A counter always increases anyway and you need to use functions like rate to see how it changes over time.

The rate on a counter would only tell how often you saw a unrecognized protocol, not how many unrecognized protocols connected peers offer. One could correlate this with the interval libp2p-identify requests a new identification, though that is (a) neither specified nor (b) solid with the push mechanism.

Assuming a stable set of peers, the rate of unrecognised protocols should be stable too? Identify pushes might throw a little spanner in here and there but overall, I'd expect it to be meaningful.

A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.

True, though I think we would want to prevent such attack before it happens. In addition, given that this attack would potentially bring down our Prometheus instance, who is going to alert us of the spam attack?

We are preventing the attack by only increasing a counter instead of creating a label per protocol. Surely prometheus can't be brought down just because a counter increases at a massive rate?

All I am saying is that increasing the counter by the number of unrecognised protocols should allow us to observe the attempt of an attack assuming that a non-malicious node will have a stable number of unrecognised protocols whereas an attack would likely max out the size limit of the incoming message with as many protocols as possible and thus increase the counter by more than ordinary behaviour.

It is not massively useful outside of this I think so happy to go either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A spike in the unrecognized protocols rate could e.g. hint at a spam attack by a peer.

True, though I think we would want to prevent such attack before it happens. In addition, given that this attack would potentially bring down our Prometheus instance, who is going to alert us of the spam attack?

We are preventing the attack by only increasing a counter instead of creating a label per protocol. Surely prometheus can't be brought down just because a counter increases at a massive rate?

Oh error in reasoning on my end here. You are right. Sorry about that @thomaseizinger.

It is not massively useful outside of this I think so happy to go either way.

I will go without it.

@dignifiedquire
Copy link
Member

@mxinden it would be really nice to have this be extendable in some form, is there an easy way to do this if I say have two other protocols in my impl I want to track in this same system?

@mxinden
Copy link
Member Author

mxinden commented Jul 14, 2022

@mxinden it would be really nice to have this be extendable in some form, is there an easy way to do this if I say have two other protocols in my impl I want to track in this same system?

Agreed. Though thus far I have been unable to come up with a good design how to allow users to whitelist a set of protocols. Any ideas @dignifiedquire?

I would argue this pull request already adds value, i.e. tracking core libp2p protocols only is already helpful, thus I will proceed with this pull request.

@dignifiedquire
Copy link
Member

fully agreed on this adding value already. happy to defer the extension to a followup issue. will do some thinking about an api to extend this

@mxinden mxinden merged commit d4f8ec2 into libp2p:master Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants