misc/metrics: Add auxiliary crate to record events as OpenMetrics #2063

Merged 22 commits on Aug 13, 2021

mxinden (Member) commented Apr 30, 2021

Motivation

Metrics (e.g. Prometheus or OpenMetrics) enable one to gain a deep understanding of a system, both while developing and operating it.

I find myself re-implementing metric instrumentation for rust-libp2p over and over again. The most advanced metric setup likely lives in Substrate. Other smaller setups can e.g. be found in the Kademlia exporter. Instead of duplicating most of this logic once more for rust-libp2p-relay-server I would like to propose collaborating on a set of base metrics via a new crate here in rust-libp2p - namely libp2p-metrics - providing extensive instrumentation to rust-libp2p users out-of-the-box.

In the long term, I would like to extend this effort beyond the definition and exposition of metrics, all the way to Grafana dashboard definitions and alerting rules. See Substrate/.maintain as an example.

Implementation

As said in #2013 I don't plan to introduce OpenMetrics in each rust-libp2p crate itself, but instead I want to provide an optional auxiliary crate that can live on-the-side and can be used on-demand by recording events emitted by the Swarm and various protocols.

I have not yet found a great abstraction that is both intuitive and extensible. For now I am proposing the below, though I am more than happy for alternative suggestions.

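// Build a Swarm running the Ping protocol.
let mut swarm = Swarm::new(
    block_on(libp2p::development_transport(local_key))?,
    Ping::new(PingConfig::new().with_keep_alive(true)),
    local_peer_id,
);

// Create the metrics, register them with a registry and serve that
// registry from a separate thread.
let mut metric_registry = Registry::default();
let metrics = Metrics::new(&mut metric_registry);
thread::spawn(move || block_on(metrics_server(metric_registry)));

block_on(async {
    loop {
        match swarm.next_event().await {
            // Events emitted by the protocol (here: Ping).
            SwarmEvent::Behaviour(ping_event) => {
                metrics.record(&ping_event);
            }
            // All remaining events emitted by the Swarm itself.
            swarm_event => {
                metrics.record(&swarm_event);
            }
        }
    }
})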

You can find the full example in misc/metrics/examples/metrics.rs in this pull request.
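
To make the shape of this abstraction a bit more concrete, here is a minimal, self-contained sketch of the idea: one metrics value that can record several event types through a generic trait. It is not the libp2p-metrics API. `PingEvent`, `SwarmEvent`, `Metrics`, `Recorder` and `record_all` are hypothetical stand-ins written only for illustration, the counters are a plain `HashMap` rather than a real metric registry, and `record` takes `&mut self` here purely to keep the sketch short.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the events a protocol would emit.
enum PingEvent {
    Success,
    Failure,
}

// Hypothetical stand-in for the events a Swarm would emit.
enum SwarmEvent {
    Behaviour(PingEvent),
    ConnectionEstablished,
    ConnectionClosed,
}

/// Anything that can record an event of type `E`.
trait Recorder<E> {
    fn record(&mut self, event: &E);
}

/// Hypothetical metrics holder: one counter per metric name.
#[derive(Default)]
struct Metrics {
    counters: HashMap<&'static str, u64>,
}

impl Metrics {
    /// Increment the counter with the given name, creating it at 0 if needed.
    fn inc(&mut self, name: &'static str) {
        *self.counters.entry(name).or_insert(0) += 1;
    }

    /// Current value of a counter, or 0 if it was never incremented.
    fn get(&self, name: &str) -> u64 {
        self.counters.get(name).copied().unwrap_or(0)
    }
}

// Protocol-specific instrumentation lives in its own impl.
impl Recorder<PingEvent> for Metrics {
    fn record(&mut self, event: &PingEvent) {
        match event {
            PingEvent::Success => self.inc("ping_success"),
            PingEvent::Failure => self.inc("ping_failure"),
        }
    }
}

// Swarm-level instrumentation ignores protocol events.
impl Recorder<SwarmEvent> for Metrics {
    fn record(&mut self, event: &SwarmEvent) {
        match event {
            SwarmEvent::Behaviour(_) => {}
            SwarmEvent::ConnectionEstablished => self.inc("connections_established"),
            SwarmEvent::ConnectionClosed => self.inc("connections_closed"),
        }
    }
}

/// Mirrors the event loop above: protocol events and Swarm events are
/// dispatched to the matching `Recorder` impl based on their type.
fn record_all(metrics: &mut Metrics, events: &[SwarmEvent]) {
    for event in events {
        match event {
            SwarmEvent::Behaviour(ping_event) => metrics.record(ping_event),
            swarm_event => metrics.record(swarm_event),
        }
    }
}
```

The point of the generic trait in this sketch is that the call site stays the same (`metrics.record(&event)`) no matter which event type is passed, so support for a further protocol could be added as one more `impl` without touching the event loop.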

Note, this pull request is using the open-metrics-client crate (authored by me) instead of the more popular prometheus crate. You can find a high-level comparison on tikv/rust-prometheus#392.

Fixes #2013

This commit adds an auxiliary crate recording protocol and Swarm events
and exposing them as metrics in the [OpenMetrics] format.

[OpenMetrics]: https://github.com/OpenObservability/OpenMetrics/
mxinden (Member, Author) commented Apr 30, 2021

The new crate proposed here - libp2p-metrics - is already used by all deployments of https://github.com/mxinden/rust-libp2p-relay-server. To find an abstraction that works for many projects instead of just one, I would like to collaborate with someone, preferably with a large code-base, helping them introduce libp2p-metrics into their project. (It sounds a lot more complicated than it is.)

Any volunteers? @ec2 would this be interesting for you and https://github.com/ChainSafe/forest?

ec2 commented May 4, 2021

This is very cool! @olibearo from our team will be leading the metrics and instrumentation efforts. I think we would be very down. Let me direct Jorge's attention to this thread so he can chime in.

dvc94ch (Contributor) commented Jul 27, 2021

ipfs-embed uses something Prometheus-based. While I probably won't change it for the time being, I will eventually rewrite it, make the core abstraction provided to applications a stream instead of a block, and remove all the complex garbage collection etc. This crate does seem useful, thanks!

mxinden (Member, Author) commented Jul 28, 2021

Great to hear @dvc94ch, let me know in case you ever end up using it.

mxinden merged commit 98bc5e6 into libp2p:master on Aug 13, 2021
dvc94ch added a commit to dvc94ch/rust-libp2p that referenced this pull request Sep 14, 2021
* protocols/gossipsub: Fix inconsistency in mesh peer tracking (libp2p#2189)

Co-authored-by: Age Manning <Age@AgeManning.com>

* misc/metrics: Add auxiliary crate to record events as OpenMetrics (libp2p#2063)

This commit adds an auxiliary crate recording protocol and Swarm events
and exposing them as metrics in the OpenMetrics format.

* README: Mention security@ipfs.io

* examples/: Add file sharing example (libp2p#2186)

Basic file sharing application with peers either providing or locating
and getting files by name.

While obviously showcasing how to build a basic file sharing
application, the actual goal of this example is **to show how to
integrate rust-libp2p into a larger application**.

Architectural properties

- Clean clonable async/await interface ([`Client`]) to interact with the
network layer.

- Single task driving the network layer, no locks required.

* examples/README: Give an overview over the many examples (libp2p#2194)

* protocols/kad: Enable filtering of (provider) records (libp2p#2163)

Introduce `KademliaStoreInserts` option, which allows to filter records.

Co-authored-by: Max Inden <mail@max-inden.de>

* swarm/src/protocols_handler: Impl ProtocolsHandler on either::Either (libp2p#2192)

Implement ProtocolsHandler on either::Either representing either of two
ProtocolsHandler implementations.

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

* *: Make libp2p-core default features optional (libp2p#2181)

Co-authored-by: Max Inden <mail@max-inden.de>

* core/: Remove DisconnectedPeer::set_connected and Pool::add (libp2p#2195)

This logic seems to be a leftover of
libp2p#889 and unused today.

* protocols/gossipsub: Use ed25519 in tests (libp2p#2197)

With f2905c0 the secp256k1 feature is
disabled by default. Instead of enabling it in the dev-dependency,
simply use ed25519.

* build(deps): Update minicbor requirement from 0.10 to 0.11 (libp2p#2200)

Updates the requirements on [minicbor](https://gitlab.com/twittner/minicbor) to permit the latest version.
- [Release notes](https://gitlab.com/twittner/minicbor/tags)
- [Changelog](https://gitlab.com/twittner/minicbor/blob/master/CHANGELOG.md)
- [Commits](https://gitlab.com/twittner/minicbor/compare/minicbor-v0.10.0...minicbor-v0.11.0)

---
updated-dependencies:
- dependency-name: minicbor
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): Update salsa20 requirement from 0.8 to 0.9 (libp2p#2206)

* build(deps): Update salsa20 requirement from 0.8 to 0.9

Updates the requirements on [salsa20](https://github.com/RustCrypto/stream-ciphers) to permit the latest version.
- [Release notes](https://github.com/RustCrypto/stream-ciphers/releases)
- [Commits](RustCrypto/stream-ciphers@ctr-v0.8.0...salsa20-v0.9.0)

---
updated-dependencies:
- dependency-name: salsa20
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* *: Bump pnet to v0.22

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>

* *: Dial with handler and return handler on error and closed (libp2p#2191)

Require `NetworkBehaviourAction::{DialPeer,DialAddress}` to contain a
`ProtocolsHandler`. This allows a behaviour to attach custom state to its
handler. The behaviour would no longer need to track this state separately
during connection establishment, thus reducing state required in a behaviour.
E.g. in the case of `libp2p-kad` the behaviour can include a `GetRecord` request
in its handler, or e.g. in the case of `libp2p-request-response` the behaviour
can include the first request in the handler.

Return `ProtocolsHandler` on connection error and close. This allows a behaviour
to extract its custom state previously included in the handler on connection
failure and connection closing. E.g. in the case of `libp2p-kad` the behaviour
could extract the attached `GetRecord` from the handler of the failed connection
and then start another connection attempt with a new handler with the same
`GetRecord` or bubble up an error to the user.

Co-authored-by: Thomas Eizinger <thomas@eizinger.io>

* core/: Remove deprecated read/write functions (libp2p#2213)

Co-authored-by: Max Inden <mail@max-inden.de>

* protocols/ping: Revise naming of symbols (libp2p#2215)

Co-authored-by: Max Inden <mail@max-inden.de>

* protocols/rendezvous: Implement protocol (libp2p#2107)

Implement the libp2p rendezvous protocol.

> A lightweight mechanism for generalized peer discovery. It can be used for
bootstrap purposes, real time peer discovery, application specific routing, and
so on.

Co-authored-by: rishflab <rishflab@hotmail.com>
Co-authored-by: Daniel Karzel <daniel@comit.network>

* core/src/network/event.rs: Fix typo (libp2p#2218)

* protocols/mdns: Do not fire all timers at the same time. (libp2p#2212)

Co-authored-by: Max Inden <mail@max-inden.de>

* misc/metrics/src/kad: Set query_duration lowest bucket to 0.1 sec (libp2p#2219)

Probability for a Kademlia query to return in less than 100 milliseconds
is low, thus increasing the lower bucket to improve accuracy within the
higher ranges.

* misc/metrics/src/swarm: Expose role on connections_closed (libp2p#2220)

Expose whether closed connection was a Dialer or Listener.

* .github/workflows/ci.yml: Use clang 11 (libp2p#2233)

* protocols/rendezvous: Update prost (libp2p#2226)

Co-authored-by: Max Inden <mail@max-inden.de>

* *: Fix clippy warnings (libp2p#2227)

* swarm-derive/: Make event_process = false the default (libp2p#2214)

Co-authored-by: Max Inden <mail@max-inden.de>

Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: Age Manning <Age@AgeManning.com>
Co-authored-by: Ruben De Smet <ruben.de.smet@rubdos.be>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: rishflab <rishflab@hotmail.com>
Co-authored-by: Daniel Karzel <daniel@comit.network>
Co-authored-by: David Craven <david@craven.ch>
Successfully merging this pull request may close these issues.

Provide Open Metrics (/Prometheus) Wrapper