-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Add a sub_libp2p_notifications_queues_size Prometheus metric #5503
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Other than the observe
on flushing none of my comments should block this pull request.
@@ -56,14 +57,17 @@ const INITIAL_KEEPALIVE_TIME: Duration = Duration::from_secs(5); | |||
pub struct NotifsOutHandlerProto { | |||
/// Name of the protocol to negotiate. | |||
protocol_name: Cow<'static, [u8]>, | |||
/// Optional prometheus histogram to report message queue sizes variations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Optional prometheus histogram to report message queue sizes variations. | |
/// Optional Prometheus histogram to report message queue size variations. |
.map(|(p, _)| { | ||
let queue_size_report = queue_size_report.as_ref().and_then(|qs| { | ||
if let Ok(utf8) = str::from_utf8(&p) { | ||
Some(qs.with_label_values(&[utf8])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for only passing a single Histogram.
@@ -103,6 +108,9 @@ pub struct NotifsOutHandler { | |||
/// When the connection with the remote has been successfully established. | |||
when_connection_open: Instant, | |||
|
|||
/// Optional prometheus histogram to report message queue sizes variations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Optional prometheus histogram to report message queue sizes variations. | |
/// Optional prometheus histogram to report message queue size variations. |
@@ -259,6 +265,7 @@ impl<B: BlockT + 'static, H: ExHashT> NetworkWorker<B, H> { | |||
params.block_announce_validator, | |||
params.metrics_registry.as_ref(), | |||
boot_node_ids.clone(), | |||
metrics.as_ref().map(|m| m.notifications_queues_size.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say we would like a Protocol
to expose more metrics in the future. In that case wouldn't it make more sense to passs a &Registry
down to Protocol::new
and have it register its own Metrics
struct?
This is a preference, not a requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My general idea about this code is that is would eventually completely disappear after #5481. I'm trying as much as possible to be fully predictable/deterministic and report everything that is worth notifying to the parent.
Poll::Pending | Poll::Ready(Ok(())) => {}, | ||
Poll::Pending | Poll::Ready(Ok(())) => | ||
if let Some(metric) = &self.queue_size_report { | ||
metric.observe(substream.queue_len() as f64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question is: should be really call observe() when we flush the queue? Should we not call it only when we push a message, so that the value represents the size that a queue had when we sent a message?
I don't think we should call observe
when flushing the queue. From the top of my head I don't see how that would help.
Reports on a
Histogram
, for each protocol, the queue size.This means that if we send for example 2 messages, then we will report 2 then 1 then 0, or 2 then 0. Therefore the "sum" is rather meaningless, as when sending 2 messages it could be either 3 or 2.
Instead, the idea is to use the "buckets" feature of Prometheus to see the number of times where we reached a certain value.
One question is: should be really call
observe()
when we flush the queue? Should we not call it only when we push a message, so that the value represents the size that a queue had when we sent a message?