-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report tracing_unbounded
channel size to prometheus
#1489
Conversation
} | ||
|
||
pub static SENT_LABEL: &'static str = "send"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub static SENT_LABEL: &'static str = "send"; | |
pub static SENT_LABEL: &'static str = "sent"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original label name was "send", and I'm not sure if it's a good idea to break the compatibility — there are probably a lot of automation scripts using this.
), | ||
&["entity", "action"], // name of channel, send|received|dropped | ||
).expect("Creating of statics doesn't fail. qed"); | ||
pub static ref UNBOUNDED_CHANNELS_SIZE: GenericGaugeVec<AtomicU64> = GenericGaugeVec::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure that a single gauge
will provide enough data here. Since a gauge is a simple counter cell, Prometheus will get just a single value on the next collection. However, we will not track any spikes of this value between the collection intervals. I would suggest to use histogram here, as it includes both Gauge functionality and buckets that will allow to track anomalities and peaks easily. On the other hand, it is more expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with a simple gauge here, because it's hard to estimate the performance impact of using a histogram, considering we use channels extensively. So far this is merely a fix for the instant channel size calculation, which was done on the CI side previously using metrics not intended for this. But thanks for the suggestion anyway.
paritytech#1568) # Description Follow up for paritytech#1489. Closes paritytech#611 Before we calculated the channel size during alert expression but in paritytech#1489 a new metric was introduced that reports channel size. ## Changes: 1. updated alert rule to use new metric.
Fixes issue #611 by introducing a metric for the channel sizes.
Introduce the new metric
substrate_unbounded_channel_size
labeled by the channel name that contains the size of the channel (the number of messages in transit).