
Report tracing_unbounded channel size to prometheus #1489

Merged
merged 3 commits into master on Sep 12, 2023

Conversation

Contributor

@dmitry-markin commented Sep 11, 2023

Fixes issue #611 by introducing a metric for the channel sizes.

Introduce a new metric, `substrate_unbounded_channel_size`, labeled by the channel name, which reports the size of the channel (the number of messages in transit).
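
For illustration, a minimal sketch of how a gauge vector like this could be registered with the `prometheus` crate; the helper function name and the single `entity` label below are assumptions, not the PR's actual code:

```rust
use prometheus::{
	core::{AtomicU64, GenericGaugeVec},
	Opts, Registry,
};

// Hypothetical helper; the PR itself defines the gauge vector as a static
// next to the existing unbounded-channel counters.
fn register_channel_size_gauge(
	registry: &Registry,
) -> prometheus::Result<GenericGaugeVec<AtomicU64>> {
	let gauge = GenericGaugeVec::new(
		Opts::new(
			"substrate_unbounded_channel_size",
			"Number of messages in transit in each unbounded channel",
		),
		&["entity"], // name of the channel
	)?;
	registry.register(Box::new(gauge.clone()))?;
	Ok(gauge)
}
```

A channel wrapper could then publish the instantaneous queue length with `gauge.with_label_values(&[name]).set(len)`, or adjust it per message with `inc()`/`dec()`.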

@dmitry-markin added the T0-node label (This PR/Issue is related to the topic “node”.) on Sep 11, 2023
```rust
}

pub static SENT_LABEL: &'static str = "send";
```
Contributor


Suggested change:

```diff
-pub static SENT_LABEL: &'static str = "send";
+pub static SENT_LABEL: &'static str = "sent";
```

Contributor Author

@dmitry-markin Sep 11, 2023


The original label name was "send", and I'm not sure it's a good idea to break compatibility: there are probably a lot of automation scripts relying on it.

```rust
),
&["entity", "action"], // name of channel, send|received|dropped
).expect("Creating of statics doesn't fail. qed");
pub static ref UNBOUNDED_CHANNELS_SIZE: GenericGaugeVec<AtomicU64> = GenericGaugeVec::new(
```
Contributor


I'm not quite sure a single gauge will provide enough data here. Since a gauge is a simple value cell, Prometheus will only see whatever value it holds at the next scrape, so any spikes between collection intervals will go untracked. I would suggest using a histogram here, as it combines gauge-like functionality with buckets that make it easy to track anomalies and peaks. On the other hand, it is more expensive.
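
To make the suggestion concrete, a rough sketch of what a histogram variant might look like with the `prometheus` crate; the bucket boundaries and the single `entity` label are illustrative assumptions, not anything from this PR:

```rust
use prometheus::{HistogramOpts, HistogramVec, Registry};

// Hypothetical histogram variant: every observed queue length falls into a
// bucket, so spikes between scrapes still show up in the bucket counts.
fn register_channel_size_histogram(registry: &Registry) -> prometheus::Result<HistogramVec> {
	let hist = HistogramVec::new(
		HistogramOpts::new(
			"substrate_unbounded_channel_size",
			"Size of each unbounded channel, observed on every send",
		)
		// Illustrative bucket boundaries; real ones would need tuning.
		.buckets(vec![1.0, 16.0, 64.0, 256.0, 1024.0, 4096.0]),
		&["entity"], // name of the channel
	)?;
	registry.register(Box::new(hist.clone()))?;
	Ok(hist)
}
```

Each send would then call `hist.with_label_values(&[name]).observe(len as f64)`, so spikes between scrapes still land in the bucket counts, at the cost of extra work per message.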

Contributor Author


I'll go with a simple gauge here, because it's hard to estimate the performance impact of using a histogram, considering how extensively we use channels. So far this is merely a fix for the instant channel size calculation, which was previously done on the CI side using metrics not intended for this purpose. But thanks for the suggestion anyway.
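
For context on the trade-off, a minimal sketch of the gauge-based hot path, assuming a hypothetical per-channel wrapper rather than the actual `tracing_unbounded` internals:

```rust
use prometheus::core::{AtomicU64, GenericGauge};

/// Hypothetical handle stored next to a channel's sender and receiver halves.
struct ChannelSizeMetric {
	size: GenericGauge<AtomicU64>,
}

impl ChannelSizeMetric {
	/// Called when a message is pushed into the unbounded channel.
	fn on_send(&self) {
		self.size.inc();
	}

	/// Called when a message is taken out of the channel.
	fn on_receive(&self) {
		self.size.dec();
	}
}
```

Each update is a single atomic operation on the gauge, which keeps the per-message overhead predictable compared to a histogram observation.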

@dmitry-markin merged commit f5ca403 into master on Sep 12, 2023
@dmitry-markin deleted the dm-report-channel-size-to-prometheus branch on September 12, 2023 at 11:38
BulatSaif added a commit that referenced this pull request on Oct 16, 2023 (#1568):

# Description

Follow-up for #1489.
Closes #611.

Previously, the channel size was calculated inside the alert expression; #1489 introduced a new metric that reports the channel size directly.
## Changes:
1. Updated the alert rule to use the new metric (see the note below).
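
For context, the updated rule presumably alerts on `substrate_unbounded_channel_size` directly, e.g. an expression along the lines of `substrate_unbounded_channel_size > <threshold>` in PromQL, rather than reconstructing the size by subtracting the received-message counter from the sent-message counter (the exact expression and threshold are defined in #1568).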
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
bkchr pushed a commit that referenced this pull request Apr 10, 2024
Labels
T0-node This PR/Issue is related to the topic “node”.
Projects
Status: Done
5 participants