This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

initial prometheus metrics #1536

Merged
merged 65 commits into from
Aug 18, 2020

ordian
Member

@ordian ordian commented Aug 4, 2020

Part of #1482.

  • Number of statements signed
  • Number of bitfields signed
  • Number of availability chunks received
  • Number of candidates seconded
  • Number of collations generated
  • Number of Runtime API errors encountered

@ordian ordian added labels on Aug 4, 2020:
  • A3-in_progress: Pull request is in progress. No review needed at this stage.
  • B0-silent: Changes should not be mentioned in any release notes.
  • C1-low: PR touches the given topic and has a low impact on builders.
ordian and others added 25 commits August 4, 2020 15:36
* master:
  Unalias Substrate Imports (#1530)
  Rewrite client handling (#1531)
  Integrate ChainApi with all messages (#1533)
  Add Rococo test network (#1363)
  Fix a typo parathreads -> parachains (#1529)
  Cleanup upcoming paras (#1527)
  Sudo wrapper for paras (#1517)
  Implementer's guide: notes on contextual execution (#1525)
  Companion for substrate/6782 (#1523)
  Sort out validation errors (#1516)
* master:
  Network Bridge Refactoring (#1535)
  Use DNS hostnames for westend bootnodes (#1528)
  Companion PR: add weightinfo for democracy (#1522)
* master:
  [CI] Fix draft release publishing (#1546)
  Bump Substrate, version (#1541)
  Update check_labels runtimenoteworthy label (#1540)
  revert enabling authority discovery by default (#1532)
* master:
  Companion PR to delaying network startup to after initialization (#1547)
  Add SyncOracle to network's Service (#1543)
  Ignore checks for companion PRs (#1455)
  Add an Origin to parachains v1 (#1542)
…trics

* origin/gav-upsub:
  Bumb substrate again
  Bump Substrate
* master:
  implement provisioner (#1473)
  Implementer's guide: downward messages and HRMP, take 2 (#1503)
  Revert "Ignore checks for companion PRs (#1455)" (#1549)
Contributor

@mxinden mxinden left a comment

Looks good to me overall. I have a couple of smaller comments.

Regarding naming, I've read https://prometheus.io/docs/practices/naming/, but it's not clear if all metrics should have a prefix like "parachains_" (I guess there is namespacing in prometheus?).

I think prefixing each parachain-related metric with parachain is a good idea. The same is done for Substrate networking with sub_libp2p.
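To make the prefixing idea concrete, here is a minimal sketch of a helper that namespaces a metric name and checks it against the Prometheus metric-name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`. The `PREFIX` constant and both function names are hypothetical and are not taken from this PR.

```rust
/// Hypothetical namespace prefix; the thread suggests "parachain" but does not
/// settle the exact spelling.
const PREFIX: &str = "parachain";

/// Returns true if `name` matches the Prometheus metric-name pattern
/// `[a-zA-Z_:][a-zA-Z0-9_:]*`.
fn is_valid_metric_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // The first character must be a letter, an underscore, or a colon.
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == ':' => {}
        _ => return false,
    }
    // The remaining characters may additionally be digits.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == ':')
}

/// Builds `<PREFIX>_<name>`, or returns `None` if `name` is empty or the
/// combined name is not a valid Prometheus metric name.
fn prefixed(name: &str) -> Option<String> {
    if name.is_empty() {
        return None;
    }
    let full = format!("{}_{}", PREFIX, name);
    if is_valid_metric_name(&full) {
        Some(full)
    } else {
        None
    }
}
```

Under these assumptions, `prefixed("statements_signed_total")` yields `parachain_statements_signed_total`, while a name containing a hyphen is rejected.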

node/core/av-store/src/lib.rs (outdated)
node/core/backing/src/lib.rs (outdated)
node/overseer/src/lib.rs (outdated)
node/overseer/src/lib.rs (outdated)
@@ -195,6 +197,9 @@ pub trait SubsystemContext: Send + 'static {
 /// [`Overseer`]: struct.Overseer.html
 /// [`Subsystem`]: trait.Subsystem.html
 pub trait Subsystem<C: SubsystemContext> {
+    /// Subsystem-specific prometheus metrics.
+    type Metrics: metrics::Metrics;
Contributor

👍 for making Metrics a core component of Subsystems. I think this is a very clean approach.

In the future we should probably discuss whether our way of recording metrics should trickle down further than the Subsystem level or whether the core logic should stay generic. For example, sc-network handles all metric recording within NetworkWorker, and lower-level components bubble up the data needed to feed these metrics. finality-grandpa, on the other hand, passes a metrics registry all the way down to the bottom components.
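The two styles contrasted above can be sketched side by side. This is only an illustration under stated assumptions: `Counter` is a hypothetical stand-in built on `AtomicU64` rather than the `prometheus` crate type, and the event and function names are invented for the example, not taken from sc-network or finality-grandpa.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a Prometheus counter (not the `prometheus` crate type).
#[derive(Clone, Default)]
struct Counter(Arc<AtomicU64>);

impl Counter {
    fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Style A ("bubble up"): the inner logic knows nothing about metrics and
/// only reports what happened.
enum Event {
    ChunkReceived,
    Ignored,
}

fn handle_generic(msg: &str) -> Event {
    if msg.starts_with("chunk") {
        Event::ChunkReceived
    } else {
        Event::Ignored
    }
}

/// The outer layer turns the bubbled-up events into metric updates.
fn run_bubbling(msgs: &[&str], chunks_received: &Counter) {
    for msg in msgs {
        if let Event::ChunkReceived = handle_generic(msg) {
            chunks_received.inc();
        }
    }
}

/// Style B ("pass down"): the metrics handle is handed to the inner logic,
/// which records directly.
fn handle_with_metrics(msg: &str, chunks_received: &Counter) {
    if msg.starts_with("chunk") {
        chunks_received.inc();
    }
}

fn run_passing_down(msgs: &[&str], chunks_received: &Counter) {
    for msg in msgs {
        handle_with_metrics(msg, chunks_received);
    }
}
```

Both variants end up with the same counter value for the same input. The difference is which layer depends on the metrics type: style A keeps the inner logic generic, while style B needs no intermediate event type.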

node/subsystem/src/lib.rs (outdated)
@ordian
Member Author

ordian commented Aug 12, 2020

Another thing that was mentioned in #1482 is alerts, this can be done in a separate PR as well.

@mxinden
Contributor

mxinden commented Aug 14, 2020

Another thing that was mentioned in #1482 is alerts, this can be done in a separate PR as well.

Yes, agree that this can happen in another pull request. You can take a look at Substrate alerting-rules.yaml and the corresponding test file as an example.

node/core/candidate-validation/src/lib.rs (outdated)
node/core/chain-api/src/lib.rs (outdated)
node/core/provisioner/src/lib.rs (outdated)
node/core/runtime-api/src/lib.rs (outdated)
node/subsystem/src/lib.rs (outdated)
Co-authored-by: Max Inden <mail@max-inden.de>
Contributor

@mxinden mxinden left a comment

Looks good from a metrics point of view. Thanks for the work @ordian!

I am not too familiar with the subsystem design, so could someone who knows it better take a look as well?

* master:
  Make parachain validation wasm executor functional (#1574)
  Use async test helper to simplify node testing (#1578)
  guide: validation data refactoring (#1576)
  Remove v0 parachains runtime (#1501)
  [CI] Add github token to generate-release-text (#1581)
  Allow using any polkadot client instead of enum Client (#1575)
  service/src/lib: Update authority discovery construction (#1563)
  Update .editorconfig to what we have in practice (#1545)
  Companion PR for substrate #6672 (#1560)
  pre-redenomination tockenSymbol change (#1561)
@rphmeier rphmeier requested a review from montekki August 17, 2020 11:11
* master:
  Companion PR for #6862 (#1564)
  implement collation generation subsystem (#1557)
  Add spawn_blocking to SubsystemContext (#1570)
  Companion PR for #6846 (#1568)
  overseer: add a test to ensure all subsystem receive msgs (#1590)
  Implementer's Guide: Flesh out more details for upward messages (#1556)
-let subsystem = CollationGenerationSubsystem { config: None, metrics: self.metrics };
-
-let future = Box::pin(subsystem.run(ctx));
+let future = Box::pin(self.run(ctx));
Member Author

@coriolinus this is safe since there is no way to construct a subsystem with Some(config).

Comment on lines +458 to +463
/// Overseer Prometheus metrics.
#[derive(Clone)]
struct MetricsInner {
    activated_heads_total: prometheus::Counter<prometheus::U64>,
    deactivated_heads_total: prometheus::Counter<prometheus::U64>,
}
Contributor

In the overseer I would probably also account for things such as number of messages relayed between subsystems.
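One way such accounting could look is a counter keyed by target subsystem. This is a rough sketch only: `RelayedMessages`, its methods, and the subsystem names in the usage note are hypothetical, and a plain map stands in for a labelled Prometheus counter.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a labelled counter: one running total per
/// target subsystem. Not part of this PR.
#[derive(Default)]
struct RelayedMessages {
    per_target: BTreeMap<&'static str, u64>,
}

impl RelayedMessages {
    /// Records one message routed to `target`.
    fn on_message_relayed(&mut self, target: &'static str) {
        *self.per_target.entry(target).or_insert(0) += 1;
    }

    /// Number of messages relayed to `target` so far (0 if never seen).
    fn count_for(&self, target: &str) -> u64 {
        self.per_target.get(target).copied().unwrap_or(0)
    }

    /// Number of messages relayed across all targets.
    fn total(&self) -> u64 {
        self.per_target.values().sum()
    }
}
```

For instance, after recording two messages for a target named "candidate-backing" and one for "availability-store", `count_for("candidate-backing")` is 2 and `total()` is 3.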

@ordian
Member Author

ordian commented Aug 18, 2020

I'll merge the PR as is and comment in the issue on the remaining metrics.

bot merge

@ordian ordian merged commit 804958a into master Aug 18, 2020
@ordian ordian deleted the ao-prometheus-metrics branch August 18, 2020 09:18
ordian added a commit that referenced this pull request Aug 19, 2020
…n-race-condition

* master:
  initial prometheus metrics (#1536)
  Companion for Substrate 6868 (WeightInfo for System, Utility, and Timestamp) (#1606)
  Promote `HrmpChannelId` and supply more docs (#1595)
  move AssignmentKind and CoreAssigment to scheduler (#1571)
  overseer: fix build (#1596)
Labels
  • A0-please_review: Pull request needs code review.
  • B0-silent: Changes should not be mentioned in any release notes.
  • C1-low: PR touches the given topic and has a low impact on builders.
6 participants