feat: prefix metrics with "ipc" #1172

Merged 2 commits on Oct 15, 2024
112 changes: 56 additions & 56 deletions docs/fendermint/observability.md
@@ -9,47 +9,47 @@ This is achieved through the use of the `ipc-observability` crate/library, which
### How it works

1. **Events**: Specific events are defined and triggered throughout the codebase to capture significant occurrences or actions.
-These events encapsulate relevant data and context about what is happening within the system.
+These events encapsulate relevant data and context about what is happening within the system.

2. **Journal**: Events are recorded in a journal, which is a rotational ledger that records chronologically ordered, timestamped trace objects to log files on disk.
-The journal can also be emitted to console.
+The journal can also be emitted to console.

3. **Metrics**: Each event is associated with one or more Prometheus metrics.
-When an event is triggered, the corresponding metrics are updated to reflect the event's occurrence.
-This allows for real-time tracking and monitoring of various system activities and states through dashboards and alerts.
+When an event is triggered, the corresponding metrics are updated to reflect the event's occurrence.
+This allows for real-time tracking and monitoring of various system activities and states through dashboards and alerts.

4. **Prometheus integration**: The metrics collected are designed to integrate seamlessly with Prometheus, a powerful monitoring and alerting toolkit.
-Prometheus collects and stores these metrics, enabling detailed analysis and visualization through its query language and dashboarding capabilities.
+Prometheus collects and stores these metrics, enabling detailed analysis and visualization through its query language and dashboarding capabilities.

5. **ipc-observability crate**: This custom library encapsulates the logic and functionality required to define, trigger, and record events and metrics.
-It simplifies the process of adding observability to the codebase by providing ready-to-use macros, structs, and functions.
+It simplifies the process of adding observability to the codebase by providing ready-to-use macros, structs, and functions.
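
The event, journal, and metric flow described in the list above can be sketched with the standard library alone. This is a minimal illustration and not the `ipc-observability` API: the `Event` variants and their fields, the `Observability` struct, and `emit` are hypothetical names chosen for the sketch, and a plain `HashMap` stands in for the Prometheus registry.

```rust
use std::collections::HashMap;

/// Hypothetical events, loosely modeled on `BlockCommitted` and
/// `CheckpointCreated` from the docs. The fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    BlockCommitted { height: i64 },
    CheckpointCreated { height: i64, msg_count: i64 },
}

/// Stand-in for the journal plus the metrics registry.
#[derive(Default)]
pub struct Observability {
    /// Journal entries in emission order. A real journal would add
    /// timestamps and write to rotating files on disk.
    pub journal: Vec<String>,
    /// Gauge and counter values keyed by metric name.
    pub metrics: HashMap<&'static str, i64>,
}

impl Observability {
    /// Records the event in the journal, then updates every metric it affects.
    pub fn emit(&mut self, event: Event) {
        self.journal.push(format!("{:?}", event));
        match event {
            Event::BlockCommitted { height } => {
                // Gauge semantics: overwrite with the latest height.
                self.metrics
                    .insert("ipc_consensus_block_committed_height", height);
            }
            Event::CheckpointCreated { height, msg_count } => {
                // Counter semantics: add one per event.
                *self
                    .metrics
                    .entry("ipc_bottomup_checkpoint_created_total")
                    .or_insert(0) += 1;
                self.metrics
                    .insert("ipc_bottomup_checkpoint_created_height", height);
                self.metrics
                    .insert("ipc_bottomup_checkpoint_created_msgcount", msg_count);
            }
        }
    }
}
```

One event can touch several metrics, which is why the docs list metrics per event rather than the other way around.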

## Metrics

-- `consensus_block_proposal_received_height` (IntGauge): Incremented when a block proposal is received.
-- `consensus_block_proposal_sent_height` (IntGauge): Incremented when a block proposal is sent.
-- `consensus_block_proposal_accepted_height` (IntGauge): Incremented if the block proposal is accepted.
-- `consensus_block_proposal_rejected_height` (IntGauge): Incremented if the block proposal is rejected.
-- `consensus_block_committed_height` (IntGauge): Incremented when a block is committed.
-- `exec_fvm_check_execution_time_secs` (Histogram): Records the execution time of FVM check in seconds.
-- `exec_fvm_estimate_execution_time_secs` (Histogram): Records the execution time of FVM estimate in seconds.
-- `exec_fvm_apply_execution_time_secs` (Histogram): Records the execution time of FVM apply in seconds.
-- `exec_fvm_call_execution_time_secs` (Histogram): Records the execution time of FVM call in seconds.
-- `bottomup_checkpoint_created_total` (IntCounter): Incremented when a bottom-up checkpoint is created.
-- `bottomup_checkpoint_created_height` (IntGauge): Sets the height of the created checkpoint.
-- `bottomup_checkpoint_created_msgcount` (IntGauge): Sets the number of messages in the created checkpoint.
-- `bottomup_checkpoint_created_confignum` (IntGauge): Sets the configuration number of the created checkpoint.
-- `bottomup_checkpoint_signed_height` (IntGaugeVec): Sets the height of the signed checkpoint, labeled by validator.
-- `bottomup_checkpoint_finalized_height` (IntGauge): Sets the height of the finalized checkpoint.
-- `topdown_parent_rpc_call_total` (IntCounterVec): Incremented when a parent RPC call is made.
-- `topdown_parent_rpc_call_latency_secs` (HistogramVec): Records the latency of parent RPC calls.
-- `topdown_parent_finality_latest_acquired_height` (IntGaugeVec): Sets the height of the latest locally acquired parent finality.
-- `topdown_parent_finality_voting_latest_received_height` (IntGaugeVec): Sets the height of the received parent finality peer vote.
-- `topdown_parent_finality_voting_latest_sent_height` (IntGauge): Sets the height of the sent parent finality peer vote.
-- `topdown_parent_finality_voting_quorum_height` (IntGauge): Sets the height of the parent finality quorum.
-- `topdown_parent_finality_voting_quorum_weight` (IntGauge): Sets the weight of the parent finality quorum.
-- `topdown_parent_finality_committed_height` (IntGauge): Sets the height of the committed parent finality.
-- `tracing_errors` (IntCounterVec): Increments the count of tracing errors for the affected event.
+- `ipc_consensus_block_proposal_received_height` (IntGauge): Incremented when a block proposal is received.
+- `ipc_consensus_block_proposal_sent_height` (IntGauge): Incremented when a block proposal is sent.
+- `ipc_consensus_block_proposal_accepted_height` (IntGauge): Incremented if the block proposal is accepted.
+- `ipc_consensus_block_proposal_rejected_height` (IntGauge): Incremented if the block proposal is rejected.
+- `ipc_consensus_block_committed_height` (IntGauge): Incremented when a block is committed.
+- `ipc_exec_fvm_check_execution_time_secs` (Histogram): Records the execution time of FVM check in seconds.
+- `ipc_exec_fvm_estimate_execution_time_secs` (Histogram): Records the execution time of FVM estimate in seconds.
+- `ipc_exec_fvm_apply_execution_time_secs` (Histogram): Records the execution time of FVM apply in seconds.
+- `ipc_exec_fvm_call_execution_time_secs` (Histogram): Records the execution time of FVM call in seconds.
+- `ipc_bottomup_checkpoint_created_total` (IntCounter): Incremented when a bottom-up checkpoint is created.
+- `ipc_bottomup_checkpoint_created_height` (IntGauge): Sets the height of the created checkpoint.
+- `ipc_bottomup_checkpoint_created_msgcount` (IntGauge): Sets the number of messages in the created checkpoint.
+- `ipc_bottomup_checkpoint_created_confignum` (IntGauge): Sets the configuration number of the created checkpoint.
+- `ipc_bottomup_checkpoint_signed_height` (IntGaugeVec): Sets the height of the signed checkpoint, labeled by validator.
+- `ipc_bottomup_checkpoint_finalized_height` (IntGauge): Sets the height of the finalized checkpoint.
+- `ipc_topdown_parent_rpc_call_total` (IntCounterVec): Incremented when a parent RPC call is made.
+- `ipc_topdown_parent_rpc_call_latency_secs` (HistogramVec): Records the latency of parent RPC calls.
+- `ipc_topdown_parent_finality_latest_acquired_height` (IntGaugeVec): Sets the height of the latest locally acquired parent finality.
+- `ipc_topdown_parent_finality_voting_latest_received_height` (IntGaugeVec): Sets the height of the received parent finality peer vote.
+- `ipc_topdown_parent_finality_voting_latest_sent_height` (IntGauge): Sets the height of the sent parent finality peer vote.
+- `ipc_topdown_parent_finality_voting_quorum_height` (IntGauge): Sets the height of the parent finality quorum.
+- `ipc_topdown_parent_finality_voting_quorum_weight` (IntGauge): Sets the weight of the parent finality quorum.
+- `ipc_topdown_parent_finality_committed_height` (IntGauge): Sets the height of the committed parent finality.
+- `ipc_tracing_errors` (IntCounterVec): Increments the count of tracing errors for the affected event.
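
Because every metric in the list is renamed the same way, existing dashboards and alert rules that use the old names need the same mechanical rewrite. The helper below is a hypothetical sketch of that mapping (`prefixed_metric_name` is not part of the PR). It is idempotent, so it is safe to run over a mix of old and new names, on the assumption that no pre-PR metric name already started with `ipc_`.

```rust
/// Namespace that this PR puts in front of every metric name.
const NAMESPACE: &str = "ipc";

/// Maps a pre-PR metric name to its post-PR name.
/// Names that already carry the prefix are returned unchanged.
pub fn prefixed_metric_name(name: &str) -> String {
    let prefix = format!("{}_", NAMESPACE);
    if name.starts_with(&prefix) {
        name.to_string()
    } else {
        format!("{}{}", prefix, name)
    }
}
```

For example, `prefixed_metric_name("tracing_errors")` yields `ipc_tracing_errors`, and applying the function to that result leaves it as is.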

## Events and corresponding metrics

@@ -68,7 +68,7 @@ Represents a block proposal received event.

**Affects metrics:**

-- `consensus_block_proposal_received_height`
+- `ipc_consensus_block_proposal_received_height`

### BlockProposalSent

@@ -84,7 +84,7 @@ Represents a block proposal sent event.

**Affects metrics:**

-- `consensus_block_proposal_sent_height`
+- `ipc_consensus_block_proposal_sent_height`

### BlockProposalEvaluated

@@ -103,8 +103,8 @@ Represents the evaluation of a block proposal.

**Affects metrics:**

-- `consensus_block_proposal_accepted_height`
-- `consensus_block_proposal_rejected_height`
+- `ipc_consensus_block_proposal_accepted_height`
+- `ipc_consensus_block_proposal_rejected_height`

### BlockCommitted

@@ -118,7 +118,7 @@ Represents a block committed event.

**Affects metrics:**

-- `consensus_block_committed_height`
+- `ipc_consensus_block_committed_height`

### MsgExec

@@ -135,10 +135,10 @@ Represents an execution message for different purposes.

**Affects metrics:**

-- `exec_fvm_check_execution_time_secs`
-- `exec_fvm_estimate_execution_time_secs`
-- `exec_fvm_apply_execution_time_secs`
-- `exec_fvm_call_execution_time_secs`
+- `ipc_exec_fvm_check_execution_time_secs`
+- `ipc_exec_fvm_estimate_execution_time_secs`
+- `ipc_exec_fvm_apply_execution_time_secs`
+- `ipc_exec_fvm_call_execution_time_secs`

### CheckpointCreated

@@ -154,10 +154,10 @@ Represents the creation of a bottom-up checkpoint.

**Affects metrics:**

-- `bottomup_checkpoint_created_total`
-- `bottomup_checkpoint_created_height`
-- `bottomup_checkpoint_created_msgcount`
-- `bottomup_checkpoint_created_confignum`
+- `ipc_bottomup_checkpoint_created_total`
+- `ipc_bottomup_checkpoint_created_height`
+- `ipc_bottomup_checkpoint_created_msgcount`
+- `ipc_bottomup_checkpoint_created_confignum`

### CheckpointSigned

@@ -187,7 +187,7 @@ Represents the finalization of a bottom-up checkpoint.

**Affects metrics:**

-- `bottomup_checkpoint_finalized_height`
+- `ipc_bottomup_checkpoint_finalized_height`

### ParentRpcCalled

@@ -204,8 +204,8 @@ Represents a parent RPC call.

**Affects metrics:**

-- `topdown_parent_rpc_call_total`
-- `topdown_parent_rpc_call_latency_secs`
+- `ipc_topdown_parent_rpc_call_total`
+- `ipc_topdown_parent_rpc_call_latency_secs`

### ParentFinalityAcquired

@@ -224,7 +224,7 @@ Represents the acquisition of parent finality.

**Affects metrics:**

-- `topdown_parent_finality_latest_acquired_height`
+- `ipc_topdown_parent_finality_latest_acquired_height`

### ParentFinalityPeerVoteReceived

@@ -240,7 +240,7 @@ Represents the reception of a parent finality peer vote.

**Affects metrics:**

-- `topdown_parent_finality_voting_latest_received_height`
+- `ipc_topdown_parent_finality_voting_latest_received_height`

### ParentFinalityPeerVoteSent

@@ -255,7 +255,7 @@ Represents the sending of a parent finality peer vote.

**Affects metrics:**

-- `topdown_parent_finality_voting_latest_sent_height`
+- `ipc_topdown_parent_finality_voting_latest_sent_height`

### ParentFinalityPeerQuorumReached

@@ -271,8 +271,8 @@ Represents the reaching of a parent finality quorum.

**Affects metrics:**

-- `topdown_parent_finality_voting_quorum_height`
-- `topdown_parent_finality_voting_quorum_weight`
+- `ipc_topdown_parent_finality_voting_quorum_height`
+- `ipc_topdown_parent_finality_voting_quorum_weight`

### ParentFinalityCommitted

@@ -288,7 +288,7 @@ Represents the commitment of parent finality.

**Affects metrics:**

-- `topdown_parent_finality_committed_height`
+- `ipc_topdown_parent_finality_committed_height`

### TracingError

@@ -302,7 +302,7 @@ Represents an error that occurs during tracing.

**Affects metrics:**

-- `tracing_errors`
+- `ipc_tracing_errors`

## Configuration

@@ -330,7 +330,7 @@ enabled = true

> 🚧 Note: the event journal and general logs are currently output to the same file.
> We plan to segregate in the near future so that the event journal has its dedicated file.
-> See this issue: https://github.com/consensus-shipyard/ipc/issues/1084.
+> See this issue: https://github.com/consensus-shipyard/ipc/issues/1084.

Tracing can also be configured via the configuration file for Fendermint. You can set the tracing level and specify whether to log to console or file.
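
The `level` setting in this PR's config examples takes `EnvFilter`-style directives such as `info,my_crate::module=trace`, where a bare level sets the default and `target=level` overrides it for one module path. The sketch below is a deliberately simplified, std-only model of that syntax and not the `tracing-subscriber` parser: `Level`, `Directive`, and `parse_directives` are hypothetical names, span and field filters are not handled, and a bare token that is not a level is rejected here even though the real filter may treat it as a target.

```rust
/// Verbosity levels, matching the options named in the docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Off,
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

fn parse_level(s: &str) -> Option<Level> {
    match s.trim().to_ascii_lowercase().as_str() {
        "off" => Some(Level::Off),
        "error" => Some(Level::Error),
        "warn" => Some(Level::Warn),
        "info" => Some(Level::Info),
        "debug" => Some(Level::Debug),
        "trace" => Some(Level::Trace),
        _ => None,
    }
}

/// One parsed directive. `target` is `None` for a bare default level.
#[derive(Debug, PartialEq, Eq)]
pub struct Directive {
    pub target: Option<String>,
    pub level: Level,
}

/// Parses a comma-separated list of `level` and `target=level` directives.
pub fn parse_directives(spec: &str) -> Result<Vec<Directive>, String> {
    spec.split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .map(|part| match part.split_once('=') {
            // `target=level`: a per-target override.
            Some((target, level)) => parse_level(level)
                .map(|level| Directive {
                    target: Some(target.trim().to_string()),
                    level,
                })
                .ok_or_else(|| format!("unknown level in directive `{}`", part)),
            // Bare `level`: the default for everything else.
            None => parse_level(part)
                .map(|level| Directive { target: None, level })
                .ok_or_else(|| format!("unknown level `{}`", part)),
        })
        .collect()
}
```

Parsing `info,my_crate::module=trace` with this model gives a default of `Info` plus a `Trace` override for `my_crate::module`.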

@@ -342,7 +342,7 @@ Example config:
[tracing]

[tracing.console]
-level = "trace" # Options: off, error, warn, info, debug, trace (default: trace)
+level = "trace" # Eg. "info,my_crate::module=trace" - https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html#directives
```

### File tracing
@@ -352,7 +352,7 @@ Example config:
```toml
[tracing.file]
enabled = true # Options: true, false
-level = "trace" # Options: off, error, warn, info, debug, trace (default: trace)
+level = "trace" # Eg. "info,my_crate::module=trace" - https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html#directives
directory = "/path/to/log/directory"
max_log_files = 5 # Number of files to keep after rotation
rotation = "daily" # Options: minutely, hourly, daily, never
11 changes: 8 additions & 3 deletions fendermint/app/config/test.toml
@@ -9,9 +9,7 @@ kind = "ethereum"
subnet_id = "/r31415926"

[resolver.membership]
-static_subnets = [
-"/r31415926/f2xwzbdu7z5sam6hc57xxwkctciuaz7oe5omipwbq",
-]
+static_subnets = ["/r31415926/f2xwzbdu7z5sam6hc57xxwkctciuaz7oe5omipwbq"]

[resolver.discovery]
static_addresses = [
@@ -30,3 +28,10 @@ external_addresses = [

[testing]
push_chain_meta = false
+
+[tracing.file]
+enabled = true
+level = "debug"
+directory = "logs"
+max_log_files = 4
+rotation = "minutely"
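
The `rotation` and `max_log_files` settings together bound how much log history stays on disk. The sketch below works that out, assuming each retained file covers exactly one rotation period. The `Rotation` enum and `retained_window` function are hypothetical helpers for illustration, not Fendermint or `tracing-appender` types.

```rust
use std::time::Duration;

/// Rotation options listed in the observability docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rotation {
    Minutely,
    Hourly,
    Daily,
    Never,
}

impl Rotation {
    /// Parses the string used in the TOML config.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "minutely" => Some(Rotation::Minutely),
            "hourly" => Some(Rotation::Hourly),
            "daily" => Some(Rotation::Daily),
            "never" => Some(Rotation::Never),
            _ => None,
        }
    }

    /// Length of one rotation period, or `None` when rotation is disabled.
    pub fn period(self) -> Option<Duration> {
        match self {
            Rotation::Minutely => Some(Duration::from_secs(60)),
            Rotation::Hourly => Some(Duration::from_secs(60 * 60)),
            Rotation::Daily => Some(Duration::from_secs(24 * 60 * 60)),
            Rotation::Never => None,
        }
    }
}

/// Upper bound on the log history kept on disk, assuming one file per period.
/// Returns `None` when the log never rotates, since the single file is unbounded.
pub fn retained_window(rotation: Rotation, max_log_files: u32) -> Option<Duration> {
    rotation.period().map(|period| period * max_log_files)
}
```

Under that assumption, the test config above (`rotation = "minutely"`, `max_log_files = 4`) keeps at most about four minutes of logs, which suits short test runs but would be too little for production debugging.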
3 changes: 2 additions & 1 deletion fendermint/app/src/cmd/run.rs
@@ -71,7 +71,8 @@ async fn run(settings: Settings) -> anyhow::Result<()> {

     // Prometheus metrics
     let metrics_registry = if settings.metrics.enabled {
-        let registry = prometheus::Registry::new();
+        let registry = prometheus::Registry::new_custom(Some("ipc".to_string()), None)
+            .context("failed to create Prometheus registry")?;

         register_default_metrics(&registry).context("failed to register default metrics")?;
         register_topdown_metrics(&registry).context("failed to register topdown metrics")?;
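
The change above moves the prefix into the registry, so metric definitions elsewhere in the code keep their short names and only the exposed names gain `ipc_`. The std-only model below illustrates that behaviour. `NamespacedRegistry` is a hypothetical stand-in and not the `prometheus` crate's type. The empty-prefix check reflects my understanding of why `Registry::new_custom` returns a `Result`, and should be read as an assumption.

```rust
/// Simplified model of a metrics registry with an optional namespace.
#[derive(Debug)]
pub struct NamespacedRegistry {
    prefix: Option<String>,
    names: Vec<String>,
}

impl NamespacedRegistry {
    /// Creates a registry. An empty prefix is rejected, which is why
    /// construction is fallible in this model.
    pub fn new_custom(prefix: Option<String>) -> Result<Self, String> {
        if matches!(prefix.as_deref(), Some("")) {
            return Err("empty prefix namespace".to_string());
        }
        Ok(Self {
            prefix,
            names: Vec::new(),
        })
    }

    /// Registers a metric under its unprefixed name.
    pub fn register(&mut self, name: &str) {
        self.names.push(name.to_string());
    }

    /// Returns the names as they would be exposed, with the namespace applied.
    pub fn gather(&self) -> Vec<String> {
        self.names
            .iter()
            .map(|name| match &self.prefix {
                Some(prefix) => format!("{}_{}", prefix, name),
                None => name.clone(),
            })
            .collect()
    }
}
```

Applying the prefix at gather time keeps the change to a single call site, as in this PR, at the cost that the prefix is not visible where each metric is defined.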