Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: simplify collected metrics #21963

Merged
merged 3 commits into from
Sep 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,10 @@ Every module contains its own CHANGELOG.md. Please refer to the module you are i
### Improvements

* (sims) [#21613](https://github.com/cosmos/cosmos-sdk/pull/21613) Add sims2 framework and factory methods for simpler message factories in modules
* (modules) [#21963](https://github.com/cosmos/cosmos-sdk/pull/21963) Duplicatable metrics are no more collected in modules. They were unecessary overhead.

### Bug Fixes

* (sims) [#21952](https://github.com/cosmos/cosmos-sdk/pull/21952) Use liveness matrix for validator sign status in sims
* (sims) [#21906](https://github.com/cosmos/cosmos-sdk/pull/21906) Skip sims test when running dry on validators
* (cli) [#21919](https://github.com/cosmos/cosmos-sdk/pull/21919) Query address-by-acc-num by account_id instead of id.
Expand Down
77 changes: 25 additions & 52 deletions docs/learn/advanced/09-telemetry.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,17 +13,18 @@ their application through the use of the `telemetry` package. To enable telemetr

The Cosmos SDK currently supports enabling in-memory and prometheus as telemetry sinks. In-memory sink is always attached (when the telemetry is enabled) with 10 second interval and 1 minute retention. This means that metrics will be aggregated over 10 seconds, and metrics will be kept alive for 1 minute.

To query active metrics (see retention note above) you have to enable API server (`api.enabled = true` in the app.toml). Single API endpoint is exposed: `http://localhost:1317/metrics?format={text|prometheus}`, the default being `text`.
To query active metrics (see retention note above) you have to enable API server (`api.enabled = true` in the app.toml). Single API endpoint is exposed: `http://localhost:1317/metrics?format={text|prometheus}` (or port `1318` in v2) , the default being `text`.

## Emitting metrics

If telemetry is enabled via configuration, a single global metrics collector is registered via the
[go-metrics](https://github.com/hashicorp/go-metrics) library. This allows emitting and collecting
metrics through simple [API](https://github.com/cosmos/cosmos-sdk/blob/v0.50.0-alpha.0/telemetry/wrapper.go). Example:
metrics through simple [API](https://github.com/cosmos/cosmos-sdk/blob/v0.50.10/telemetry/wrapper.go). Example:

```go
func EndBlocker(ctx sdk.Context, k keeper.Keeper) {
defer telemetry.ModuleMeasureSince(types.ModuleName, time.Now(), telemetry.MetricKeyEndBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

// ...
}
Expand Down Expand Up @@ -69,60 +70,32 @@ Consider the following examples with enough granularity and adequate cardinality
* begin/end blocker time
* tx gas used
* block gas used
* amount of tokens minted
* amount of accounts created

The following examples expose too much cardinality and may not even prove to be useful:

* transfers between accounts with amount
* voting/deposit amount from unique addresses

## Idempotency

Metrics aren't idempotent, so if a metric is emitted twice, it will be counted twice.
This is important to keep in mind when collecting metrics. If a module is called twice, the metrics will be emitted twice (for instance in `CheckTx`, `SimulateTx` or `DeliverTx`).

## Supported Metrics

| Metric | Description | Unit | Type |
|:--------------------------------|:------------------------------------------------------------------------------------------|:----------------|:--------|
| `tx_count` | Total number of txs processed via `DeliverTx` | tx | counter |
| `tx_successful` | Total number of successful txs processed via `DeliverTx` | tx | counter |
| `tx_failed` | Total number of failed txs processed via `DeliverTx` | tx | counter |
| `tx_gas_used` | The total amount of gas used by a tx | gas | gauge |
| `tx_gas_wanted` | The total amount of gas requested by a tx | gas | gauge |
| `tx_msg_send` | The total amount of tokens sent in a `MsgSend` (per denom) | token | gauge |
| `tx_msg_withdraw_reward` | The total amount of tokens withdrawn in a `MsgWithdrawDelegatorReward` (per denom) | token | gauge |
| `tx_msg_withdraw_commission` | The total amount of tokens withdrawn in a `MsgWithdrawValidatorCommission` (per denom) | token | gauge |
| `tx_msg_delegate` | The total amount of tokens delegated in a `MsgDelegate` | token | gauge |
| `tx_msg_begin_unbonding` | The total amount of tokens undelegated in a `MsgUndelegate` | token | gauge |
| `tx_msg_begin_begin_redelegate` | The total amount of tokens redelegated in a `MsgBeginRedelegate` | token | gauge |
| `tx_msg_ibc_transfer` | The total amount of tokens transferred via IBC in a `MsgTransfer` (source or sink chain) | token | gauge |
| `ibc_transfer_packet_receive` | The total amount of tokens received in a `FungibleTokenPacketData` (source or sink chain) | token | gauge |
| `new_account` | Total number of new accounts created | account | counter |
| `gov_proposal` | Total number of governance proposals | proposal | counter |
| `gov_vote` | Total number of governance votes for a proposal | vote | counter |
| `gov_deposit` | Total number of governance deposits for a proposal | deposit | counter |
| `staking_delegate` | Total number of delegations | delegation | counter |
| `staking_undelegate` | Total number of undelegations | undelegation | counter |
| `staking_redelegate` | Total number of redelegations | redelegation | counter |
| `ibc_transfer_send` | Total number of IBC transfers sent from a chain (source or sink) | transfer | counter |
| `ibc_transfer_receive` | Total number of IBC transfers received to a chain (source or sink) | transfer | counter |
| `ibc_client_create` | Total number of clients created | create | counter |
| `ibc_client_update` | Total number of client updates | update | counter |
| `ibc_client_upgrade` | Total number of client upgrades | upgrade | counter |
| `ibc_client_misbehaviour` | Total number of client misbehaviours | misbehaviour | counter |
| `ibc_connection_open-init` | Total number of connection `OpenInit` handshakes | handshake | counter |
| `ibc_connection_open-try` | Total number of connection `OpenTry` handshakes | handshake | counter |
| `ibc_connection_open-ack` | Total number of connection `OpenAck` handshakes | handshake | counter |
| `ibc_connection_open-confirm` | Total number of connection `OpenConfirm` handshakes | handshake | counter |
| `ibc_channel_open-init` | Total number of channel `OpenInit` handshakes | handshake | counter |
| `ibc_channel_open-try` | Total number of channel `OpenTry` handshakes | handshake | counter |
| `ibc_channel_open-ack` | Total number of channel `OpenAck` handshakes | handshake | counter |
| `ibc_channel_open-confirm` | Total number of channel `OpenConfirm` handshakes | handshake | counter |
| `ibc_channel_close-init` | Total number of channel `CloseInit` handshakes | handshake | counter |
| `ibc_channel_close-confirm` | Total number of channel `CloseConfirm` handshakes | handshake | counter |
| `tx_msg_ibc_recv_packet` | Total number of IBC packets received | packet | counter |
| `tx_msg_ibc_acknowledge_packet` | Total number of IBC packets acknowledged | acknowledgement | counter |
| `ibc_timeout_packet` | Total number of IBC timeout packets | timeout | counter |
| `store_iavl_get` | Duration of an IAVL `Store#Get` call | ms | summary |
| `store_iavl_set` | Duration of an IAVL `Store#Set` call | ms | summary |
| `store_iavl_has` | Duration of an IAVL `Store#Has` call | ms | summary |
| `store_iavl_delete` | Duration of an IAVL `Store#Delete` call | ms | summary |
| `store_iavl_commit` | Duration of an IAVL `Store#Commit` call | ms | summary |
| `store_iavl_query` | Duration of an IAVL `Store#Query` call | ms | summary |
| Metric | Description | Unit | Type |
| ------------------- | ------------------------------------------------------------------------------ | ---- | ------- |
| `tx_count` | Total number of txs processed via `DeliverTx` | tx | counter |
| `tx_successful` | Total number of successful txs processed via `DeliverTx` | tx | counter |
| `tx_failed` | Total number of failed txs processed via `DeliverTx` | tx | counter |
| `tx_gas_used` | The total amount of gas used by a tx | gas | gauge |
| `tx_gas_wanted` | The total amount of gas requested by a tx | gas | gauge |
| `store_iavl_get` | Duration of an IAVL `Store#Get` call | ms | summary |
| `store_iavl_set` | Duration of an IAVL `Store#Set` call | ms | summary |
| `store_iavl_has` | Duration of an IAVL `Store#Has` call | ms | summary |
| `store_iavl_delete` | Duration of an IAVL `Store#Delete` call | ms | summary |
| `store_iavl_commit` | Duration of an IAVL `Store#Commit` call | ms | summary |
| `store_iavl_query` | Duration of an IAVL `Store#Query` call | ms | summary |
| `begin_blocker` | Duration of the `BeginBlock` call per module | ms | summary |
| `end_blocker` | Duration of the `EndBlock` call per module | ms | summary |
| `server_info` | Information about the server, such as version, commit, and build date, upgrade | - | gauge |
8 changes: 3 additions & 5 deletions telemetry/wrapper.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,9 @@ import (

// Common metric key constants
const (
MetricKeyBeginBlocker = "begin_blocker"
MetricKeyEndBlocker = "end_blocker"
MetricKeyPrepareCheckStater = "prepare_check_stater"
MetricKeyPrecommiter = "precommiter"
MetricLabelNameModule = "module"
MetricKeyBeginBlocker = "begin_blocker"
MetricKeyEndBlocker = "end_blocker"
MetricLabelNameModule = "module"
)

// NewLabel creates a new instance of Label with name and value
Expand Down
2 changes: 1 addition & 1 deletion x/bank/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ require (
github.com/golang/mock v1.6.0
github.com/golang/protobuf v1.5.4
github.com/grpc-ecosystem/grpc-gateway v1.16.0
github.com/hashicorp/go-metrics v0.5.3
github.com/hashicorp/go-metrics v0.5.3 // indirect
github.com/spf13/cobra v1.8.1
github.com/stretchr/testify v1.9.0
google.golang.org/genproto/googleapis/api v0.0.0-20240814211410-ddb44dafa142
Expand Down
15 changes: 0 additions & 15 deletions x/bank/keeper/msg_server.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,9 @@ package keeper
import (
"context"

"github.com/hashicorp/go-metrics"

errorsmod "cosmossdk.io/errors"
"cosmossdk.io/x/bank/types"

"github.com/cosmos/cosmos-sdk/telemetry"
sdk "github.com/cosmos/cosmos-sdk/types"
sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)
Expand Down Expand Up @@ -65,18 +62,6 @@ func (k msgServer) Send(ctx context.Context, msg *types.MsgSend) (*types.MsgSend
return nil, err
}

defer func() {
for _, a := range msg.Amount {
if a.Amount.IsInt64() {
telemetry.SetGaugeWithLabels(
[]string{"tx", "msg", "send"},
float32(a.Amount.Int64()),
[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
)
}
}
}()

return &types.MsgSendResponse{}, nil
}

Expand Down
14 changes: 0 additions & 14 deletions x/bank/v2/keeper/handlers.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,12 @@ import (
"errors"
"fmt"

"github.com/hashicorp/go-metrics"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

errorsmod "cosmossdk.io/errors"
"cosmossdk.io/x/bank/v2/types"

"github.com/cosmos/cosmos-sdk/telemetry"
sdk "github.com/cosmos/cosmos-sdk/types"
sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)
Expand Down Expand Up @@ -85,18 +83,6 @@ func (h handlers) MsgSend(ctx context.Context, msg *types.MsgSend) (*types.MsgSe
return nil, err
}

defer func() {
for _, a := range msg.Amount {
if a.Amount.IsInt64() {
telemetry.SetGaugeWithLabels(
[]string{"tx", "msg", "send"},
float32(a.Amount.Int64()),
[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
)
}
}
}()

return &types.MsgSendResponse{}, nil
}

Expand Down
2 changes: 1 addition & 1 deletion x/distribution/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@ require (
github.com/golang/mock v1.6.0
github.com/golang/protobuf v1.5.4
github.com/grpc-ecosystem/grpc-gateway v1.16.0
github.com/hashicorp/go-metrics v0.5.3
github.com/pkg/errors v0.9.1
github.com/spf13/cobra v1.8.1
github.com/stretchr/testify v1.9.0
Expand Down Expand Up @@ -92,6 +91,7 @@ require (
github.com/gsterjov/go-libsecret v0.0.0-20161001094733-a6f4afe4910c // indirect
github.com/hashicorp/go-hclog v1.6.3 // indirect
github.com/hashicorp/go-immutable-radix v1.3.1 // indirect
github.com/hashicorp/go-metrics v0.5.3 // indirect
github.com/hashicorp/go-plugin v1.6.1 // indirect
github.com/hashicorp/golang-lru v1.0.2 // indirect
github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect
Expand Down
3 changes: 2 additions & 1 deletion x/distribution/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ import (
// BeginBlocker sets the proposer for determining distribution during endblock
// and distribute rewards for the previous block.
func (k Keeper) BeginBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

// determine the total power signing the block
var previousTotalPower int64
Expand Down
27 changes: 0 additions & 27 deletions x/distribution/keeper/msg_server.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,9 @@ import (
"context"
"fmt"

"github.com/hashicorp/go-metrics"

"cosmossdk.io/errors"
"cosmossdk.io/x/distribution/types"

"github.com/cosmos/cosmos-sdk/telemetry"
sdk "github.com/cosmos/cosmos-sdk/types"
sdkerrors "github.com/cosmos/cosmos-sdk/types/errors"
)
Expand Down Expand Up @@ -61,18 +58,6 @@ func (k msgServer) WithdrawDelegatorReward(ctx context.Context, msg *types.MsgWi
return nil, err
}

defer func() {
for _, a := range amount {
if a.Amount.IsInt64() {
telemetry.SetGaugeWithLabels(
[]string{"tx", "msg", "withdraw_reward"},
float32(a.Amount.Int64()),
[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
)
}
}
}()

return &types.MsgWithdrawDelegatorRewardResponse{Amount: amount}, nil
}

Expand All @@ -87,18 +72,6 @@ func (k msgServer) WithdrawValidatorCommission(ctx context.Context, msg *types.M
return nil, err
}

defer func() {
for _, a := range amount {
if a.Amount.IsInt64() {
telemetry.SetGaugeWithLabels(
[]string{"tx", "msg", "withdraw_commission"},
float32(a.Amount.Int64()),
[]metrics.Label{telemetry.NewLabel("denom", a.Denom)},
)
}
}
}()

return &types.MsgWithdrawValidatorCommissionResponse{Amount: amount}, nil
}

Expand Down
3 changes: 2 additions & 1 deletion x/epochs/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ import (

// BeginBlocker of epochs module.
func (k Keeper) BeginBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

headerInfo := k.HeaderService.HeaderInfo(ctx)
err := k.EpochInfo.Walk(
Expand Down
3 changes: 2 additions & 1 deletion x/evidence/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,8 @@ import (
// BeginBlocker iterates through and handles any newly discovered evidence of
// misbehavior submitted by CometBFT. Currently, only equivocation is handled.
func (k Keeper) BeginBlocker(ctx context.Context, cometService comet.Service) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

bi := cometService.CometInfo(ctx)

Expand Down
3 changes: 2 additions & 1 deletion x/gov/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,8 @@ import (

// EndBlocker is called every block.
func (k Keeper) EndBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyEndBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

// delete dead proposals from store and returns theirs deposits.
// A proposal is dead when it's inactive and didn't get enough deposit on time to get into voting phase.
Expand Down
7 changes: 7 additions & 0 deletions x/group/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,18 @@ package keeper

import (
"context"

"cosmossdk.io/x/gov/types"

"github.com/cosmos/cosmos-sdk/telemetry"
)

// EndBlocker called at every block, updates proposal's `FinalTallyResult` and
// prunes expired proposals.
func (k Keeper) EndBlocker(ctx context.Context) error {
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyEndBlocker)

if err := k.TallyProposalsAtVPEnd(ctx); err != nil {
return err
}
Expand Down
3 changes: 2 additions & 1 deletion x/mint/keeper/abci.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@ import (

// BeginBlocker mints new tokens for the previous block.
func (k Keeper) BeginBlocker(ctx context.Context) error {
defer telemetry.ModuleMeasureSince(types.ModuleName, telemetry.Now(), telemetry.MetricKeyBeginBlocker)
start := telemetry.Now()
defer telemetry.ModuleMeasureSince(types.ModuleName, start, telemetry.MetricKeyBeginBlocker)

// fetch stored minter & params
minter, err := k.Minter.Get(ctx)
Expand Down
Loading
Loading