Skip to content

Commit

Permalink
Merge pull request #2993 from xiang90/md
Browse files Browse the repository at this point in the history
doc: add doc for metrics feature
  • Loading branch information
xiang90 committed Jun 16, 2015
2 parents cdcae2d + c599e81 commit 2efbc76
Show file tree
Hide file tree
Showing 2 changed files with 52 additions and 0 deletions.
4 changes: 4 additions & 0 deletions Documentation/glossary.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,10 @@ The node in each member follows raft consensus protocol to replicate logs. Clust

Peer is another member of the same cluster.

### Proposal

A proposal is a request (for example a write request, a configuration change request) that needs to go through raft protocol.

### Client

Client is a caller of the cluster's HTTP API.
Expand Down
48 changes: 48 additions & 0 deletions Documentation/metrics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
## Metrics

**NOTE: The metrics feature is considered as an experimental. We might add/change/remove metrics without warning in the future releases.**

etcd uses [Prometheus](http://prometheus.io/) for metrics reporting in the server. The metrics can be used for real-time monitoring and debugging.

The simplest way to see the available metrics is to cURL the metrics endpoint `/metrics` of etcd. The format is described [here](http://prometheus.io/docs/instrumenting/exposition_formats/).


You can also follow the doc [here](http://prometheus.io/docs/introduction/getting_started/) to start a Promethus server and monitor etcd metrics.

The naming of metrics follows the suggested [best practice of Promethus](http://prometheus.io/docs/practices/naming/). A metric name has an `etcd` prefix as its namespace and a subsystem prefix (for example `wal` and `etcdserver`).

etcd now exposes the following metrics:

### etcdserver

| Name | Description | Type |
|-----------------------------------------|--------------------------------------------------|---------|
| file_descriptors_used_total | The total number of file descriptors used | Gauge |
| proposal_durations_milliseconds | The latency distributions of committing proposal | Summary |
| pending_proposal_total | The total number of pending proposals | Gauge |
| proposal_failed_total | The total number of failed proposals | Counter |

High file descriptors (`file_descriptors_used_total`) usage (near the file descriptors limitation of the process) indicates a potential out of file descriptors issue. That might cause etcd fails to create new WAL files and panics.

[Proposal](glossary.md#proposal) durations (`proposal_durations_milliseconds`) give you an summary about the proposal commit latency. Latency can be introduced into this process by network and disk IO.

Pending proposal (`pending_proposal_total`) gives you an idea about how many proposal are in the queue and waiting for commit. An increasing pending number indicates a high client load or an unstable cluster.

Failed proposals (`proposal_failed_total`) are normally related to two issues: temporary failures related to a leader election or longer duration downtime caused by a loss of quorum in the cluster.

### wal

| Name | Description | Type |
|------------------------------------|--------------------------------------------------|---------|
| fsync_durations_microseconds | The latency distributions of fsync called by wal | Summary |
| last_index_saved | The index of the last entry saved by wal | Gauge |

Abnormally high fsync duration (`fsync_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.

### snapshot

| Name | Description | Type |
|--------------------------------------------|------------------------------------------------------------|---------|
| snapshot_save_total_durations_microseconds | The total latency distributions of save called by snapshot | Summary |

Abnormally high snapshot duration (`snapshot_save_total_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.

0 comments on commit 2efbc76

Please sign in to comment.