From 52c59b3a1656be2b31be00683abd5e13b2bd98ae Mon Sep 17 00:00:00 2001
From: Matt Keeler
Date: Thu, 9 Dec 2021 16:18:59 -0500
Subject: [PATCH] Various Boltdb/Raft Documentation Updates (#11793)

* Documenting the new raft_boltdb configuration options
* Add documentation around new boltdb metrics.
* Correct documentation for the consul.raft.fsm.apply metric
---
 website/content/docs/agent/options.mdx   | 10 ++++++++++
 website/content/docs/agent/telemetry.mdx | 25 +++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/website/content/docs/agent/options.mdx b/website/content/docs/agent/options.mdx
index 4f1557c2c9be..4cf58f7e54ef 100644
--- a/website/content/docs/agent/options.mdx
+++ b/website/content/docs/agent/options.mdx
@@ -1770,6 +1770,16 @@ bind_addr = "{{ GetPrivateInterfaces | include \"network\" \"10.0.0.0/8\" | attr
 - `protocol` ((#protocol)) Equivalent to the [`-protocol` command-line
   flag](#_protocol).
 
+- `raft_boltdb` ((#raft_boltdb)) This is a nested object that allows configuring
+  options for Raft's BoltDB-based log store.
+
+  - `NoFreelistSync` ((#NoFreelistSync)) Setting this to `true` will disable
+    syncing the BoltDB freelist to disk within the raft.db file. Not syncing
+    the freelist to disk will reduce disk IO required for write operations
+    at the expense of potentially increasing startup time due to needing
+    to scan the db to discover where the free space resides within the file.
+
+
 - `raft_protocol` ((#raft_protocol)) Equivalent to the [`-raft-protocol`
   command-line flag](#_raft_protocol).
 
diff --git a/website/content/docs/agent/telemetry.mdx b/website/content/docs/agent/telemetry.mdx
index 9a8a729e2db0..9bcede9c60e2 100644
--- a/website/content/docs/agent/telemetry.mdx
+++ b/website/content/docs/agent/telemetry.mdx
@@ -342,11 +342,34 @@ These metrics are used to monitor the health of the Consul servers.
 | `consul.raft.applied_index` | Represents the raft applied index. | index | gauge |
 | `consul.raft.apply` | Counts the number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Consul servers. | raft transactions / interval | counter |
 | `consul.raft.barrier` | Counts the number of times the agent has started the barrier i.e the number of times it has issued a blocking call, to ensure that the agent has all the pending operations that were queued, to be applied to the agent's FSM. | blocks / interval | counter |
+| `consul.raft.boltdb.freelistBytes` | Represents the number of bytes necessary to encode the freelist metadata. When `raft_boltdb.NoFreelistSync` is set to `false` these metadata bytes must also be written to disk for each committed log. | bytes | gauge |
+| `consul.raft.boltdb.freePageBytes` | Represents the number of bytes of free space within the raft.db file. | bytes | gauge |
+| `consul.raft.boltdb.getLog` | Measures the amount of time spent reading logs from the db. | ms | timer |
+| `consul.raft.boltdb.logBatchSize` | Measures the total size in bytes of logs being written to the db in a single batch. | bytes | sample |
+| `consul.raft.boltdb.logsPerBatch` | Measures the number of logs being written per batch to the db. | logs | sample |
+| `consul.raft.boltdb.logSize` | Measures the size of logs being written to the db. | bytes | sample |
+| `consul.raft.boltdb.numFreePages` | Represents the number of free pages within the raft.db file. | pages | gauge |
+| `consul.raft.boltdb.numPendingPages` | Represents the number of pending pages within the raft.db that will soon become free. | pages | gauge |
+| `consul.raft.boltdb.openReadTxn` | Represents the number of open read transactions against the db. | transactions | gauge |
+| `consul.raft.boltdb.totalReadTxn` | Represents the total number of started read transactions against the db. | transactions | gauge |
+| `consul.raft.boltdb.storeLogs` | Measures the amount of time spent writing logs to the db. | ms | timer |
+| `consul.raft.boltdb.txstats.cursorCount` | Counts the number of cursors created since Consul was started. | cursors | counter |
+| `consul.raft.boltdb.txstats.nodeCount` | Counts the number of node allocations within the db since Consul was started. | allocations | counter |
+| `consul.raft.boltdb.txstats.nodeDeref` | Counts the number of node dereferences in the db since Consul was started. | dereferences | counter |
+| `consul.raft.boltdb.txstats.pageAlloc` | Represents the number of bytes allocated within the db since Consul was started. Note that this does not take into account space having been freed and reused. In that case, the value of this metric will still increase. | bytes | gauge |
+| `consul.raft.boltdb.txstats.pageCount` | Represents the number of pages allocated since Consul was started. Note that this does not take into account space having been freed and reused. In that case, the value of this metric will still increase. | pages | gauge |
+| `consul.raft.boltdb.txstats.rebalance` | Counts the number of node rebalances performed in the db since Consul was started. | rebalances | counter |
+| `consul.raft.boltdb.txstats.rebalanceTime` | Measures the time spent rebalancing nodes in the db. | ms | timer |
+| `consul.raft.boltdb.txstats.spill` | Counts the number of nodes spilled in the db since Consul was started. | spills | counter |
+| `consul.raft.boltdb.txstats.spillTime` | Measures the time spent spilling nodes in the db. | ms | timer |
+| `consul.raft.boltdb.txstats.split` | Counts the number of nodes split in the db since Consul was started. | splits | counter |
+| `consul.raft.boltdb.txstats.write` | Counts the number of writes to the db since Consul was started. | writes | counter |
+| `consul.raft.boltdb.txstats.writeTime` | Measures the amount of time spent performing writes to the db. | ms | timer |
 | `consul.raft.commitNumLogs` | Measures the count of logs processed for application to the FSM in a single batch. | logs | gauge |
 | `consul.raft.commitTime` | Measures the time it takes to commit a new entry to the Raft log on the leader. | ms | timer |
 | `consul.raft.fsm.lastRestoreDuration` | Measures the time taken to restore the FSM from a snapshot on an agent restart or from the leader calling installSnapshot. This is a gauge that holds it's value since most servers only restore during restarts which are typically infrequent. | ms | gauge |
 | `consul.raft.fsm.snapshot` | Measures the time taken by the FSM to record the current state for the snapshot. | ms | timer |
-| `consul.raft.fsm.apply` | The number of logs committed since the last interval. | commit logs / interval | counter |
+| `consul.raft.fsm.apply` | Measures the time to apply a log to the FSM. | ms | timer |
 | `consul.raft.fsm.enqueue` | Measures the amount of time to enqueue a batch of logs for the FSM to apply. | ms | timer |
 | `consul.raft.fsm.restore` | Measures the time taken by the FSM to restore its state from a snapshot. | ms | timer |
 | `consul.raft.last_index` | Represents the raft applied index. | index | gauge |
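
Reviewer note: the `raft_boltdb` option documented in the options.mdx hunk might look roughly like this in an HCL agent configuration file. This is a minimal sketch, not taken from the patch. It assumes the nested object can be written as an HCL block and that the key is spelled exactly as documented (`NoFreelistSync`).

```hcl
# Illustrative fragment only; the block syntax is an assumption,
# the option names come from the documentation added in this patch.
raft_boltdb {
  # Skip syncing the BoltDB freelist to disk on each write.
  # Trade-off per the docs above: less write IO, potentially slower startup.
  NoFreelistSync = true
}
```

Per the `consul.raft.boltdb.freelistBytes` description in the telemetry hunk, leaving this option at `false` means the freelist metadata bytes are also written to disk for each committed log, so that gauge gives a rough sense of the write overhead the option removes.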