Refactor and move wal docs #16387

Merged
merged 19 commits into from
Feb 27, 2023
Changes from 17 commits
98 changes: 89 additions & 9 deletions website/content/docs/agent/config/config-files.mdx
@@ -1586,15 +1586,95 @@ Valid time units are 'ns', 'us' (or 'µs'), 'ms', 's', 'm', 'h'."

## Raft Parameters

- `raft_boltdb` ((#raft_boltdb)) This is a nested object that allows configuring
options for Raft's BoltDB based log store.

- `NoFreelistSync` ((#NoFreelistSync)) Setting this to `true` will disable
syncing the BoltDB freelist to disk within the raft.db file. Not syncing
the freelist to disk will reduce disk IO required for write operations
at the expense of potentially increasing start up time due to needing
to scan the db to discover where the free space resides within the file.

- `raft_boltdb` ((#raft_boltdb)) **These fields are deprecated in Consul 1.15.0.
Use [`raft_logstore`](#raft_logstore) instead.** This is a nested
object that allows configuring options for Raft's BoltDB based log store.
im2nguyen marked this conversation as resolved.

- `NoFreelistSync` **This field is deprecated in Consul 1.15.0. Use the
[`raft_logstore.boltdb.no_freelist_sync`](#raft_logstore_boltdb_no_freelist_sync) field
instead.** Setting this to `true` will disable syncing the BoltDB freelist
to disk within the raft.db file. Not syncing the freelist to disk will
reduce disk IO required for write operations at the expense of potentially
increasing start up time due to needing to scan the db to discover where the
free space resides within the file.
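
As a rough illustration of the migration described above, the following sketch shows the deprecated block commented out next to its replacement. It assumes an HCL agent configuration file on Consul 1.15.0 or later; the value `true` is only an example, not a recommendation.

```hcl
# Deprecated as of Consul 1.15.0 (commented out, shown for comparison only):
# raft_boltdb {
#   NoFreelistSync = true
# }

# Replacement: the same option under the raft_logstore object.
raft_logstore {
  # "boltdb" is the default backend; written out here for clarity.
  backend = "boltdb"

  boltdb {
    # Skip syncing the BoltDB freelist to disk inside raft.db.
    no_freelist_sync = true
  }
}
```

The trade-off is unchanged by the rename: less disk IO on writes, at the cost of a potentially longer startup.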

- `raft_logstore` ((#raft_logstore)) This is a nested object that allows
configuring options for Raft's LogStore component which is used to persist
logs and crucial Raft state on disk during writes. This was added in Consul
1.15.

- `backend` ((#raft_logstore_backend)) Specifies which storage
engine to use to persist logs. Valid options are `boltdb` or `wal`. Default
is `boltdb`. The `wal` option specifies an experimental backend that
should be used with caution. Refer to
[Experimental WAL LogStore backend](/consul/docs/agent/wal-logstore)
for more information.

- `disable_log_cache` ((#raft_logstore_disable_log_cache)) Disables the
in-memory cache of recent logs. This option exists mostly for performance
testing purposes. In theory, the log cache prevents disk reads for recent
logs. In practice, recent logs are still in the OS page cache, so they tend
not to be slow to read with either backend. We recommend leaving the cache
enabled for now because we have not measured a significant improvement in
any metric by disabling it.

- `verification` ((#raft_logstore_verification)) This is a nested object that
allows configuring online verification of the LogStore. Verification
provides additional assurances that LogStore backends are correctly storing
data. It imposes very low overhead on servers and is safe to run in
production; however, it is mostly useful when evaluating a new backend
implementation.

Verification must be enabled on the leader to have any effect and can be
used with any backend. When enabled, the leader will periodically write a
special "checkpoint" log message including checksums of all log entries
written to Raft since the last checkpoint. Followers that have verification
enabled will run a background task for each checkpoint that reads all logs
directly from the LogStore and recomputes the checksum. A report is output
as an INFO level log for each checkpoint.

A checksum failure should never happen and indicates unrecoverable corruption
on that server. The only correct response is to stop the server, remove its
data directory, and restart so it can be caught back up with a correct
server again. Please report verification failures including details about
your hardware and workload via GitHub issues. Refer to
[Experimental WAL LogStore backend](/consul/docs/agent/wal-logstore)
for more information.
Comment on lines +1630 to +1636
Contributor


Suggested change
Checksum failure should never happen and indicate unrecoverable corruption
on that server. The only correct response is to stop the server, remove its
data directory, and restart so it can be caught back up with a correct
server again. Please report verification failures including details about
your hardware and workload via GitHub issues. Refer to
[Experimental WAL LogStore backend](/consul/docs/agent/wal-logstore)
for more information.
Checksum failure indicates unrecoverable corruption
on that server. The only corrective response is to stop the server, remove its
data directory, and then restart. These actions catch up the data directory with a correct
server again. Report verification failures including details about
your hardware and workload through GitHub issues. Refer to
[Experimental WAL LogStore backend](/consul/docs/agent/wal-logstore)
for more information.

There was a pronoun I couldn't parse, so please verify.

In the following sentence: "The only correct response is to stop the server, remove its data directory, and restart so it can be caught back up with a correct server again."

The "it" after the "so": what is it that's catching up with a correct server again? As written, it says: "These actions catch up the server with a correct server again." I replaced the "it" with "data directory", the only other thing in the sentence it could refer to.

Member


I missed that. IMO would read more clearly as

Suggested change
Checksum failure should never happen and indicate unrecoverable corruption
on that server. The only correct response is to stop the server, remove its
data directory, and restart so it can be caught back up with a correct
server again. Please report verification failures including details about
your hardware and workload via GitHub issues. Refer to
[Experimental WAL LogStore backend](/consul/docs/agent/wal-logstore)
for more information.
Checksum failure indicates unrecoverable corruption
on that server. If this occurs, stop the server, remove the data directory, and restart so it can re-sync its state with the leader again. Report each verification failure including details about
your hardware and workload in a GitHub issue. Refer to
[Experimental WAL LogStore backend](/consul/docs/agent/wal-logstore)
for more information.

Does that read better to you @boruszak?

Member


Oops, I saw this was merged. Never mind!


- `enabled` ((#raft_logstore_verification_enabled)) - Set to `true` to
allow this Consul server to write and verify log verification checkpoints
when it is elected leader.

- `interval` ((#raft_logstore_verification_interval)) - Specifies the time
interval between checkpoints. There is no default value. You must
configure the `interval` and set [`enabled`](#raft_logstore_verification_enabled)
to `true` for verification checkpoints to be written. We recommend using an
interval between `30s` and `5m`. The performance overhead is insignificant
if the interval is set to `5m` or less.

- `boltdb` ((#raft_logstore_boltdb)) - Object that configures options for
Raft's `boltdb` backend. It has no effect if the `backend` is not `boltdb`.

- `no_freelist_sync` ((#raft_logstore_boltdb_no_freelist_sync)) - Set to
`true` to disable storing BoltDB's freelist to disk within the
`raft.db` file. Disabling freelist syncs reduces the disk IO required
for write operations, but could potentially increase start up time
because Consul must scan the database to find free space
within the file.

- `wal` ((#raft_logstore_wal)) - Object that configures the `wal` backend.
Refer to [Experimental WAL LogStore backend](/consul/docs/agent/wal-logstore)
for more information.

- `segment_size_mb` ((#raft_logstore_wal_segment_size_mb)) - Integer value
that represents the target size in MB for each segment file before
rolling to a new segment. The default is `64` and is suitable for
most deployments. A smaller value may use less disk space because you
can reclaim space by deleting old segments sooner, but a smaller segment
may affect performance because safely rotating to a new file more
frequently could impact tail latencies. Larger values are unlikely
to improve performance significantly. We recommend using this
configuration for performance testing purposes.
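
The `raft_logstore` fields described above could be combined as in the following minimal sketch. It assumes an HCL agent configuration file on a Consul 1.15 server. The `60s` interval is an arbitrary value inside the recommended `30s` to `5m` range, and `64` simply restates the documented default.

```hcl
raft_logstore {
  # Experimental backend; omit this line to keep the default, "boltdb".
  backend = "wal"

  verification {
    # Both fields are required: interval has no default value.
    enabled  = true
    interval = "60s" # illustrative value within the recommended 30s-5m range
  }

  wal {
    # Target segment file size in MB; 64 is the documented default.
    segment_size_mb = 64
  }
}
```

Because verification only takes effect when it is enabled on the leader, a setup like this would typically be applied to every server that may be elected leader.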

- `raft_protocol` ((#raft_protocol)) Equivalent to the [`-raft-protocol`
command-line flag](/consul/docs/agent/config/cli-flags#_raft_protocol).