Updates to RP self test (#604)
Co-authored-by: Paulo Borges <paulohtb@hotmail.com>
Co-authored-by: Jake Cahill <45230295+JakeSCahill@users.noreply.github.com>
Co-authored-by: Michele Cyran <michele@redpanda.com>
Co-authored-by: Joyce Fee <102751339+Feediver1@users.noreply.github.com>
Co-authored-by: Mike Boquard <michael@redpanda.com>
Co-authored-by: tris0laris <57298792+tris0laris@users.noreply.github.com>
Co-authored-by: Dave Voutila <voutilad@gmail.com>
Co-authored-by: Angela Simms <102690377+asimms41@users.noreply.github.com>
Co-authored-by: Andrew Hsu <xuzuan@gmail.com>
Co-authored-by: Oren Leiman <mumblemumble777@gmail.com>
11 people authored Jul 29, 2024
1 parent 3f9ef30 commit 1ba762c
Showing 7 changed files with 273 additions and 614 deletions.
10 changes: 10 additions & 0 deletions modules/get-started/pages/whats-new.adoc
@@ -40,6 +40,16 @@ Redpanda now includes `rpk` and Redpanda Console support for managing xref:manag

Client throughput quotas, previously applied on a per-shard basis, now apply on a per-broker basis. Cluster configuration properties for managing client quotas are xref:upgrade:deprecated/index.adoc[deprecated], including `target_quota_byte_rate` which is disabled by default with the value `0`.

== Self-test enhancements

New tests are added to the xref:manage:cluster-maintenance/cluster-diagnostics.adoc[Redpanda self-test] suite:

* Cloud storage tests to validate xref:manage:tiered-storage.adoc[Tiered Storage] configuration.
* 16K block size disk tests to better assess block storage performance, particularly in response to I/O depth changes.
* 4K block size disk test with dsync off to assess the impact of fdatasync on the storage layer.

See the xref:reference:rpk/rpk-cluster/rpk-cluster-self-test-status.adoc[`rpk cluster self-test status`] reference for usage and output examples.

== Next steps

* xref:install-beta.adoc[]
243 changes: 19 additions & 224 deletions modules/manage/pages/cluster-maintenance/cluster-diagnostics.adoc
@@ -11,7 +11,19 @@ When anomalous behavior arises in a cluster and you're trying to figure out whet

Self-test runs a set of benchmarks to determine the maximum performance of a machine's disks and network connections. For disks, it runs throughput and latency tests by performing concurrent sequential operations. For networks, it selects unique pairs of Redpanda nodes as client/server pairs, then it runs throughput tests between them. Self-test runs each benchmark for a configurable duration, and it returns IOPS, throughput, and latency metrics.

=== Self-test command examples
== Cloud storage tests

If you use xref:manage:tiered-storage.adoc[Tiered Storage], run self-test to verify that you have configured your cloud storage accounts correctly.

Self-test performs the following tests to validate cloud storage configuration:

include::reference:partial$rpk-self-test-cloud-tests.adoc[]

See the xref:reference:rpk/rpk-cluster/rpk-cluster-self-test-start.adoc[`rpk cluster self-test start`] reference for cloud storage test details.

== Self-test command examples

=== Start self-test

To begin using self-test, run the `self-test start` command.

@@ -34,6 +46,8 @@ rpk cluster self-test status

The `self-test start` command returns immediately, and self-test runs its benchmarks asynchronously.

=== Check self-test status

To check on the status of self-test, run the `self-test status` command.

[,bash]
@@ -66,231 +80,12 @@ rpk cluster self-test status --format=json

If benchmarks have completed, `self-test status` returns their results.
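
Because `self-test start` returns immediately, a script that needs the results has to poll `self-test status`. The following is a minimal sketch, not part of `rpk`: the helper name `all_nodes_idle` is hypothetical, and it assumes only that each node header in the status output looks like the `NODE ID: 1 | STATUS: IDLE` line shown in the example output. What the header says while a test is still running is not assumed; any header that does not report `IDLE` is treated as busy.

```bash
#!/usr/bin/env bash

# Hypothetical helper: reads `rpk cluster self-test status` text on stdin.
# Succeeds (exit 0) only if at least one "NODE ID:" header is present and
# every such header reports STATUS: IDLE.
all_nodes_idle() {
  awk '
    /^NODE ID:/ {
      nodes++
      if ($0 !~ /STATUS: IDLE/) busy++
    }
    END {
      if (nodes > 0 && busy == 0) exit 0
      exit 1
    }
  '
}

# Demo against a header line copied from the example output.
printf 'NODE ID: 1 | STATUS: IDLE\n' | all_nodes_idle && echo "self-test finished"

# Against a live cluster, the polling loop could look like this
# (left commented out because it needs a reachable cluster):
# until rpk cluster self-test status | all_nodes_idle; do sleep 5; done
```

The five-second sleep is an arbitrary choice. Pick an interval that suits the test duration you configured.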

include::reference:partial$rpk-self-test-descriptions.adoc[]

.Example status output: test results
[%collapsible]
====
Test results are grouped by node ID. Each test returns the following:
- **NAME**: Description of the test.
- **INFO**: Detail about the test run attached by Redpanda itself.
- **TYPE**: Either `disk` or `network` test.
- **TEST ID**: Unique identifier given to jobs of a run. All IDs in a test should match. If they don't match, then newer and/or older test results have been included erroneously.
- **TIMEOUTS**: Number of timeouts incurred during the test.
- **DURATION**: Duration of the test.
- **IOPS**: Number of operations per second. For disk, it's `seastar::dma_read` and `seastar::dma_write`. For network, it's `rpc.send()`.
- **THROUGHPUT**: For disk, it's the throughput rate in bytes per second. For network, it's the throughput rate in bits per second. (Note: The UI correctly displays GiB for disk results and Gib for network results.)
- **LATENCY**: 50th, 90th, etc. percentiles of operation latency, reported in microseconds.
```
$ rpk cluster self-test status
NODE ID: 1 | STATUS: IDLE
=========================
NAME 512K sequential r/w throughput disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5001ms
IOPS 1590 req/sec
THROUGHPUT 795.2MiB/sec
LATENCY P50 P90 P99 P999 MAX
831us 5887us 11263us 24575us 507903us

NAME 512K sequential r/w throughput disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5001ms
IOPS 4504 req/sec
THROUGHPUT 2.2GiB/sec
LATENCY P50 P90 P99 P999 MAX
703us 1599us 4351us 6399us 10239us

NAME 4k sequential r/w latency/iops disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5031ms
IOPS 289 req/sec
THROUGHPUT 144.7MiB/sec
LATENCY P50 P90 P99 P999 MAX
543us 34815us 69631us 77823us 77823us

NAME 4k sequential r/w latency/iops disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 8275 req/sec
THROUGHPUT 4.041GiB/sec
LATENCY P50 P90 P99 P999 MAX
191us 447us 831us 2175us 278527us

NAME 8K Network Throughput Test
INFO Test performed against node: 0
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 61254 req/sec
THROUGHPUT 3.74Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 207us 303us 415us 1087us

NAME 8K Network Throughput Test
INFO Test performed against node: 2
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 54814 req/sec
THROUGHPUT 3.35Gib/sec
LATENCY P50 P90 P99 P999 MAX
167us 255us 367us 511us 25599us

NODE ID: 0 | STATUS: IDLE
=========================
NAME 512K sequential r/w throughput disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5002ms
IOPS 1593 req/sec
THROUGHPUT 796.8MiB/sec
LATENCY P50 P90 P99 P999 MAX
735us 5887us 11263us 69631us 507903us
NAME 512K sequential r/w throughput disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 4372 req/sec
THROUGHPUT 2.135GiB/sec
LATENCY P50 P90 P99 P999 MAX
735us 1599us 4351us 7423us 9215us
NAME 4k sequential r/w latency/iops disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5026ms
IOPS 286 req/sec
THROUGHPUT 143.1MiB/sec
LATENCY P50 P90 P99 P999 MAX
543us 34815us 69631us 77823us 77823us
NAME 4k sequential r/w latency/iops disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 8269 req/sec
THROUGHPUT 4.038GiB/sec
LATENCY P50 P90 P99 P999 MAX
191us 447us 831us 2175us 278527us
NAME 8K Network Throughput Test
INFO Test performed against node: 1
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 61612 req/sec
THROUGHPUT 3.76Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 207us 303us 431us 1151us
NAME 8K Network Throughput Test
INFO Test performed against node: 2
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 60306 req/sec
THROUGHPUT 3.68Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 215us 351us 495us 11263us
NODE ID: 2 | STATUS: IDLE
=========================
NAME 512K sequential r/w throughput disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5001ms
IOPS 1580 req/sec
THROUGHPUT 790MiB/sec
LATENCY P50 P90 P99 P999 MAX
671us 5887us 12287us 47103us 507903us

NAME 512K sequential r/w throughput disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 3932 req/sec
THROUGHPUT 1.92GiB/sec
LATENCY P50 P90 P99 P999 MAX
831us 1791us 4351us 7167us 9215us

NAME 4k sequential r/w latency/iops disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5027ms
IOPS 280 req/sec
THROUGHPUT 140.1MiB/sec
LATENCY P50 P90 P99 P999 MAX
575us 34815us 73727us 86015us 86015us

NAME 4k sequential r/w latency/iops disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 8699 req/sec
THROUGHPUT 4.248GiB/sec
LATENCY P50 P90 P99 P999 MAX
183us 367us 831us 2175us 278527us

NAME 8K Network Throughput Test
INFO Test performed against node: 0
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 60027 req/sec
THROUGHPUT 3.66Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 223us 351us 511us 11775us

NAME 8K Network Throughput Test
INFO Test performed against node: 1
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 63090 req/sec
THROUGHPUT 3.85Gib/sec
LATENCY P50 P90 P99 P999 MAX
151us 207us 319us 463us 17407us

```
====
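
The IOPS and THROUGHPUT fields can be cross-checked against each other, which also makes the bytes-versus-bits distinction concrete. The sketch below is an illustration only: the function names are hypothetical, and it assumes that the block size in the test name is in KiB and that all units are binary (1 GiB = 1024 MiB). It uses figures from node 1 in the example output: the 512K disk write run (1590 req/sec) and the 8K network test against node 0 (61254 req/sec).

```bash
#!/usr/bin/env bash

# Expected throughput in MiB/sec for a given IOPS and block size in KiB.
expected_mib_per_sec() {
  awk -v iops="$1" -v kib="$2" 'BEGIN { printf "%.1f\n", iops * kib / 1024 }'
}

# Convert MiB/sec (bytes) to Gib/sec (bits): multiply by 8 bits per byte,
# then divide by 1024 to go from mebi to gibi.
mib_to_gibit_per_sec() {
  awk -v mib="$1" 'BEGIN { printf "%.2f\n", mib * 8 / 1024 }'
}

expected_mib_per_sec 1590 512    # prints 795.0 (reported: 795.2MiB/sec)
expected_mib_per_sec 61254 8     # prints 478.5
mib_to_gibit_per_sec 478.5       # prints 3.74 (reported: 3.74Gib/sec)
```

Small differences from the reported values are expected, because the reported IOPS figure is itself rounded.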
include::reference:partial$rpk-self-test-status-output.adoc[]

NOTE: If self-test returns write results that are unexpectedly and significantly lower than read results, it may be because the Redpanda `rpk` client hardcodes the `DSync` option to `true`. When `DSync` is enabled, files are opened with the `O_DSYNC` flag set, and this represents the actual setting that Redpanda uses when it writes to disk.
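
To get a feel for how much `O_DSYNC` can cost on a given disk, independent of Redpanda, you can compare synchronous and buffered writes with `dd`. This is a rough sketch, not a substitute for self-test: it assumes GNU `dd` (for `oflag=dsync`), and the scratch file name and the 256 x 4 KiB write pattern are arbitrary choices.

```bash
#!/usr/bin/env bash

# Directory on the disk to probe. Pass it as the first argument; the default
# temp directory may be memory-backed, in which case the two runs look alike.
dir="${1:-${TMPDIR:-/tmp}}"
target="$(mktemp "$dir/dsync-demo.XXXXXX")"

# 256 sequential 4 KiB writes with the file opened O_DSYNC: each write
# returns only after the data has reached stable storage.
dd if=/dev/zero of="$target" bs=4k count=256 oflag=dsync 2>&1 | tail -n 1

# The same writes without O_DSYNC: data may sit in the page cache, so the
# reported rate is usually much higher.
dd if=/dev/zero of="$target" bs=4k count=256 2>&1 | tail -n 1

# Clean up the scratch file.
rm -f "$target"
```

Each `dd` run prints a summary line with the elapsed time and transfer rate. The gap between the two lines approximates the write penalty described in the note above.
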
=== Stop self-test

To stop a running self-test, run the `self-test stop` command.

@@ -19,10 +19,10 @@ Available tests to run:
* *Cloud storage tests*
** Latency test: 1024-bit object.
** Depending on the cluster's read/write permissions (xref:reference:properties/object-storage-properties.adoc#cloud_storage_enable_remote_read[`cloud_storage_enable_remote_read`], xref:reference:properties/object-storage-properties.adoc#cloud_storage_enable_remote_write[`cloud_storage_enable_remote_write`]), self-test performs a series of cloud storage operations:
*** Upload an object to object storage.
*** List objects in the object storage.
*** Download an object from the object storage.
*** Delete the original object from the object storage, if it was uploaded.
+
--
include::reference:partial$rpk-self-test-cloud-tests.adoc[]
--
This command prompts users for confirmation (unless the `--no-confirm` flag is specified), then returns a test identifier and runs the tests.
