Updates to RP self test #604

Merged
merged 31 commits into from
Jul 29, 2024
7200d6f
adjust configs for 24.2 beta
Deflaimun Jun 19, 2024
b88e42a
adjust configs for 24.2 beta
Deflaimun Jun 19, 2024
8e6765a
JSON Support for Schema Registry (#564)
kbatuigas Jun 27, 2024
da493ce
add fips mode property (#584)
Deflaimun Jun 27, 2024
cbc9932
FIPS Doc (#562)
Feediver1 Jul 3, 2024
8c5bfed
Add user guide for data transforms (#567)
JakeSCahill Jul 5, 2024
1fbdf91
What's New in 24.2 beta (#571)
micheleRP Jul 5, 2024
44be230
Update fallback RC version
JakeSCahill Jul 5, 2024
085e9d7
Add `v` prefix to beta tags
JakeSCahill Jul 6, 2024
0bfcbaf
Fix spacing
JakeSCahill Jul 6, 2024
52e14af
Fix formatting
JakeSCahill Jul 8, 2024
3629801
Merge remote-tracking branch 'origin/v-WIP/24.2' into v-WIP/24.2
kbatuigas Jul 8, 2024
8ac5857
rpk 24.2 (#591)
Deflaimun Jul 9, 2024
e11f67f
Merge remote-tracking branch 'origin/v-WIP/24.2' into v-WIP/24.2
kbatuigas Jul 11, 2024
9cbcda1
Merge branch 'v-WIP/24.2' of github.com:redpanda-data/docs into v-WIP…
kbatuigas Jul 15, 2024
2b3c01a
Merge branch 'v-WIP/24.2' of github.com:redpanda-data/docs into v-WIP…
kbatuigas Jul 17, 2024
263d62b
Merge branch 'v-WIP/24.2' of github.com:redpanda-data/docs into v-WIP…
kbatuigas Jul 22, 2024
429b1eb
Merge branch 'v-WIP/24.2' of github.com:redpanda-data/docs into v-WIP…
kbatuigas Jul 22, 2024
b2772af
Merge branch 'v-WIP/24.2' of github.com:redpanda-data/docs into v-WIP…
kbatuigas Jul 25, 2024
c1fd2c7
Merge branch 'v-WIP/24.2' of github.com:redpanda-data/docs into v-WIP…
kbatuigas Jul 26, 2024
3b97fc6
Add cloud storage self test
kbatuigas Jul 16, 2024
dbfa496
Add link
kbatuigas Jul 16, 2024
9e167d6
Add new block test results to sample output
kbatuigas Jul 19, 2024
d63b1ed
Single source self test status output for maintainability
kbatuigas Jul 22, 2024
e1aa5bb
Split out partial into description and output
kbatuigas Jul 22, 2024
f84ba63
Add subheadings for readability
kbatuigas Jul 22, 2024
3f7256e
Reference self test start page instead
kbatuigas Jul 25, 2024
3bb760c
Single source list of cloud tests
kbatuigas Jul 25, 2024
259e822
Edit per suggestion
kbatuigas Jul 25, 2024
b616eea
Add self test to What's New
kbatuigas Jul 26, 2024
21aefb6
Update modules/reference/partials/rpk-self-test-descriptions.adoc
kbatuigas Jul 26, 2024
10 changes: 10 additions & 0 deletions modules/get-started/pages/whats-new.adoc
Expand Up @@ -32,6 +32,16 @@ Redpanda now includes `rpk` and Redpanda Console support for managing xref:manag

Client throughput quotas, previously applied on a per-shard basis, now apply on a per-broker basis. Cluster configuration properties for managing client quotas are xref:upgrade:deprecated/index.adoc[deprecated], including `target_quota_byte_rate` which is disabled by default with the value `0`.

== Self-test enhancements

New tests are added to the xref:manage:cluster-maintenance/cluster-diagnostics.adoc[Redpanda self-test] suite:

* Cloud storage tests to validate xref:manage:tiered-storage.adoc[Tiered Storage] configuration.
* 16K block size disk tests to better assess block storage performance, particularly in response to I/O depth changes.
* 4K block size disk test with dsync off to assess the impact of fdatasync on the storage layer.

See the xref:reference:rpk/rpk-cluster/rpk-cluster-self-test-status.adoc[`rpk self test`] reference for usage and output examples.

== Next steps

* xref:install-beta.adoc[]
243 changes: 19 additions & 224 deletions modules/manage/pages/cluster-maintenance/cluster-diagnostics.adoc
Expand Up @@ -11,7 +11,19 @@ When anomalous behavior arises in a cluster and you're trying to figure out whet

Self-test runs a set of benchmarks to determine the maximum performance of a machine's disks and network connections. For disks, it runs throughput and latency tests by performing concurrent sequential operations. For networks, it selects unique pairs of Redpanda nodes as client/server pairs, then it runs throughput tests between them. Self-test runs each benchmark for a configurable duration, and it returns IOPS, throughput, and latency metrics.

=== Self-test command examples
== Cloud storage tests
Suggested change
== Cloud storage tests
== Object storage tests


If you use xref:manage:tiered-storage.adoc[Tiered Storage], run self-test to verify that you have configured your cloud storage accounts correctly.

Self-test performs the following tests to validate cloud storage configuration:
Suggested change
Self-test performs the following tests to validate cloud storage configuration:
Self-test performs the following tests to validate object storage configuration:

@kbatuigas I think all instances of "cloud storage" here should be "object storage", similar to our Tiered Storage doc page. And I think "bucket" should be "bucket or container".


include::reference:partial$rpk-self-test-cloud-tests.adoc[]

See the xref:reference:rpk/rpk-cluster/rpk-cluster-self-test-start.adoc[`rpk cluster self-test start`] reference for cloud storage test details.

== Self-test command examples

=== Start self-test

To begin using self-test, run the `self-test start` command.

Expand All @@ -34,6 +46,8 @@ rpk cluster self-test status

The `self-test start` command returns immediately, and self-test runs its benchmarks asynchronously.

=== Check self-test status

To check on the status of self-test, run the `self-test status` command.

[,bash]
Expand Down Expand Up @@ -66,231 +80,12 @@ rpk cluster self-test status --format=json

If benchmarks have completed, `self-test status` returns their results.
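
Because `self-test start` returns immediately, a script has to poll for completion. The following is a minimal sketch of such a loop, not part of the documented tooling. It shells out to the `rpk cluster self-test status --format=json` command shown above. The JSON shape is an assumption: the sketch expects a top-level list with one entry per node, each carrying a `status` string such as `idle` or `running`. The helper names `fetch_status` and `wait_for_results` are hypothetical.

```python
import json
import subprocess
import time
from typing import Callable, List


def fetch_status() -> List[dict]:
    """Run the status command from this page and parse its JSON output."""
    completed = subprocess.run(
        ["rpk", "cluster", "self-test", "status", "--format=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def wait_for_results(
    fetch: Callable[[], List[dict]] = fetch_status,
    interval_s: float = 5.0,
    max_polls: int = 60,
) -> List[dict]:
    """Poll until no node reports a running test, then return the last report."""
    for _ in range(max_polls):
        nodes = fetch()
        # Assumed schema: each node entry has a "status" string.
        if all(str(node.get("status", "")).lower() != "running" for node in nodes):
            return nodes
        time.sleep(interval_s)
    raise TimeoutError("self-test still running after max_polls checks")
```

Passing `fetch` as a parameter keeps the loop testable without a cluster: a stub that returns canned reports can stand in for the real command.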

include::reference:partial$rpk-self-test-descriptions.adoc[]

.Example status output: test results
[%collapsible]
====
Test results are grouped by node ID. Each test returns the following:

- **NAME**: Description of the test.
- **INFO**: Detail about the test run attached by Redpanda itself.
- **TYPE**: Either `disk` or `network` test.
- **TEST ID**: Unique identifier given to jobs of a run. All IDs in a test should match. If they don't match, then newer and/or older test results have been included erroneously.
- **TIMEOUTS**: Number of timeouts incurred during the test.
- **DURATION**: Duration of the test.
- **IOPS**: Number of operations per second. For disk, it's `seastar::dma_read` and `seastar::dma_write`. For network, it's `rpc.send()`.
- **THROUGHPUT**: For disk, it's the throughput rate in bytes per second. For network, it's the throughput rate in bits per second. (Note: The UI correctly displays GiB for disk results and Gib for network results.)
- **LATENCY**: 50th, 90th, etc. percentiles of operation latency, reported in microseconds.

```
$ rpk cluster self-test status
NODE ID: 1 | STATUS: IDLE
=========================
NAME 512K sequential r/w throughput disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5001ms
IOPS 1590 req/sec
THROUGHPUT 795.2MiB/sec
LATENCY P50 P90 P99 P999 MAX
831us 5887us 11263us 24575us 507903us

NAME 512K sequential r/w throughput disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5001ms
IOPS 4504 req/sec
THROUGHPUT 2.2GiB/sec
LATENCY P50 P90 P99 P999 MAX
703us 1599us 4351us 6399us 10239us

NAME 4k sequential r/w latency/iops disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5031ms
IOPS 289 req/sec
THROUGHPUT 144.7MiB/sec
LATENCY P50 P90 P99 P999 MAX
543us 34815us 69631us 77823us 77823us

NAME 4k sequential r/w latency/iops disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 8275 req/sec
THROUGHPUT 4.041GiB/sec
LATENCY P50 P90 P99 P999 MAX
191us 447us 831us 2175us 278527us

NAME 8K Network Throughput Test
INFO Test performed against node: 0
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 61254 req/sec
THROUGHPUT 3.74Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 207us 303us 415us 1087us

NAME 8K Network Throughput Test
INFO Test performed against node: 2
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 54814 req/sec
THROUGHPUT 3.35Gib/sec
LATENCY P50 P90 P99 P999 MAX
167us 255us 367us 511us 25599us

NODE ID: 0 | STATUS: IDLE
=========================
NAME 512K sequential r/w throughput disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5002ms
IOPS 1593 req/sec
THROUGHPUT 796.8MiB/sec
LATENCY P50 P90 P99 P999 MAX
735us 5887us 11263us 69631us 507903us

NAME 512K sequential r/w throughput disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 4372 req/sec
THROUGHPUT 2.135GiB/sec
LATENCY P50 P90 P99 P999 MAX
735us 1599us 4351us 7423us 9215us

NAME 4k sequential r/w latency/iops disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5026ms
IOPS 286 req/sec
THROUGHPUT 143.1MiB/sec
LATENCY P50 P90 P99 P999 MAX
543us 34815us 69631us 77823us 77823us

NAME 4k sequential r/w latency/iops disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 8269 req/sec
THROUGHPUT 4.038GiB/sec
LATENCY P50 P90 P99 P999 MAX
191us 447us 831us 2175us 278527us

NAME 8K Network Throughput Test
INFO Test performed against node: 1
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 61612 req/sec
THROUGHPUT 3.76Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 207us 303us 431us 1151us

NAME 8K Network Throughput Test
INFO Test performed against node: 2
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 60306 req/sec
THROUGHPUT 3.68Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 215us 351us 495us 11263us

NODE ID: 2 | STATUS: IDLE
=========================
NAME 512K sequential r/w throughput disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5001ms
IOPS 1580 req/sec
THROUGHPUT 790MiB/sec
LATENCY P50 P90 P99 P999 MAX
671us 5887us 12287us 47103us 507903us

NAME 512K sequential r/w throughput disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 3932 req/sec
THROUGHPUT 1.92GiB/sec
LATENCY P50 P90 P99 P999 MAX
831us 1791us 4351us 7167us 9215us

NAME 4k sequential r/w latency/iops disk test
INFO write run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5027ms
IOPS 280 req/sec
THROUGHPUT 140.1MiB/sec
LATENCY P50 P90 P99 P999 MAX
575us 34815us 73727us 86015us 86015us

NAME 4k sequential r/w latency/iops disk test
INFO read run
TYPE disk
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 8699 req/sec
THROUGHPUT 4.248GiB/sec
LATENCY P50 P90 P99 P999 MAX
183us 367us 831us 2175us 278527us

NAME 8K Network Throughput Test
INFO Test performed against node: 0
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 60027 req/sec
THROUGHPUT 3.66Gib/sec
LATENCY P50 P90 P99 P999 MAX
159us 223us 351us 511us 11775us

NAME 8K Network Throughput Test
INFO Test performed against node: 1
TYPE network
TEST ID 5e4052f3-b828-4c0d-8fd0-b34ff0b6c35d
TIMEOUTS 0
DURATION 5000ms
IOPS 63090 req/sec
THROUGHPUT 3.85Gib/sec
LATENCY P50 P90 P99 P999 MAX
151us 207us 319us 463us 17407us

```
====
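
As a sanity check on the IOPS and THROUGHPUT fields, throughput can be approximated as IOPS multiplied by the request size. The sketch below assumes that the size in a test name (512K, 8K) is the request size in KiB; that reading is inferred from the sample figures, not stated by the output. The function names are hypothetical.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def disk_throughput_mib(iops: float, request_bytes: int) -> float:
    """Disk throughput in MiB/sec (bytes per second, binary prefix)."""
    return iops * request_bytes / MIB


def network_throughput_gib_bits(iops: float, request_bytes: int) -> float:
    """Network throughput in Gib/sec (bits per second, binary prefix)."""
    return iops * request_bytes * 8 / GIB


# Node 1, 512K disk write run: 1590 req/sec, reported as 795.2MiB/sec.
print(round(disk_throughput_mib(1590, 512 * KIB), 1))  # 795.0

# Node 1, 8K network test against node 0: 61254 req/sec, reported as 3.74Gib/sec.
print(round(network_throughput_gib_bits(61254, 8 * KIB), 2))  # 3.74
```

The small gap on the disk row is consistent with the displayed IOPS being rounded. The 4k disk rows in this sample do not follow the same arithmetic (289 req/sec at 4 KiB is about 1.1 MiB/sec, not 144.7 MiB/sec), so treat this as a rough check rather than a definition of how the figures are computed.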
include::reference:partial$rpk-self-test-status-output.adoc[]

NOTE: If self-test returns write results that are unexpectedly and significantly lower than read results, it may be because the Redpanda `rpk` client hardcodes the `DSync` option to `true`. When `DSync` is enabled, files are opened with the `O_DSYNC` flag set, and this represents the actual setting that Redpanda uses when it writes to disk.
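
To get a feel for the effect described in this note, the following sketch times a short run of small writes with and without `O_DSYNC`. It is an illustration only: self-test itself issues I/O through Seastar, not through Python's `os.write`, so absolute numbers are not comparable. The function name and the default count and size are arbitrary choices for the example.

```python
import os
import tempfile
import time


def time_writes(use_dsync: bool, count: int = 50, size: int = 4096) -> float:
    """Return the seconds spent writing `count` blocks of `size` bytes."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if use_dsync:
        # O_DSYNC is not available on every platform; add no flag where absent.
        flags |= getattr(os, "O_DSYNC", 0)
    block = b"\0" * size
    with tempfile.TemporaryDirectory() as tmp:
        fd = os.open(os.path.join(tmp, "probe.bin"), flags, 0o600)
        try:
            start = time.perf_counter()
            for _ in range(count):
                os.write(fd, block)
            return time.perf_counter() - start
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(f"with O_DSYNC:    {time_writes(True):.4f}s")
    print(f"without O_DSYNC: {time_writes(False):.4f}s")
```

On most disk-backed filesystems the `O_DSYNC` run is noticeably slower, because each write returns only after the data reaches stable storage. On memory-backed filesystems the two runs can be nearly identical.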
=== Stop self-test

To stop a running self-test, run the `self-test stop` command.

Expand Up @@ -19,10 +19,10 @@ Available tests to run:
* *Cloud storage tests*
** Latency test: 1024-bit object.
** Depending on cluster read/write permissions (xref:reference:properties/object-storage-properties.adoc#cloud_storage_enable_remote_read[`cloud_storage_enable_remote_read`], xref:reference:properties/object-storage-properties.adoc#cloud_storage_enable_remote_write[`cloud_storage_enable_remote_write`]), a series of cloud storage operations are performed:
*** Upload an object to an object storage.
*** List objects in the object storage.
*** Download an object from the object storage.
*** Delete the original object from the object storage, if it was uploaded.
+
--
include::reference:partial$rpk-self-test-cloud-tests.adoc[]
--
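
The upload, list, download, and delete sequence can be sketched against an in-memory stand-in for a bucket. This is not Redpanda's implementation. The mapping of operations to permissions (upload and delete gated on remote write, list and download gated on remote read) is an assumption made for the example, and `run_object_storage_checks` and the probe key are hypothetical names.

```python
from typing import Dict, List


def run_object_storage_checks(
    store: Dict[str, bytes],
    remote_read: bool,
    remote_write: bool,
    key: str = "self-test-probe",
) -> List[str]:
    """Run the four operations against a dict and return the steps performed."""
    steps: List[str] = []
    uploaded = False
    if remote_write:
        store[key] = b"probe"  # upload a small object
        uploaded = True
        steps.append("upload")
    if remote_read:
        _ = sorted(store)  # list objects
        steps.append("list")
        if key in store:
            _ = store[key]  # download the probe object
            steps.append("download")
    if uploaded:
        del store[key]  # delete only what this run uploaded
        steps.append("delete")
    return steps


print(run_object_storage_checks({}, remote_read=True, remote_write=True))
# ['upload', 'list', 'download', 'delete']
```

Tracking `uploaded` separately mirrors the "if it was uploaded" condition on the delete step, so a read-only run never removes an object it did not create.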

This command prompts users for confirmation (unless the flag `--no-confirm` is specified), then returns a test identifier ID, and runs the tests.
