self-test disk test enhancements #20590
Conversation
This will slow down the test quite a bit but I guess that's not really a problem
Type: adminapi.DiskcheckTagIdentifier,
},
adminapi.DiskcheckParameters{
	Name: "16KB sequential r/w, high io depth",
Did you intentionally not add something like 4k @ 256 iodepth?
Are you asking more about "why not 4K" or "why not 256 iodepth"?
In any case it was intentional, but I'm open to ideas here. One thing to note is that the parallelism factor here is then multiplied by the shard count, so on modest 8-shard nodes we are already at a very high 512 io depth for parallelism=64, which in my experience is larger than what you need to get max throughput even on large local SSD configurations (though of course this may not be the case on some other storage configurations, especially high-throughput, longer-latency network-attached storage).
I don't actually like this multiplication because it (a) adds a confounding factor when comparing results across clusters which may have different shard counts (but at least now we see the effective iodepth in the output) and (b) means you can't run an iodepth=1 test except on a cluster with 1-shard nodes.
About 4K vs 16K, my goal was to add a 16K test to see the difference between 4K and 16K, i.e., how much performance varies in the range of block sizes Redpanda is already writing with default settings. Then I also wanted to add a "series" of varying iodepth tests, which I somewhat arbitrarily chose to be the 16K one. I didn't want to do both, to keep the number of tests down, and I think I favored 16K over 4K in part because 4K already had parallelism=2, and I wanted parallelism=1 but didn't want to change the existing 4K test, to keep some continuity with old results.
That said, very open to changing it. What is your view on the ideal series of tests to run?
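To make that arithmetic concrete, here is a minimal sketch (not code from this PR; the helper name is hypothetical) of the effective io depth being described:

```go
package main

import "fmt"

// effectiveIODepth illustrates the multiplication described above: the
// parallelism requested by the client is applied per shard, so the io depth
// the disk actually sees scales with the node's shard count.
// Hypothetical helper, for illustration only.
func effectiveIODepth(parallelism, shardCount int) int {
	return parallelism * shardCount
}

func main() {
	// On a modest 8-shard node, parallelism=64 already gives io depth 512.
	fmt.Println(effectiveIODepth(64, 8))
}
```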
> One thing to note is that the parallelism factor here is then multiplied by the shard count
Wait but right now this all happens on shard zero only. Are you saying we still multiply it by the shard count?
> That said, very open to changing it. What is your view on the ideal series of tests to run?
I don't feel strongly. Just really coming from the classic 4k test and I guess it matches the min amount we write.
I guess the 512KiB test is actually the least relevant one for RP as we never write sizes bigger than 16KiB (only when fetching from TS).
> Wait but right now this all happens on shard zero only. Are you saying we still multiply it by the shard count?
No, I was simply mistaken. I thought this ran on all shards, but as you say it seems to run on only one shard. I was thrown off especially by this comment and also this code and comment. Perhaps vestigial?
So I will adjust the numbers to hit higher io depths, and maybe add 1 more test.
> Just really coming from the classic 4k test and I guess it matches the min amount we write.
I'll change it to 4K.
> I guess the 512KiB test is actually the least relevant one for RP as we never write sizes bigger than 16KiB (only when fetching from TS).
It's definitely the least useful for evaluating RP performance at the default settings. As a test to understand more about the disk, especially disks with characteristics different from the most common ones we run on, I think it's fine because it is a "max throughput" test, and if it gets a much higher number than the other tests with small blocks then we've learned something.
If we want more data points and we want to keep the same duration per test, I don't really see an alternative to that. However, we could always reduce the default per-test duration if the overall current duration (2 minutes, at the default per-test duration) is a "sweet spot" or something like that. Note that most of these newly added tests have skipRead=true so they are half the time of the existing tests, so the time expansion is actually half of what you'd guess by looking at it. The increase is 4 tests -> 8 tests, so 2 minutes to 4 minutes at the default duration.
Stupid "close with comment" button sitting there looking so pressable. |
Force-pushed from b1f5213 to 0c1753b.
Updated in push: 0c1753b
Example output after this change:
The help output:
/dt
/ci-repeat 1
Force-pushed from 0c1753b to fa3908b.
Minor copy edits for consistency with public docs
/ci-repeat 1
OK these remaining errors seem legit, looking.
Force-pushed from 365233c to f01195a.
365233c is a pure rebase. f01195a changes the max iodepth in the new tests to 256 from 512, as RP has a hardcoded limit of 256 in the self-test code. I also considered increasing this limit from 256 to 512 on the RP side, but then we'd have issues running self-test in cases where the RPK version was newer than the Redpanda version, which is a supported and I think fairly common scenario, so I decided to change RPK instead. This should fix the test failures.
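To spell out the compatibility constraint, here is a minimal sketch (the constant and function names are hypothetical, and the actual fix simply lowered the configured values in RPK):

```go
package selftest

import "fmt"

// maxBrokerIODepth mirrors the hardcoded limit of 256 in Redpanda's
// self-test code; the constant name here is hypothetical.
const maxBrokerIODepth = 256

// validateParallelism sketches the idea: a newer RPK keeps the requested
// parallelism within the limit an older broker will accept, instead of
// requiring the broker-side limit to be raised.
func validateParallelism(p int) error {
	if p > maxBrokerIODepth {
		return fmt.Errorf("parallelism %d exceeds broker limit %d", p, maxBrokerIODepth)
	}
	return nil
}
```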
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08a6-4d35-b300-695b00ac2af8:
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08a8-4562-a22d-e76610db099a:
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08aa-43eb-8b38-c266ab6f395f:
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a306-08ac-4d5b-81cd-1ebe01054b59:
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-bacc-47d3-ab43-fa4842e939fd:
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-baca-43b9-b035-d624e18befca:
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-bac8-4f3f-b1d7-0a027f3a1d46:
new failures in https://buildkite.com/redpanda/redpanda/builds/51367#0190a307-bace-4ce7-865c-f95725cb07e4:
Force-pushed from 6e11ce0 to 46e90f0.
Hopefully this last push fixes all the failures. All the tests were passing for me locally, but it turned out that was just because of https://github.com/redpanda-data/vtools/pull/2950 not rebuilding my RPK.
All spurious failures, retrying.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51441#0190b6b2-8b1b-47d6-ac1c-278a444551c1
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51532#0190b7dd-5b51-42bf-a583-11a55136488d
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51669#0190c1d0-6076-404d-a63a-326257de411d
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51730#0190c736-8f0e-4639-8b72-755d2feabab0
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51755#0190c8f2-4424-407b-80f1-45c68a49dd59
/ci-repeat 1
Spurious GH download failure in last run.
/ci-repeat 1
Force-pushed from 46e90f0 to a747e47.
Last failure was a merge conflict, fixed. Hopefully this CI run is the one.
Add 16K block size disk tests, a common block size written by Redpanda, at varying IO depths: 1, 8 and 32 times the shard count (the multiplication by the shard count happens in Redpanda and is inevitable). This will help better assess the performance of block storage that is a bit outside the usual, in particular how it responds to io depth changes. Additionally, add a 4K test which is the same as the existing one but with dsync off. This is critical to assess the impact of fdatasync on the storage layer: locally this makes a 257x difference in throughput, though the effect is much more muted, perhaps close to zero, on other SSD types. Slightly rename the tests to remove extraneous info. Issue redpanda-data/core-internal#1266.
Set the name to unspecified, which is a more accurate reflection of the situation when the caller doesn't set a name. Fix a comment which said 1G but was 10G.
When we complete a self test, the API returns info about the run including an info field which currently says "write run" (for a disk test). Enhance this to include information about whether dsync was enabled and the total io depth. Issue redpanda-data/core-internal#1266.
Force-pushed from a747e47 to bd8b94d.
bd8b94d is to fix yet another merge conflict (what's up with my luck on this change?).
rpk: add additional disk self tests
Add 16K block size disk tests, a common block size written by Redpanda,
at varying IO depths: 1, 8 and 32 times the shard count (the
multiplication by the shard count happens in Redpanda and is
inevitable).
This will help better assess the performance of block storage that is
a bit outside the usual, in particular how it responds to io depth
changes.
Additionally, add a 4K test which is the same as the existing one but
with dsync off. This is critical to assess the impact of fdatasync
on the storage layer: locally, on my consumer SSD, this makes a
257x difference (!!) in throughput, though the effect is much more
muted, perhaps close to zero, on other SSD types.
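For illustration, a sketch of how the added entries could be expressed. The struct below is a hypothetical stand-in for rpk's adminapi.DiskcheckParameters (only the Name and Type fields appear in the diff quoted earlier in the review), the test names other than the 16KB high-io-depth one are guesses, and the parallelism for the no-dsync 4K variant is assumed:

```go
package main

import "fmt"

// diskTest is a hypothetical stand-in for the real test-definition struct;
// field names and values are assumptions for illustration only.
type diskTest struct {
	Name        string
	Parallelism uint // io depth requested per test (multiplied by the shard count broker-side)
	DSync       bool // whether fdatasync is issued after writes
}

func main() {
	// The additions described above: a 4K run with dsync off, plus a 16K
	// series at io depths 1, 8 and 32 (times the shard count).
	added := []diskTest{
		{Name: "4KB sequential r/w, no dsync", Parallelism: 2, DSync: false},
		{Name: "16KB sequential r/w, low io depth", Parallelism: 1, DSync: true},
		{Name: "16KB sequential r/w, medium io depth", Parallelism: 8, DSync: true},
		{Name: "16KB sequential r/w, high io depth", Parallelism: 32, DSync: true},
	}
	for _, t := range added {
		fmt.Printf("%-40s io depth=%d dsync=%v\n", t.Name, t.Parallelism, t.DSync)
	}
}
```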
On the redpanda side, when we complete a self test the API returns
info about the run, including an info field which currently says
"write run" (for a disk test). Enhance this to include information
about whether dsync was enabled and the total io depth (which is the
client-specified parallelism value times the number of shards).
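As a rough sketch of the richer info string (written in Go for consistency with the rest of this page, although the actual change is on the Redpanda side, and the real wording may differ):

```go
package main

import "fmt"

// diskRunInfo is a hypothetical formatter showing the kind of detail the
// enhanced info field carries: whether dsync was on and the effective io
// depth (client-specified parallelism times the shard count).
func diskRunInfo(dsync bool, parallelism, shards int) string {
	return fmt.Sprintf("write run, dsync=%v, total io depth=%d", dsync, parallelism*shards)
}

func main() {
	fmt.Println(diskRunInfo(true, 32, 8))
}
```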
Backports Required
Release Notes
Improvements