-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cluster
: Parse unknown self-test checks
#22831
cluster
: Parse unknown self-test checks
#22831
Conversation
Only applied to `[v24.2.x]` and `[dev]` branches. See redpanda-data#22778 for change to `[v24.1.x]`. Also bumps `serde::version` of `start_test_request` to 2. This `serde::version<>` needed bumping to `serde::version<2>`, since `serde::version<0>` is the version of `start_test_request` with just `diskcheck_opts` and `netcheck_opts`, while `serde::version<1>` is the version with `unknown_checks` added, and `serde::version<2>` needs to reflect the most recently added `cloudcheck_opts`. Also, re-order fields in `start_test_request` to allow serde to work correctly in the interim.
In the case that the controller node which originally parsed the `rpk cluster self-test start` command was of a lower version than the follower nodes in a cluster, it is possible that some tests may not have been recognized by the controller, and instead pushed back into `start_test_request.unknown_checks`. On these upgraded follower nodes, they will attempt to parse any `unknown_checks`, in case they recognize them. This is currently only affecting the `cloudcheck` self-test, in mixed clusters with versions before and after `v24.2.x`. However, any future self-tests that get added to `redpanda` should also add their case to `parse_unknown_checks()` for completion's sake.
This struct was erroneously altered by adding a new field without bumping the `serde::version<>`. This struct also lacks a `serde_fields()` implementation, so adding the new field anywhere but the end of the structure is an issue for serialization/deserialization. Update it by moving the field to the end of the struct, and bumping the `serde::version<>` to 1.
NFCs in this commit.
34a7e98
to
a33abd7
Compare
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52745#01913a1f-846c-4171-baee-c91987c9cbf2 ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52745#01913a84-7d40-4ad0-9478-caa80c444f3f |
Clean up the `self_test_backend` and `admin/server` parse sites for self-test JSON parsing by adding `parse_self_test_checks()`. This will now be the function called from `server::self_test_start_handler()` as well as `self_test_backend::do_start_test()`. Also, rename `unknown_check` to `unparsed_check` for clarity. These unparsed checks are still considered "unknown" if they exist by the end of the self-test.
/backport v24.2.x |
src/v/cluster/self_test_rpc_types.h
Outdated
serde::version<1>, | ||
serde::version<2>, | ||
serde::compat_version<0>> { | ||
using rpc_adl_exempt = std::true_type; | ||
|
||
uuid_t id; | ||
std::vector<diskcheck_opts> dtos; | ||
std::vector<netcheck_opts> ntos; | ||
std::vector<cloudcheck_opts> ctos; | ||
std::vector<unknown_check> unknown_checks; | ||
std::vector<cloudcheck_opts> ctos; | ||
|
||
auto serde_fields() { | ||
return std::tie(id, dtos, ntos, unknown_checks, ctos); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillemKauf i'm confused about what is going on here, and i can't figure out the answer from the commit message.
looking at the git history i see when version was 0, the order of serde fields in the struct was d,n,c. then it looks like it was bumped to 1 when the unknown checks were appended, which seems correct.
but then here, the commit appears to claim that a bug is being fixed (i'm not sure what the bug was), and also, the ordering of the fields was changed and that's not allowed even when bumping the version.
serde::version<0>, | ||
serde::version<1>, | ||
serde::compat_version<0>> { | ||
using rpc_adl_exempt = std::true_type; | ||
|
||
uuid_t id{}; | ||
self_test_status status{}; | ||
self_test_stage stage{}; | ||
std::vector<self_test_result> results; | ||
self_test_stage stage{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one seems more problematic because the ordering prior to this commit was both wrong, but also probably shipped in a release?
To allow for compatibility between clusters of mixed version nodes (in the case of a rolling upgrade), we add the
parse_unknown_checks()
function to theself_test_backend
.There are two possibilities in a mixed version cluster:
rpk cluster self-test start
request is sent to the controlling node, which is a higher version than the other nodes in the cluster. They do not recognize some of the tests in thestart_test_request
json, and push back an error toself_test_result
. This case is covered by a previous PR, admin: removethrow
on unknown self test type inadmin_server
#21370.rpk cluster self-test start
request is sent to the controlling node, which is a lower version than the other nodes in the cluster. Previously, this request would not be propagated to the other nodes, since it would not be recognized on the leader node. This case is covered in the current PR, in which we attempt to parse the unknown test withparse_unknown_checks()
after serializing and sending the original JSON for the test to the followers.This currently only affects the
cloudcheck
, since it was added between versionsv24.1.x
andv24.2.x
. However, should more self tests be added in the future, they can be made version-compatible in the exact same way usingparse_unknown_checks()
.Also, fixes some
serde::version<>
bugs we had previously in variousself-test
related objects.Backports Required
Release Notes
Improvements