
CORE-7000: Node ID/UUID Override #22972

Merged
merged 8 commits into redpanda-data:dev from nodeuuid/core-7000/poc on Sep 20, 2024

Conversation

oleiman (Member) commented Aug 20, 2024

Implements on-startup node UUID and ID override via CLI options or node config, as described in 2024-08-14 - RFC - Override Node UUID on Startup.

This PR was originally meant as a fully functional proof of concept, but given the mechanical simplicity of the approach (most of the code here is tests and serdes for the new config types), it has been promoted to a full PR.
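For orientation only (the RFC and the commit messages further down are authoritative): the override surface added here is a `--node-id-overrides` broker CLI option taking one or more `uuid:uuid:id` tuples, plus a matching `node_id_overrides` list in the node config. Per the review discussion below, the first UUID appears to select the broker whose current UUID matches, and the second UUID and the integer ID are the identity it adopts on startup.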

Closes CORE-7000
Closes CORE-6830

TODO:

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.2.x
  • v24.1.x
  • v23.3.x

Release Notes

Improvements

  • Adds the ability to configure Node UUID and ID overrides at broker startup.

oleiman self-assigned this Aug 20, 2024

oleiman (Member Author) commented Aug 21, 2024

/ci-repeat 1

oleiman force-pushed the nodeuuid/core-7000/poc branch 4 times, most recently from 6b16c80 to 9d48a3f on August 24, 2024 04:33

oleiman (Member Author) commented Aug 24, 2024

/ci-repeat 1

oleiman (Member Author) commented Aug 24, 2024

@michael-redpanda - leaving this in draft pending final RFC signoff, but this is probably worth a look whenever you're ready 🙏

oleiman force-pushed the nodeuuid/core-7000/poc branch from 9d48a3f to 1f9aff4 on August 24, 2024 05:03

oleiman (Member Author) commented Aug 24, 2024

/ci-repeat 1

michael-redpanda (Contributor) left a comment

looks really nice so far!

Review threads (resolved): src/v/config/node_overrides.cc, src/v/config/node_overrides.h, src/v/redpanda/application.cc
oleiman force-pushed the nodeuuid/core-7000/poc branch from 1f9aff4 to e0bea3a on August 28, 2024 23:27

oleiman (Member Author) commented Aug 28, 2024

force push contents:

  • various CR cleanups
  • add a CLI options test to the multi-node dt case

oleiman (Member Author) commented Aug 28, 2024

/ci-repeat 1

oleiman (Member Author) commented Aug 30, 2024

oleiman force-pushed the nodeuuid/core-7000/poc branch from e0bea3a to 0e8c804 on August 30, 2024 19:24
oleiman marked this pull request as ready for review August 30, 2024 19:36
oleiman requested a review from a team as a code owner August 30, 2024 19:36
oleiman requested a review from mmaslankaprv August 30, 2024 19:40
oleiman changed the title from "CORE-7000: Node ID/UUID Override Proof of Concept" to "CORE-7000: Node ID/UUID Override" on Aug 30, 2024
micheleRP previously approved these changes Aug 30, 2024

micheleRP left a comment

LGTM from docs

oleiman force-pushed the nodeuuid/core-7000/poc branch from 9474132 to b4656fd on September 9, 2024 22:56

oleiman (Member Author) commented Sep 9, 2024

force push contents:

  • Use a regex for CLI parsing (see the format sketch after this list)
  • remove some comments
  • ducktape cleanup
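Purely as an illustration of the `uuid:uuid:id` override tuple format referenced in the commit messages further down (the PR's actual parser is C++ and may differ in detail), a regex-based split looks roughly like this; all names here are made up for the sketch:

    import re

    # Illustrative only: shows the shape of one "uuid:uuid:id" override tuple.
    UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    OVERRIDE = re.compile(rf"^({UUID}):({UUID}):(\d+)$")

    def parse_override(s):
        # Returns (current_uuid, new_uuid, new_id) or raises on malformed input.
        m = OVERRIDE.match(s)
        if m is None:
            raise ValueError(f"malformed node ID override: {s!r}")
        return m.group(1), m.group(2), int(m.group(3))

    print(parse_override(
        "aaaaaaaa-1111-2222-3333-444444444444:"
        "bbbbbbbb-5555-6666-7777-888888888888:4"))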

oleiman requested a review from pgellert September 9, 2024 23:16
Review thread (resolved): src/v/utils/uuid.h
    }
    static bool decode(const Node& node, type& rhs) {
        auto value = node.as<std::string>();
        auto out = [&value]() -> std::optional<model::node_uuid> {
Contributor:

nit: any reason for the "construct lambda then immediately call" pattern?

oleiman (Member Author), Sep 10, 2024:

personal preference. "initialize this optional with the result of from_string or, if it throws, nullptr" feels a bit neater than "initialize this optional to nullptr then assign the result of from_string, unless it throws". The ID in the outer scope receives a value at exactly one code point.

@@ -2622,6 +2663,18 @@ void application::wire_up_and_start(::stop_signal& app_signal, bool test_mode) {
          "Running with already-established node ID {}",
          config::node().node_id());
        node_id = config::node().node_id().value();
    } else if (auto id = _node_overrides.node_id(); id.has_value()) {
Contributor:

I think it makes sense to try to register with the cluster (next conditional branch) in this case as well. For two reasons:

  • it will get us fresh features and cluster config snapshots from the controller leader
  • it will allow us to reject erroneous configurations (e.g. if the UUID is already registered with a different id).

Not sure if all the code for this is there, but looks like at least the join RPC supports passing existing node ids.

oleiman (Member Author):

Makes sense. I'll refactor the conditionals a little bit.

Contributor:

Won't we hit the problem of needing a controller leader if we try to register here? Or do we also want to change members_manager::handle_join_request to not require controller leadership when the node is trying to register with a known (node uuid, node id) pair?

members_manager::handle_join_request(join_node_request const req) {
    using ret_t = result<join_node_reply>;
    using status_t = join_node_reply::status_code;
    bool node_id_assignment_supported = _feature_table.local().is_active(
      features::feature::node_id_assignment);
    bool req_has_node_uuid = !req.node_uuid.empty();
    if (node_id_assignment_supported && !req_has_node_uuid) {
        vlog(
          clusterlog.warn,
          "Invalid join request for node ID {}, node UUID is required",
          req.node.id());
        co_return errc::invalid_request;
    }
    std::optional<model::node_id> req_node_id = std::nullopt;
    if (req.node.id() >= 0) {
        req_node_id = req.node.id();
    }
    if (!node_id_assignment_supported && !req_node_id) {
        vlog(
          clusterlog.warn,
          "Got request to assign node ID, but feature not active",
          req.node.id());
        co_return errc::invalid_request;
    }
    if (
      req_has_node_uuid
      && req.node_uuid.size() != model::node_uuid::type::length) {
        vlog(
          clusterlog.warn,
          "Invalid join request, expected node UUID or empty; got {}-byte "
          "value",
          req.node_uuid.size());
        co_return errc::invalid_request;
    }
    model::node_uuid node_uuid;
    if (!req_node_id && !req_has_node_uuid) {
        vlog(clusterlog.warn, "Node ID assignment attempt had no node UUID");
        co_return errc::invalid_request;
    }
    ss::sstring node_uuid_str = "no node_uuid";
    if (req_has_node_uuid) {
        node_uuid = model::node_uuid(uuid_t(req.node_uuid));
        node_uuid_str = ssx::sformat("{}", node_uuid);
    }
    vlog(
      clusterlog.info,
      "Processing node '{} ({})' join request (version {}-{})",
      req.node.id(),
      node_uuid_str,
      req.earliest_logical_version,
      req.latest_logical_version);
    if (!_raft0->is_elected_leader()) {
        vlog(clusterlog.debug, "Not the leader; dispatching to leader node");
        // Current node is not the leader have to send an RPC to leader
        // controller
        co_return co_await dispatch_rpc_to_leader(

Contributor:

Hmm good point, maybe it won't work then.

oleiman (Member Author), Sep 10, 2024:

Yeah, the exact requirement we're trying to circumvent. I was wondering whether the cluster layer might short-circuit somewhere (or be made to short-circuit) on a previously known <ID, UUID> pair, but I suppose all the request routing is controller-leadership based 🤷

        self.logger.debug(
            f"...and decommission ghost node [{ghost_node_id}]...")

        self.admin.decommission_broker(ghost_node_id, node=to_stop[0])
Contributor:

nit: this won't actually wait for the decom to be successful.


        if mode == TestMode.CFG_OVERRIDE:
            self.redpanda.restart_nodes(
                to_stop,
Contributor:

Nit: we are already kind of testing that the nodes will only adopt overrides that match the current UUID, but maybe it would be more realistic to do what k8s will do and perform a full rolling restart.

oleiman (Member Author):

Fair point. Will do

oleiman (Member Author):

I'm beginning to think that "rolling restart" (I used this language in the RFC and the runbook) is not quite accurate. In DT, for example, we have a RollingRestarter that a) uses maintenance mode by default b) requires the cluster to be healthy before cycling each node. Presumably a k8s rolling restart looks somewhat similar.

Of course we can't meet either of those in this use case. In DT we can fudge it with an unsafe param or something, but we may want to reconsider this framing with respect to support-facing docs.

oleiman (Member Author), Sep 10, 2024:

It's worse than that, I think - in this test case, I believe a rolling restart is straightforwardly impossible, since restarting one node (with overrides) gives us a total of 2 live brokers - not enough to form a cluster due to the presence of ghost nodes.

My intuition is that we can start the nodes concurrently (as written) as long as the number of nodes we're restarting is <= (n_nodes + n_ghosts) / 2. What matters is that the empty nodes don't form a cluster among themselves, independent of any nodes containing a complete controller log, right?

Contributor:

Yeah, that makes sense to me. We want to restart the minimum number of ghost nodes to form a healthy cluster and wait for them to form a healthy controller quorum before continuing to restart the remaining nodes.

It makes sense to call out in the support docs not to use decommissioning or maintenance mode during any of this.

Perhaps the most important thing here is to not restart the healthy nodes to make sure they are available throughout the whole process.

oleiman (Member Author), Sep 11, 2024:

"most important thing here is to not restart the healthy nodes"

Yes, exactly. In fact, if we restart those nodes, I think they will get stuck just the same.

"call out in the support docs not to use decommissioning or maintenance mode"

Yup, and that they are unavailable.

Generally, I think we're back to the point where we have a stack of assumptions that look right but would benefit from a ✅ from Alexey or @mmaslankaprv when he becomes available
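Not the PR's actual test code, just a sketch of the staged restart described in this thread: restart only as many override-configured (empty) brokers as are needed to restore a controller quorum, wait for the cluster to report healthy, then cycle the rest, and never restart the brokers that stayed healthy. `restart_nodes` appears in the quoted diff above; the health predicate and batch size are left as parameters because the right values depend on the test topology.

    from ducktape.utils.util import wait_until

    def staged_override_restart(redpanda, override_nodes, quorum_batch_size,
                                cluster_is_healthy):
        # First batch: the minimum number of empty brokers needed so that,
        # together with the untouched healthy brokers, a controller quorum
        # can form despite the ghost entries still in the controller log.
        first = override_nodes[:quorum_batch_size]
        rest = override_nodes[quorum_batch_size:]

        redpanda.restart_nodes(first)
        wait_until(cluster_is_healthy,
                   timeout_sec=120,
                   backoff_sec=2,
                   err_msg="cluster not healthy after first override batch")

        # Only once a healthy controller quorum exists do we cycle the rest.
        if rest:
            redpanda.restart_nodes(rest)
            wait_until(cluster_is_healthy,
                       timeout_sec=120,
                       backoff_sec=2,
                       err_msg="cluster not healthy after remaining overrides")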

To enable boost::lexical_cast for program_options parsing and UUIDs in configs.

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- config::node_id_override
- config::node_override_store

Includes json/yaml SerDes and unit tests

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- node_id_overrides: std::vector<config::node_id_override>

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
oleiman force-pushed the nodeuuid/core-7000/poc branch from b4656fd to 0ed66ce on September 11, 2024 00:20

oleiman (Member Author) commented Sep 11, 2024

force push CR changes, mostly hardening tests

Review threads (resolved): tests/rptest/tests/admin_uuid_operations_test.py
oleiman force-pushed the nodeuuid/core-7000/poc branch from 0ed66ce to 2b81be0 on September 11, 2024 16:32
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
And wire them into the corresponding node configs.

"--node-id-overrides uuid:uuid:id [uuid:uuid:id ...]"

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
and 'restart_nodes'

Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
oleiman force-pushed the nodeuuid/core-7000/poc branch from 2b81be0 to ace0d61 on September 11, 2024 16:43

pgellert (Contributor) left a comment

lgtm

            brokers = self.admin.get_brokers(node=n)
            for b in brokers:
                if b['node_id'] == node_id:
                    results.append(b['membership_status'] != 'active')
Contributor:

nit: it is better to wait until the node fully disappears ("draining" membership status means that the decom just started).
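A sketch (not the PR's final code) of the stricter wait being suggested here: poll the broker list until the decommissioned ghost node stops appearing at all, rather than checking for a non-'active' membership status. `get_brokers` and the `node_id` field come from the test snippet above; the helper name and timeouts are assumptions.

    from ducktape.utils.util import wait_until

    def wait_for_broker_gone(admin, query_node, ghost_node_id, timeout_sec=120):
        def gone():
            # The decommission is only done once the broker no longer appears.
            brokers = admin.get_brokers(node=query_node)
            return all(b['node_id'] != ghost_node_id for b in brokers)

        wait_until(gone,
                   timeout_sec=timeout_sec,
                   backoff_sec=2,
                   err_msg=f"node {ghost_node_id} still listed after decommission")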

michael-redpanda merged commit a2bbe5a into redpanda-data:dev Sep 20, 2024
18 checks passed
vbotbuildovich (Collaborator) commented:

/backport v24.2.x

vbotbuildovich (Collaborator) commented:

/backport v24.1.x

vbotbuildovich (Collaborator) commented:

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-22972-v24.1.x-79 remotes/upstream/v24.1.x
git cherry-pick -x 9cfd280e986fe256d570e5e486b8c54205315a1c dddbe3ec7e3c5d9494dafca6ea5ad896d439b21c 4d573fab182d4768301724bbbdbfd2a66d7c71de d2b85f2d76e7d1aae21da3a817e76af497d3d678 c96f4ae5c08aad29d95f0be3126dc9fb2066976d 951cc773482cd13fce834207dfdf9e7c8e59a9cc afe076ddbdfe0e25665956e25ab9634120310b66 ace0d611d44a313005ba6f56b565087e7c23aad9

Workflow run logs.
