V1: Prevent configs from becoming outdated when scaling down #139

chrisseto · 2024-05-13T16:09:00Z

`0dd40b5` move setup-envtest to nix

This commit replaces the `setup-envtest` shell scripts downloaded by
Taskfile with the equivalent Go binary and uses nix to install it.

`c84a283` V1: Prevent configs from becoming outdated when scaling down

Previously, a collection of configurations (Seed Brokers, PandaProxy
Client, and SchemaRegistry Client) could become out of date when scaling
a cluster down, for example from 5 replicas to 3. A special case in the
config hasher ignored these fields to prevent needless restarts when
scaling up, which left stale values behind when scaling down. These
stale configurations would prevent various components of Redpanda from
working correctly.
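To make the failure mode concrete, here is a minimal Go sketch of a config hasher with such a special case. Everything in it is hypothetical (`nodeConfig`, `configHash`, `podAddrs`, and the `rp-N.rp.ns` addresses); it is not the operator's actual hasher, only an illustration of why skipping replica-dependent fields hides a 5-to-3 scale-down.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// nodeConfig is a hypothetical, heavily simplified stand-in for the
// configuration rendered for each redpanda pod.
type nodeConfig struct {
	SeedServers  []string
	ProxyBrokers []string
	Other        map[string]string
}

// configHash hashes a nodeConfig. When ignoreScalingFields is true it skips
// the replica-dependent fields, mirroring (in spirit only) the special case
// described above.
func configHash(c nodeConfig, ignoreScalingFields bool) string {
	h := sha256.New()
	keys := make([]string, 0, len(c.Other))
	for k := range c.Other {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for map iteration
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s;", k, c.Other[k])
	}
	if !ignoreScalingFields {
		fmt.Fprintf(h, "seeds=%s;", strings.Join(c.SeedServers, ","))
		fmt.Fprintf(h, "proxy=%s;", strings.Join(c.ProxyBrokers, ","))
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

// podAddrs returns one hypothetical per-pod address per replica.
func podAddrs(replicas int) []string {
	addrs := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		addrs = append(addrs, fmt.Sprintf("rp-%d.rp.ns.svc.cluster.local.", i))
	}
	return addrs
}

func main() {
	other := map[string]string{"developer_mode": "false"}
	five := nodeConfig{SeedServers: podAddrs(5), ProxyBrokers: podAddrs(5), Other: other}
	three := nodeConfig{SeedServers: podAddrs(3), ProxyBrokers: podAddrs(3), Other: other}

	// With the special case, 5 -> 3 yields the same hash, so nothing
	// triggers a rollout and pods keep the stale 5-entry lists.
	fmt.Println(configHash(five, true) == configHash(three, true))   // true
	fmt.Println(configHash(five, false) == configHash(three, false)) // false
}
```

Skipping the fields avoids restarts on scale-up, but it also means nothing rewrites the stale lists once the cluster shrinks.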

To mitigate these issues, this commit:

- Changes the PandaProxy and SchemaRegistry client configurations to always point at Redpanda's headless service.
- Caps the list of seed brokers at a maximum of 3.
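A minimal sketch of the difference between the old per-pod broker list and a single headless-service address. The helpers `perPodBrokers` and `headlessServiceBroker` are hypothetical, and the sketch assumes the headless service shares the StatefulSet's name, as it does in the `additional-configuration` test output quoted later in this thread.

```go
package main

import "fmt"

// perPodBrokers (hypothetical) mimics the old behaviour: one entry per
// replica, so the list changes whenever the StatefulSet is scaled.
// Assumes the headless service is named after the StatefulSet.
func perPodBrokers(sts, ns string, replicas int) []string {
	out := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		out = append(out, fmt.Sprintf("%s-%d.%s.%s.svc.cluster.local.", sts, i, sts, ns))
	}
	return out
}

// headlessServiceBroker (hypothetical) mimics the new behaviour: a single
// headless-service address that does not depend on the replica count.
func headlessServiceBroker(svc, ns string) string {
	return fmt.Sprintf("%s.%s.svc.cluster.local.", svc, ns)
}

func main() {
	fmt.Println(perPodBrokers("additional-configuration", "redpanda-system", 2))
	// additional-configuration.redpanda-system.svc.cluster.local.
	fmt.Println(headlessServiceBroker("additional-configuration", "redpanda-system"))
}
```

Because the headless address does not depend on the replica count, scaling in either direction leaves the PandaProxy and SchemaRegistry client configuration unchanged.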

While these changes will prevent needless restarts in the future, the
upgrade to this new configuration WILL result in a single rolling
restart.

Fixes #125

BenPope · 2024-05-13T16:27:18Z

src/go/k8s/pkg/resources/configmap.go

+
+	const maxSeedServers = 3
+
+	for i := int32(0); i < replicas && i < maxSeedServers; i++ {


When we get the set of replicas, are they in any particular order?

When scaling down, will the first replicas in this list be the ones that remain?

When doing a rolling upgrade of infrastructure (e.g., new K8s version), will this list be sufficient if the nodes are all going away?

Is cr.EmptySeedStartsCluster = False?

How does this work when there is multi-AZ?

I don't remember what happens when a node in the seed_server list can't be found; is that why the list has been truncated to 3?

chrisseto · 2024-05-13

> When scaling down, will the first replicas in this list be the ones that remain?

Yes, this list will always be replicas 0, 1, and 2. Thanks to the StatefulSet's (STS) ordering guarantees, we can rely on them always being around.

> When doing a rolling upgrade of infrastructure (e.g., new K8s version), will this list be sufficient if the nodes are all going away?

The listed nodes will always exist (again, thanks to the STS ordinal system). It should be fine as long as nodes are migrated one at a time, and this doesn't function any differently from what was previously present, especially in the case of 3-node clusters.

> Is cr.EmptySeedStartsCluster = False?

That's handled on the line above; it looks like it's always set to false for supported Redpanda versions.

> How does this work when there is multi-AZ?

I don't really see how that's relevant, but it would function the same way as before. This change only caps the length of the list; its contents are otherwise computed exactly the same way.
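The ordering guarantee relied on above can be sketched as follows, assuming a StatefulSet with the default start ordinal of 0, so that scaling to N replicas keeps ordinals 0 through N-1. `remainingOrdinals`, `seedOrdinals`, and `seedsSurvive` are hypothetical helpers, not operator code.

```go
package main

import "fmt"

// remainingOrdinals models a StatefulSet scaled to `replicas`: pods with
// ordinals 0..replicas-1 remain and higher ordinals are removed.
// Assumes the default start ordinal of 0.
func remainingOrdinals(replicas int32) map[int32]bool {
	alive := make(map[int32]bool, replicas)
	for i := int32(0); i < replicas; i++ {
		alive[i] = true
	}
	return alive
}

// seedOrdinals returns the ordinals used as seeds, capped at 3 as in the PR.
func seedOrdinals(replicas int32) []int32 {
	const maxSeedServers = 3
	var out []int32
	for i := int32(0); i < replicas && i < maxSeedServers; i++ {
		out = append(out, i)
	}
	return out
}

// seedsSurvive reports whether every seed computed for `from` replicas
// still exists after scaling the StatefulSet to `to` replicas.
func seedsSurvive(from, to int32) bool {
	alive := remainingOrdinals(to)
	for _, o := range seedOrdinals(from) {
		if !alive[o] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(seedsSurvive(5, 3)) // true: pods 0, 1, 2 remain
	fmt.Println(seedsSurvive(5, 2)) // false: pod 2 is removed
}
```

In this model, a seed only disappears when the cluster is scaled below 3 replicas.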

chrisseto · 2024-05-14T21:17:50Z

The failure is from decommission-on-delete :/ I'll try running it a few more times, but it might be time to skip that test and toss an issue in the backlog.

RafalKorepta · 2024-05-15T10:00:34Z

Currently, the kuttl/harness/additional-configuration end-to-end test is failing.

https://buildkite.com/redpanda/redpanda-operator/builds/1440#018f7b4f-7f3a-45da-80a2-cf015134e2a2/1262-4470

logger.go:42: 09:41:07 | additional-configuration/3-verify-config-changed | Difference:
logger.go:42: 09:41:07 | additional-configuration/3-verify-config-changed | 13c13
logger.go:42: 09:41:07 | additional-configuration/3-verify-config-changed | <   - address: additional-configuration.redpanda-system.svc.cluster.local.
logger.go:42: 09:41:07 | additional-configuration/3-verify-config-changed | ---
logger.go:42: 09:41:07 | additional-configuration/3-verify-config-changed | >   - address: additional-configuration-0.additional-configuration.redpanda-system.svc.cluster.local.

It should be fixed by 1844219.

chrisseto · 2024-05-15T14:05:00Z

Dang! I missed that one. Thanks for fixing!

chrisseto requested a review from RafalKorepta as a code owner May 13, 2024 16:09

chrisseto self-assigned this May 13, 2024

chrisseto changed the title ~~move setup-envtest to nix~~ V1: Prevent configs from becoming outdated when scaling down May 13, 2024

chrisseto requested review from koikonom and BenPope May 13, 2024 16:09

BenPope reviewed May 13, 2024

chrisseto force-pushed the chris/fix-scale-in branch 2 times, most recently from 17047b4 to 805a5c2 May 13, 2024 21:20

RafalKorepta approved these changes May 14, 2024

chrisseto force-pushed the chris/fix-scale-in branch 2 times, most recently from 62b4fff to 1167565 May 14, 2024 19:26

RafalKorepta approved these changes May 14, 2024

chrisseto added 2 commits May 15, 2024 10:12

move setup-envtest to nix

eb000b8

This commit replaces the `setup-envtest` shell scripts downloaded by Taskfile with the equivalent Go binary and uses nix to install it.

RafalKorepta force-pushed the chris/fix-scale-in branch from 1167565 to 7b4a5a7 May 15, 2024 08:12

Adjust pandaproxy client headless list of brokers with headless service

1844219

The current tests assert the old behavior of the brokers list defined in the pandaproxy client. With this commit, the headless service address is compared against the Redpanda configuration.

RafalKorepta approved these changes May 15, 2024

chrisseto merged commit ad3ea63 into main May 15, 2024
5 checks passed

chrisseto deleted the chris/fix-scale-in branch May 15, 2024 14:05
