Prefer upgraded processes for coordinators when the operator chooses the coordinators #2017

johscheuer · 2024-05-08T17:53:12Z

Description

Fixes: #2015

Type of change

Please select one of the options below.

New feature (non-breaking change which adds functionality)

Discussion

As mentioned in the referenced issue, during an upgrade the operator can choose a new coordinator on a Pod that is currently being "updated" (deleted and recreated). Previously the operator didn't differentiate between processes that were already upgraded and processes that still had to be upgraded. Preferring upgraded processes as coordinators should reduce the number of recoveries caused by coordinator changes.

In addition, we opted to prefer log/transaction processes as coordinators, since those do not go through the rolling recreation phase but are instead all replaced.
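The preference described above can be pictured as a simple ranking of coordinator candidates. The following is a minimal sketch, not the operator's actual implementation: the `candidate` type, the `score` weights, and the `chooseCoordinators` helper are hypothetical names introduced for illustration, and the choice to weigh "already upgraded" higher than the process class is an assumption. It also ignores any other constraints the operator applies when selecting coordinators.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, simplified view of a process that could
// become a coordinator. It is not a type from the operator.
type candidate struct {
	ID       string
	Class    string // e.g. "storage", "log", "transaction"
	Upgraded bool   // true if the process already runs the new version
}

// score returns a higher value for more preferable candidates.
// The weights are assumptions made for this sketch.
func score(c candidate) int {
	s := 0
	if c.Upgraded {
		s += 2 // prefer processes that will not be restarted for the upgrade again
	}
	if c.Class == "log" || c.Class == "transaction" {
		s++ // prefer classes that are replaced instead of recreated in place
	}
	return s
}

// chooseCoordinators returns up to count candidates, best score first.
// The input slice is left unmodified and ties keep their input order.
func chooseCoordinators(candidates []candidate, count int) []candidate {
	sorted := make([]candidate, len(candidates))
	copy(sorted, candidates)
	sort.SliceStable(sorted, func(i, j int) bool {
		return score(sorted[i]) > score(sorted[j])
	})
	if count > len(sorted) {
		count = len(sorted)
	}
	return sorted[:count]
}

func main() {
	candidates := []candidate{
		{ID: "storage-1", Class: "storage", Upgraded: false},
		{ID: "log-1", Class: "log", Upgraded: false},
		{ID: "storage-2", Class: "storage", Upgraded: true},
		{ID: "log-2", Class: "log", Upgraded: true},
	}
	for _, c := range chooseCoordinators(candidates, 3) {
		fmt.Println(c.ID)
	}
}
```

With these example candidates and weights, the sketch picks `log-2`, `storage-2`, and `log-1`, in that order, so the not-yet-upgraded storage process is the one left out.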

Testing

The updated unit tests and the current e2e tests should cover all important cases.

Documentation

I can update the docs in another PR. I have to revisit them anyway.

Follow-up

foundationdb-ci · 2024-05-08T20:07:57Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 0cd7bce
Duration 2:14:34
Result: ❌ FAILED
Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

controllers/change_coordinators.go

controllers/choose_removals.go

foundationdb-ci · 2024-05-09T08:05:33Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 0cd7bce
Duration 2:22:43
Result: ❌ FAILED
Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

johscheuer · 2024-05-13T06:48:03Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 0cd7bce

Duration 2:22:43

Result: ❌ FAILED

Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1

Build Log terminal output (available for 30 days)

Build Workspace zip file of the working directory (available for 30 days)

Summarizing 1 Failure:
  [FAIL] Operator when the operator is allowed to reset the maintenance zone [It] should reset the maintenance mode once the Pod was restarted [e2e, pr]
  /codebuild/output/src3153794519/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator/operator_test.go:2072

foundationdb-ci · 2024-05-13T08:49:03Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 0cd7bce
Duration 2:00:46
Result: ❌ FAILED
Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci · 2024-05-13T16:15:11Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 0cd7bce
Duration 2:22:08
Result: ❌ FAILED
Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci · 2024-05-13T17:42:03Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 20b7861
Duration 2:12:54
Result: ✅ SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

nicmorales9

LGTM, as long as it's not possible for a /var/-prefixed command line to be used by a fully upgraded process.

api/v1beta2/foundationdbcluster_types_test.go

controllers/choose_removals.go

controllers/change_coordinators.go

…the coordinators

foundationdb-ci · 2024-05-14T13:21:38Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 092860b
Duration 2:32:16
Result: ✅ SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

foundationdb-ci · 2024-05-14T15:17:39Z

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

Commit ID: 9631c1b
Duration 2:25:08
Result: ✅ SUCCEEDED
Error: N/A
Build Log terminal output (available for 30 days)
Build Workspace zip file of the working directory (available for 30 days)

johscheuer added the enhancement New feature or request label May 8, 2024

johscheuer requested a review from nicmorales9 May 8, 2024 17:53

johscheuer mentioned this pull request May 8, 2024

Consider doing exclusion and coordinator change at once #2018

Closed

johscheuer closed this May 9, 2024

johscheuer reopened this May 9, 2024

johscheuer commented May 9, 2024

controllers/change_coordinators.go (outdated)

controllers/choose_removals.go

johscheuer closed this May 13, 2024

johscheuer reopened this May 13, 2024

johscheuer closed this May 13, 2024

johscheuer reopened this May 13, 2024

johscheuer mentioned this pull request May 13, 2024

Choose removals should prefer process groups with conditions for removal. #2020

Open

nicmorales9 approved these changes May 14, 2024

api/v1beta2/foundationdbcluster_types_test.go

controllers/choose_removals.go

controllers/change_coordinators.go (outdated)

johscheuer added 3 commits May 14, 2024 14:52

Prefer upgraded processes for coordinators when the operator chooses …

e699cf1

…the coordinators

Correct comment for code

d20009c

Fix comment

9631c1b

johscheuer force-pushed the prefer-upgraded-processes-for-coordinators branch from 092860b to 9631c1b Compare May 14, 2024 12:52

johscheuer merged commit 8525360 into FoundationDB:main May 14, 2024
7 of 8 checks passed

johscheuer deleted the prefer-upgraded-processes-for-coordinators branch May 14, 2024 13:26
