Prefer upgraded processes for coordinators when the operator chooses the coordinators #2017

johscheuer
Member

Description

Fixes: #2015

Type of change

Please select one of the options below.

  • New feature (non-breaking change which adds functionality)

Discussion

As mentioned in the referenced issue, during an upgrade the operator can choose a new coordinator on a Pod that is currently being "updated" (deleted and recreated). Previously, the operator didn't differentiate between processes that were already upgraded and processes that still had to be upgraded. Preferring upgraded processes as coordinators should reduce the number of recoveries caused by coordinator changes.

In addition, we opted to prefer log/transaction processes as coordinators, as those do not go through the rolling recreation phase; instead, all of them are replaced.
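The preference order described above can be illustrated with a minimal sketch. It is not the operator's actual implementation: the `candidate` type, the `priority` weights, and `chooseCoordinators` are hypothetical names made up for this example, and the sketch ignores any other constraints the real selection applies (such as spreading coordinators across fault domains).

```go
package main

import (
	"fmt"
	"sort"
)

// candidate is a hypothetical, simplified view of a process that could
// become a coordinator. It is not a type from the operator.
type candidate struct {
	ID           string
	ProcessClass string // e.g. "log", "transaction", "storage"
	Upgraded     bool   // true if the process already runs the new version
}

// priority returns a higher score for more preferable candidates.
// The weights are illustrative assumptions: being upgraded dominates,
// and log/transaction processes break ties.
func priority(c candidate) int {
	score := 0
	if c.Upgraded {
		score += 10
	}
	if c.ProcessClass == "log" || c.ProcessClass == "transaction" {
		score++
	}
	return score
}

// chooseCoordinators returns the IDs of the count most preferable
// candidates without modifying the input slice.
func chooseCoordinators(candidates []candidate, count int) []string {
	sorted := append([]candidate(nil), candidates...)
	// A stable sort keeps the input order for candidates with equal priority.
	sort.SliceStable(sorted, func(i, j int) bool {
		return priority(sorted[i]) > priority(sorted[j])
	})
	if count > len(sorted) {
		count = len(sorted)
	}
	ids := make([]string, 0, count)
	for _, c := range sorted[:count] {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	picked := chooseCoordinators([]candidate{
		{ID: "storage-1", ProcessClass: "storage", Upgraded: false},
		{ID: "log-1", ProcessClass: "log", Upgraded: false},
		{ID: "storage-2", ProcessClass: "storage", Upgraded: true},
		{ID: "log-2", ProcessClass: "log", Upgraded: true},
	}, 3)
	fmt.Println(picked) // prints [log-2 storage-2 log-1]
}
```

In this sketch an upgraded storage process still wins over a log process that has not been upgraded yet, which matches the stated goal of avoiding coordinators on Pods that are about to be recreated.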

Testing

Updated the unit tests. Together with the current e2e tests, they should cover all important cases.

Documentation

I can update the docs in another PR. I have to revisit them anyway.

Follow-up

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 0cd7bce
  • Duration 2:14:34
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer closed this May 9, 2024
@johscheuer johscheuer reopened this May 9, 2024
controllers/change_coordinators.go (outdated, resolved)
controllers/choose_removals.go (resolved)
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 0cd7bce
  • Duration 2:22:43
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer
Member Author

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 0cd7bce
  • Duration 2:22:43
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
Summarizing 1 Failure:
  [FAIL] Operator when the operator is allowed to reset the maintenance zone [It] should reset the maintenance mode once the Pod was restarted [e2e, pr]
  /codebuild/output/src3153794519/src/github.com/FoundationDB/fdb-kubernetes-operator/e2e/test_operator/operator_test.go:2072

@johscheuer johscheuer closed this May 13, 2024
@johscheuer johscheuer reopened this May 13, 2024
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 0cd7bce
  • Duration 2:00:46
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 0cd7bce
  • Duration 2:22:08
  • Result: ❌ FAILED
  • Error: Error while executing command: if $fail_test; then exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 20b7861
  • Duration 2:12:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Contributor

@nicmorales9 nicmorales9 left a comment

LGTM, as long as it's not possible for a fully upgraded process to use a /var/-prefixed command line.

api/v1beta2/foundationdbcluster_types_test.go (resolved)
controllers/choose_removals.go (resolved)
controllers/change_coordinators.go (outdated, resolved)
@johscheuer johscheuer force-pushed the prefer-upgraded-processes-for-coordinators branch from 092860b to 9631c1b Compare May 14, 2024 12:52
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 092860b
  • Duration 2:32:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@johscheuer johscheuer merged commit 8525360 into FoundationDB:main May 14, 2024
7 of 8 checks passed
@johscheuer johscheuer deleted the prefer-upgraded-processes-for-coordinators branch May 14, 2024 13:26
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 9631c1b
  • Duration 2:25:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Coordinator changes during upgrades can be disruptive
3 participants