Add safe guard to reduce risk when a large number of nodes are tainted #2008

johscheuer
Member

Description

Add a safeguard to reduce risk when a large number of nodes are tainted. The idea is to prevent the operator from automatically replacing any process groups on tainted nodes if too many nodes in the cluster are tainted. The existing MaxConcurrentReplacements limit remains in place, so this acts as a second safeguard.
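
To illustrate how the two limits described above could interact, here is a minimal sketch. It is not the operator's code: the type `taintGuard`, its field `maxTaintedNodes`, and the function `replacementBudget` are hypothetical names, and treating "too many" as an absolute node count is an assumption. Only the concept of a `MaxConcurrentReplacements` limit comes from the description.

```go
package main

import "fmt"

// taintGuard is a hypothetical stand-in for the new safeguard setting.
// The real operator may express the threshold differently.
type taintGuard struct {
	// maxTaintedNodes is an assumed absolute threshold: if more nodes
	// than this are tainted, taint-based replacements are paused.
	maxTaintedNodes int
}

// allowReplacements reports whether taint-based replacements may proceed
// for the given number of tainted nodes.
func (g taintGuard) allowReplacements(taintedNodes int) bool {
	return taintedNodes <= g.maxTaintedNodes
}

// replacementBudget returns how many additional process groups on tainted
// nodes could be replaced right now. The taint safeguard is checked first;
// the concurrent-replacement limit is then applied as the second bound.
func replacementBudget(g taintGuard, taintedNodes, inFlight, maxConcurrent int) int {
	if !g.allowReplacements(taintedNodes) {
		return 0 // too many tainted nodes: do nothing automatically
	}
	remaining := maxConcurrent - inFlight
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	g := taintGuard{maxTaintedNodes: 2}
	// Few tainted nodes: only the concurrency limit applies.
	fmt.Println(replacementBudget(g, 1, 0, 1))
	// Many tainted nodes: the safeguard blocks all automatic replacements.
	fmt.Println(replacementBudget(g, 5, 0, 1))
}
```

In this sketch the safeguard short-circuits before the concurrency limit is consulted, so a widespread taint event results in no automatic replacements rather than a slow trickle of them.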

Type of change

Please select one of the options below.

  • New feature (non-breaking change which adds functionality)

Discussion

Testing

Updated the unit testing setup for the taint feature and added additional tests.

Documentation

Added in the cluster docs.

Follow-up

@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: fafdf24
  • Duration 2:20:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Contributor

@nicmorales9 nicmorales9 left a comment

I did not dig into the test refactor, but the changes look good to me apart from the comments!

@johscheuer
Member Author

Merging as I only changed comments in the second commit.

@johscheuer johscheuer merged commit 1541a73 into FoundationDB:main Apr 29, 2024
7 of 8 checks passed
@johscheuer johscheuer deleted the limit-max-concurrent-trainted-process-groups branch April 29, 2024 13:48
@foundationdb-ci

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 11df00f
  • Duration 2:31:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

3 participants