Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update pods check needs removal #1954

Merged

Conversation

simenl
Copy link
Collaborator

@simenl simenl commented Mar 3, 2024

Description

Fixes bug where delete strategy is applied on pods not marked for removal in the replaceMisconfiguredProcessGroups step, #1918, by checking if the pod needs removal in the updatePods reconciler.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Discussion

Are there any design details that you would like to discuss further?

Testing

In addition to the ginkgo tests, also tested in a kubernetes deployment, verifying that update deletion strategy will not be applied if the update requires a removal / replacement.

@simenl simenl marked this pull request as ready for review March 5, 2024 12:16
@simenl simenl requested a review from johscheuer March 5, 2024 12:16
Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try to add an additional test case for this in our e2e test framework? We should be adding this to the operator tests here: https://github.com/FoundationDB/fdb-kubernetes-operator/blob/main/e2e/test_operator/operator_test.go.

Thanks for the PR!

Changes look mostly fine, just a few comments from my side and it would be great if we could add an e2e test.

internal/replacements/replacements.go Outdated Show resolved Hide resolved
internal/replacements/replacements.go Show resolved Hide resolved
internal/replacements/replacements.go Outdated Show resolved Hide resolved
Co-authored-by: Johannes Scheuermann <johscheuer@users.noreply.github.com>
@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 634eb6d
  • Duration 1:54:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make sure that the tests are passing? Based on the CI output you have to run the make fmt command. In general it's a good idea to run make fmt lint test before pushing the changes.

internal/replacements/replacements.go Outdated Show resolved Hide resolved
@simenl
Copy link
Collaborator Author

simenl commented Mar 11, 2024

@johscheuer
I've adjusted the PR, addressing the comments. Note: Only removed the reconciler part from logger.withValues.

(I haven't added e2e tests, this requires more investment from my side. Do you have any recommendations for reducing the feedback loop when implementing a new test?)

Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johscheuer
I've adjusted the PR, addressing the comments. Note: Only removed the reconciler part from logger.withValues.

Thanks, I take a look today.

(I haven't added e2e tests, this requires more investment from my side. Do you have any recommendations for reducing the feedback loop when implementing a new test?)

You can run a single e2e test with prefixing the test case with F, that's the normal Ginkgo feature. If you run frequent tests, it might be faster to reuse ac cluster: https://github.com/FoundationDB/fdb-kubernetes-operator/tree/main/e2e#reusing-an-existing-test-cluster. Depending on your test, you just have to make sure the cluster is in a good shape again.

Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM. It would be great if you could add an e2e test for this behaviour.

@foundationdb-ci
Copy link

Result of fdb-kubernetes-operator-pr on Linux CentOS 7

  • Commit ID: 44bf821
  • Duration 2:16:56
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@simenl
Copy link
Collaborator Author

simenl commented Mar 14, 2024

Added an e2e tests which:

  1. Sets maxConcurrentReplacements=3 and replaceInstancesWhenResourcesChange=true
  2. Change the spec for storage pods, by increasing the cpu request of the foundationdb container
  3. Wait for reconciliation.
  4. Check if there is no overlap between current set of PVCs and the previous set of PVCs
    a. Update by replacement will use a new PVC
    b. Update by deletion will reuse the PVC

In the current version, replacement will start on 3 of the storage pods, while the remaining 2 are updated through deletion.
With this PR, replacement will start on 3 of the storage pods, and the remaining storage pods will not be updated through deletion, but wait for replacement.

For step 4, it might be more robust to compare names of the volumes referred by the PVC, and not the whole PVC object

Copy link
Member

@johscheuer johscheuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM 👍 I left a comment on the e2e test but it's only informational.

Thanks for the PR!

e2e/test_operator/operator_test.go Show resolved Hide resolved
e2e/test_operator/operator_test.go Show resolved Hide resolved
@johscheuer johscheuer merged commit 6d65ccf into FoundationDB:main Mar 15, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants