Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Status Updater Retrying on Failures #1062

Merged
merged 16 commits into from
Sep 27, 2023

Conversation

bjee19
Copy link
Contributor

@bjee19 bjee19 commented Sep 15, 2023

Proposed changes

Problem: NKG will not retry on status update failure, thus there is a chance that some resources will not have up-to-do statuses.

Solution: Add retry logic when status update fails with a small exponential backoff after each retry. Also, added logic to allow for a graceful exit of the status updater when the NKG pod context is cancelled.

Testing: Manually tested that the issue found here: #563 no longer occurs. Added unit tests for retry logic. Updated tests so they correctly test that when the context is canceled, the status updater will no longer update statuses.

Please focus on (optional): Feedback on how many attempts the status updater should retry before returning an error and moving on. Also, what the starting backoff sleep time should be.

Closes #1016

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@bjee19 bjee19 requested a review from a team as a code owner September 15, 2023 21:29
@github-actions github-actions bot added the enhancement New feature or request label Sep 15, 2023
@bjee19
Copy link
Contributor Author

bjee19 commented Sep 15, 2023

Ran into a tricky bug and got help from Saylor where the fake client we use in tests may not be honoring a canceled context and was still getting and updating statuses even after having its context canceled. This was a problem because when the context is canceled, the blocking select statement in the status updater's Update() will be unblocked, while the goroutine finishes in the background. However, even with a canceled context, the goroutine was still getting and updating statuses. Thus we ran into a race where the tests would expect everything to be finished when updater.Update() returns and it would check statements while statuses were still being changed in the goroutine. To fix this I added a check at the top of updater() so that if the context was canceled it wouldn't continue. The issue may only exist with how our fake client doesn't respect a canceled context.

internal/framework/status/statusfakes/fake_client.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/statusfakes/fake_client.go Outdated Show resolved Hide resolved
internal/framework/status/updater_test.go Outdated Show resolved Hide resolved
internal/framework/status/statusfakes/fake_client.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater_test.go Outdated Show resolved Hide resolved
internal/framework/status/updater_test.go Outdated Show resolved Hide resolved
internal/framework/status/updater_test.go Outdated Show resolved Hide resolved
internal/framework/status/updater_test.go Outdated Show resolved Hide resolved
@bjee19
Copy link
Contributor Author

bjee19 commented Sep 21, 2023

Manually tested that the issue found here: #563 still no longer occurs.

internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater_test.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater.go Outdated Show resolved Hide resolved
@bjee19
Copy link
Contributor Author

bjee19 commented Sep 26, 2023

A couple things I'm a bit unsure of:

  1. Is there a better name than ConditionWithContextFunc?
  2. Is it alright that I called the new interface StatusUpdater and put a nolint on it because it was complaining that the interface started with the same word as the package status
  3. How are the tests and is it alright that I split them up like so?

@kate-osborn
Copy link
Contributor

A couple things I'm a bit unsure of:

  1. Is there a better name than ConditionWithContextFunc?
  2. Is it alright that I called the new interface StatusUpdater and put a nolint on it because it was complaining that the interface started with the same word as the package status
  3. How are the tests and is it alright that I split them up like so?
  1. Probably. Naming is hard. Maybe NewRetryUpdateFunc?
  2. Yeah, it's ok. Again, naming is hard. Too bad that we already have an Updater interface. We could call it K8sUpdater. 🤷‍♀️
  3. I think splitting them is fine, but it would be nice if the test file had the prefix updater so it shows up next to the updater_test.go. Maybe updater_retry_test.go.

I wonder if we want to start a new pattern where we name the unit test files unit. Like updater_unit_test.go. Or alternatively, name the gomega test files using bdd.

@bjee19
Copy link
Contributor Author

bjee19 commented Sep 26, 2023

I think naming the unit test files unit is a good idea to me!

Copy link
Contributor

@kate-osborn kate-osborn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few small nits!

internal/framework/status/updater.go Outdated Show resolved Hide resolved
internal/framework/status/updater_retry_test.go Outdated Show resolved Hide resolved
internal/framework/status/updater_retry_test.go Outdated Show resolved Hide resolved
@bjee19 bjee19 merged commit 8e8a2bf into nginxinc:main Sep 27, 2023
23 checks passed
@bjee19 bjee19 deleted the enh/status-updater-retry branch November 20, 2023 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Retry on status update failures
4 participants