Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only update last reconcile status when there are resource changes #1281

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

cezarsa
Copy link
Contributor

@cezarsa cezarsa commented Mar 4, 2024

While rolling out Terranetes I noticed that on some of our clusters (the ones with a larger number of Configuration resources) terranetes was continuously reconciling all resources even when there were no changes. The drift controller was causing this and there was a significant increase in the CPU usage for the terranetes-controller pod. This can be seen on this graph for the workqueue add rate:
Screenshot 2024-03-04 at 12 14 27

Upon further investigation, I noticed this was caused by the drift controller continuously updating the timestamp at Configuration.Status.LastReconcile.Time. The reason the issue only manifested on some clusters was due to how long the drift reconciler takes to run.

Since the Configuration.Status.LastReconcile.Time field is serialized with seconds resolution (e.g. "2024-03-04T14:00:19Z") if the reconciler takes less than 1 second to run, the serialized value will be the same and the Configuration resource will remain unchanged. This is the scenario I saw on some smaller clusters where this issue wasn't present.

However, if reconciling takes longer than 1 second, the controller will update the resource causing the informer to notice the change and enqueue a new reconciliation, which can take more than 1s again and the process repeats ad infinitum. This could happen due to too many Configuration resources or other constraints to the controller Pod.

Now to the proposed fix, I can see two different paths for a solution here. The first would be adding an event filter to the drift controller to ignore changes to the lastReconcile timestamps. The problem with this is that it would only apply to the Terranetes controllers, and every external controllers/operators watching Terranetes resources would also have to apply that same filter to avoid being spammed with reconciliations.

The other option is the one I'm submitting on this PR. I propose the LastReconcile timestamps be updated only if there are other changes to the resource as changing only the timestamp isn't helpful to watchers. This is similar to the behavior of timestamps on Conditions which only track the latest change. I also added some basic unit tests to verify this new behavior.

@cezarsa cezarsa requested a review from gambol99 as a code owner March 4, 2024 15:40
@gambol99 gambol99 changed the base branch from master to develop March 4, 2024 17:18
@gambol99 gambol99 self-assigned this Mar 4, 2024
@gambol99 gambol99 added the bug Something isn't working label Mar 4, 2024
@gambol99
Copy link
Member

gambol99 commented Mar 5, 2024

Thank you kindly @cezarsa

Now to the proposed fix, I can see two different paths for a solution here. The first would be adding an event filter to the drift controller to ignore changes to the lastReconcile timestamps. The problem with this is that it would only apply to the Terranetes controllers, and every external controllers/operators watching Terranetes resources would also have to apply that same filter to avoid being spammed with reconciliations.

I’m inclined to agree on this one … We just need to do some internal testing to get it cut into the next release …

@gambol99 gambol99 merged commit 87031b8 into appvia:develop Mar 5, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants