Only update last reconcile status when there are resource changes #1281

cezarsa · 2024-03-04T15:40:15Z

While rolling out Terranetes, I noticed that on some of our clusters (the ones with a larger number of Configuration resources) Terranetes was continuously reconciling all resources even when there were no changes. The drift controller was causing this, and CPU usage for the terranetes-controller pod increased significantly. This was visible in a graph of the workqueue add rate.

Upon further investigation, I noticed this was caused by the drift controller continuously updating the timestamp at Configuration.Status.LastReconcile.Time. The issue only manifested on some clusters because of how long the drift reconciler takes to run.

Since the Configuration.Status.LastReconcile.Time field is serialized with seconds resolution (e.g. "2024-03-04T14:00:19Z"), if the reconciler takes less than 1 second to run, the serialized value will be the same and the Configuration resource will remain unchanged. This is the scenario I saw on some smaller clusters where the issue wasn't present.

However, if reconciling takes longer than 1 second, the controller will update the resource, causing the informer to notice the change and enqueue a new reconciliation. That reconciliation can again take more than 1 second, and the process repeats ad infinitum. This could happen because there are too many Configuration resources or because of other constraints on the controller Pod.

Now to the proposed fix. I can see two different paths for a solution here. The first would be adding an event filter to the drift controller to ignore changes to the lastReconcile timestamps. The problem with this is that it would only apply to the Terranetes controllers, and every external controller or operator watching Terranetes resources would also have to apply that same filter to avoid being spammed with reconciliations.

The other option is the one I'm submitting in this PR. I propose that the LastReconcile timestamps be updated only if there are other changes to the resource, since changing only the timestamp isn't helpful to watchers. This is similar to the behavior of timestamps on Conditions, which only track the latest change. I also added some basic unit tests to verify this new behavior.

gambol99 · 2024-03-05T17:55:38Z

Thank you kindly @cezarsa …

Now to the proposed fix, I can see two different paths for a solution here. The first would be adding an event filter to the drift controller to ignore changes to the lastReconcile timestamps. The problem with this is that it would only apply to the Terranetes controllers, and every external controllers/operators watching Terranetes resources would also have to apply that same filter to avoid being spammed with reconciliations.

I’m inclined to agree on this one … We just need to do some internal testing to get it cut into the next release …

Only update reconcile status when there are resource changes

0f2b2e9

cezarsa requested a review from gambol99 as a code owner March 4, 2024 15:40

gambol99 changed the base branch from master to develop March 4, 2024 17:18

gambol99 self-assigned this Mar 4, 2024

gambol99 added the bug Something isn't working label Mar 4, 2024

gambol99 approved these changes Mar 5, 2024

gambol99 merged commit 87031b8 into appvia:develop Mar 5, 2024
8 checks passed
