chore: 🐛 use reconcile to reduce thrash in pod watch and service watch #364

Closed
wants to merge 2 commits into from

cmwylie19
Contributor

@cmwylie19 cmwylie19 commented Apr 23, 2024

Description

We watch for changes to Kubernetes Services and NeuVector Jobs in order to update network policy. Each change triggers a cascade of events, which likely causes thrash in the kube-apiserver. If we put the events into a queue, they are processed one at a time in the order in which they arrived, which cuts down the load on the API server.
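To illustrate the idea of ordered, one-at-a-time processing, here is a minimal sketch of a serial queue. It is not Pepr's reconcile implementation; `SerialQueue`, `handleEvent`, and the event names are hypothetical and exist only for this example.

```typescript
// Hypothetical sketch of queue-based event handling (not Pepr's actual code).
type Task = () => Promise<void>;

class SerialQueue {
  // Tail of the promise chain; each new task waits for the previous one to settle.
  private tail: Promise<void> = Promise.resolve();

  enqueue(task: Task): Promise<void> {
    const run = this.tail.then(task);
    // Swallow failures on the chain so one bad event does not block later ones.
    this.tail = run.catch(() => {});
    return run;
  }
}

// Simulated handler: yields a few times so that, without a queue,
// concurrent handlers would interleave.
const log: string[] = [];

async function handleEvent(name: string, ticks: number): Promise<void> {
  log.push(`start ${name}`);
  for (let i = 0; i < ticks; i++) {
    await Promise.resolve();
  }
  log.push(`end ${name}`);
}

const queue = new SerialQueue();

// Events are handled strictly in arrival order, one at a time.
const done = Promise.all([
  queue.enqueue(() => handleEvent("service-a", 3)),
  queue.enqueue(() => handleEvent("service-b", 1)),
  queue.enqueue(() => handleEvent("pod-c", 2)),
]);
```

Each handler finishes before the next one starts, so a burst of watch events results in a steady sequence of follow-up API calls rather than many concurrent ones.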

Visual Proof

Note the wild CPU spikes on the watcher. With ordered processing, CPU usage appears to be throttled, and the watcher does not seem to hit that strange frozen state.

See the issue for additional screenshots

image

52 mins no errors

image
image

First failure after 78 mins (memory seems to be growing)
image

Related Issue

Fixes (hopefully 🤞) #363

Relates to #363 and defenseunicorns/pepr#745

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Other (security config, docs update, etc)

Checklist before merging

Signed-off-by: Case Wylie <cmwylie19@defenseunicorns.com>
Signed-off-by: Case Wylie <cmwylie19@defenseunicorns.com>
@cmwylie19 cmwylie19 requested a review from mjnagel April 23, 2024 15:05
@cmwylie19 cmwylie19 changed the title 🐛 : use reconcile to reduce thrash in pod watch and service watch chore: 🐛 use reconcile to reduce thrash in pod watch and service watch Apr 23, 2024
@mjnagel
Contributor

mjnagel commented Apr 23, 2024

I opened a parallel PR to add reconcile for the service watch but not the pod one - #362. Might let this one simmer for a bit and see if it seems to help before pressing forward. Thinking about the case here for our job termination: unless we are seeing duplicate attempts to terminate the sidecar, I'm not sure sequential/queue-based processing is needed here.

@cmwylie19
Contributor Author

No problem, we can close this. Thanks! Close as you see fit.

@mjnagel mjnagel closed this Apr 24, 2024