Vector makes API requests to the Kubernetes API server without using resource_version #16797

Closed
jeremy-mi-rh opened this issue Mar 14, 2023 · 6 comments · Fixed by #17095
Labels
source: kubernetes_logs (Anything `kubernetes_logs` source related), type: bug (A code related bug)

Comments

@jeremy-mi-rh

jeremy-mi-rh commented Mar 14, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

This splits part of issue #16753 out into its own issue.

Context

We use Vector to deliver kubernetes_logs to our Kafka cluster, where the logs are later processed and ingested into Humio. Vector is deployed as a DaemonSet in our Kubernetes clusters (each running more than 1,000 nodes).

We recently had an outage in one of our Kubernetes clusters (~1,100 nodes). A failure of the etcd leader node caused a cascading failure in which pods made 1000x more API calls to our API server, which eventually brought the Kubernetes control plane down entirely.

During remediation, we identified Vector as one of the candidates hammering the API server. Shutting down Vector, along with a few other DaemonSets, eventually reduced traffic to the control plane components, which allowed the etcd nodes to recover.

Issue: resource_version not set when making API requests to the Kubernetes API server

According to #7943, resource_version was set to 0 in Vector 0.18 to 0.20. PR #11714 adopted kube-rs and dropped the change made in #9974. In the kube-apiserver audit logs, we don't see a resourceVersion parameter in the request URL, which makes us wonder whether this is a regression.

Sample request from the audit logs:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Request","auditID":"xxx","stage":"ResponseComplete","requestURI":"/api/v1/nodes?\u0026fieldSelector=metadata.name%3Dip-10-x-x-x.ec2.internal","verb":"list","user":{"username":"system:serviceaccount:vector:vector-agent","uid":"xxx","groups":["system:serviceaccounts","system:serviceaccounts:vector","system:authenticated"]},"sourceIPs":["10.x.x.x"],"objectRef":{"resource":"nodes","name":"ip-10-x-x-x.ec2.internal","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2023-03-08T02:29:15.992746Z","stageTimestamp":"2023-03-08T02:29:16.534351Z","annotations":{"authentication.k8s.io/legacy-token":"system:serviceaccount:vector:vector-agent","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"default-reader-role-binding\" of ClusterRole \"cluster-read-all\" to Group \"system:authenticated\""}}
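To check an audit log for the same symptom, a small script can flag `list` requests whose `requestURI` carries no `resourceVersion` query parameter and tally them per requesting user. This is a minimal sketch, assuming the audit log is in JSON format with one event per line; the function names are hypothetical and not part of Vector or Kubernetes tooling.

```python
import json
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def is_list_without_resource_version(event: dict) -> bool:
    """True if an audit event is a `list` call whose URI has no resourceVersion."""
    if event.get("verb") != "list":
        return False
    # parse_qs skips empty segments, so a leading "&" like the one in the sample URI is harmless.
    query = parse_qs(urlsplit(event.get("requestURI", "")).query)
    return "resourceVersion" not in query


def count_lists_without_resource_version(lines) -> Counter:
    """Tally such list calls per requesting user; assumes one JSON audit event per line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if is_list_without_resource_version(event):
            user = event.get("user", {}).get("username", "<unknown>")
            counts[user] += 1
    return counts


if __name__ == "__main__":
    # Trimmed copy of the sample event above; json.loads decodes \u0026 to "&".
    sample = (
        '{"verb":"list",'
        '"requestURI":"/api/v1/nodes?\\u0026fieldSelector=metadata.name%3Dip-10-x-x-x.ec2.internal",'
        '"user":{"username":"system:serviceaccount:vector:vector-agent"}}'
    )
    print(count_lists_without_resource_version([sample]))
```

Running it over the sample event attributes one uncached list call to the vector-agent service account.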

Version

vector 0.27.0 (x86_64-unknown-linux-gnu 5623d1e 2023-01-18)

References

#7943
#16753

@nabokihms
Contributor

nabokihms commented Mar 16, 2023

Opened a PR against kube-rs to allow setting a resource version (for list requests only, for now).
kube-rs/kube#1162

@spencergilbert
Contributor

Thanks @nabokihms!

@jeremy-mi-rh
Author

Thanks @nabokihms!

@yashsarma

Thanks @nabokihms

@nabokihms
Contributor

The kube-rs changes have been merged into the master branch. Once a new version of the client is released, I will update Vector accordingly.

@nabokihms
Contributor

The new option has been added to Vector.
With use_apiserver_cache=true, control plane pressure can be reduced significantly.

Under the hood, this option makes vector use resource_version=0 for all initial list requests.
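To make the difference concrete, the sketch below builds the node list URI with and without `resourceVersion=0`. Per the Kubernetes API concepts documentation for the versions current when this issue was filed, a list with no `resourceVersion` asks for the most recent state and is a consistent read from etcd, while `resourceVersion=0` accepts any version and can be served from the API server's watch cache. The helper `node_list_uri` is hypothetical, and its `use_apiserver_cache` parameter only mirrors the option's name; this is not Vector's or kube-rs's code.

```python
from urllib.parse import urlencode


def node_list_uri(node_name: str, use_apiserver_cache: bool) -> str:
    """Build a list URI for a single node, optionally allowing a cached answer."""
    params = {"fieldSelector": f"metadata.name={node_name}"}
    if use_apiserver_cache:
        # resourceVersion=0 means "any version is acceptable", so the API server
        # may answer from its watch cache instead of reading through to etcd.
        params["resourceVersion"] = "0"
    # With resourceVersion omitted, the request asks for the most recent state,
    # which is served by a consistent read from etcd.
    return "/api/v1/nodes?" + urlencode(params)


if __name__ == "__main__":
    for cached in (False, True):
        print(cached, node_list_uri("ip-10-x-x-x.ec2.internal", cached))
```

The trade-off is freshness: a cached answer may be slightly stale, which is usually acceptable for an initial list that a subsequent watch brings up to date, and it spares etcd from a full read per agent when a large DaemonSet restarts at once.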
