Improve scalability of Kubernetes module in metricbeat #32662

Open
gsantoro opened this issue Aug 11, 2022 · 0 comments
Assignees
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team

Comments

@gsantoro (Contributor)

Context
While investigating PR 32539, I noticed that there might be room for performance improvements in the Kubernetes module of metricbeat.

Each metricbeat instance stores metrics about all the nodes in the Kubernetes cluster, but only metrics about the pods and containers on the node where that instance is running. This replicates how the previous expiring cache worked, but it is now more evident and can have a detrimental effect in clusters with many nodes: the more nodes there are, the more memory each instance wastes on unused metrics from other nodes. This behaviour is due to how the watcher notifies events from Kubernetes and was not modified by the aforementioned PR.

A possible solution is for each metricbeat instance to filter out events generated on nodes other than the one where it is running. This would also simplify the MetricRepo API, since we would no longer need to handle node deletions, only events from Pods and Containers.
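A minimal sketch of the idea in Go. The `Event` type, `filterByNode` function, and field names here are hypothetical stand-ins, not the actual metricbeat types; the real implementation would apply the filter inside the watcher callbacks (or via a field selector on the watch itself) rather than over a slice:

```go
package main

import "fmt"

// Event is a simplified stand-in for the pod events delivered by the
// Kubernetes watcher; the real metricbeat types differ.
type Event struct {
	PodName  string
	NodeName string
}

// filterByNode drops events that originate on nodes other than the one
// this metricbeat instance runs on, so the metric store only ever holds
// metrics for local pods and containers.
func filterByNode(events []Event, localNode string) []Event {
	var kept []Event
	for _, ev := range events {
		if ev.NodeName == localNode {
			kept = append(kept, ev)
		}
	}
	return kept
}

func main() {
	events := []Event{
		{PodName: "pod-a", NodeName: "node-1"},
		{PodName: "pod-b", NodeName: "node-2"},
		{PodName: "pod-c", NodeName: "node-1"},
	}
	// Keep only events from the local node ("node-1" here).
	for _, ev := range filterByNode(events, "node-1") {
		fmt.Println(ev.PodName)
	}
}
```

With a filter like this in place, delete events for other nodes never reach the MetricRepo, which is what would allow its node-deletion handling to be dropped.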

During the same investigation, I noticed that when a Pod is deleted, the update function is first called (adding its metrics back), and the Pod is then deleted a few seconds later. I am not sure whether this is intended, since the Pod's status is already Terminating at that point. I also noticed that the call to deletePod executes twice; this might be because there is more than one watcher, or because the code is shared between multiple metricsets.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Aug 11, 2022
@gsantoro gsantoro added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Aug 11, 2022
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Aug 11, 2022
@gsantoro gsantoro self-assigned this Aug 11, 2022