
flowlogs pipeline memory consumption large clusters. #641

Open
hihellobolke opened this issue Mar 24, 2024 · 3 comments

Comments

@hihellobolke

hihellobolke commented Mar 24, 2024

While following https://docs.openshift.com/container-platform/4.12/network_observability/configuring-operator.html

And running flowlogs-pipeline with k8s enrichment on large clusters (~20k pods), the memory consumption is huge. Since this was running as a DaemonSet, it also effectively DDoS'd the API server.

Would it be better for scaling to allow some sort of shared k8s enrichment cache for all FLP instances?


Perhaps the cache could be smarter, along the lines of Redis client-side caching: https://redis.io/docs/manual/client-side-caching/

In the end we had to build a custom gRPC server backed by a shared cache to achieve network tracing on larger clusters.
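Roughly, the idea was a small local cache in each FLP pod sitting in front of a shared lookup service, with the Kube API only as a last resort. A minimal sketch of that layering (all types and names here are illustrative, not the actual implementation):

```go
// Illustrative sketch of the shared-cache idea: each FLP pod keeps a local
// in-memory tier and delegates misses to a shared cache service (e.g. a gRPC
// or Redis-backed lookup). Invalidation (as in Redis client-side caching)
// is omitted here. None of these names exist in flowlogs-pipeline.
package main

import (
	"fmt"
	"sync"
)

// PodMeta is the enrichment data attached to a flow (illustrative subset).
type PodMeta struct {
	Name      string
	Namespace string
	NodeName  string
}

// Lookup abstracts "where do I get pod metadata for this IP?".
type Lookup interface {
	Get(ip string) (PodMeta, bool)
}

// localCache is the per-pod, in-memory tier.
type localCache struct {
	mu   sync.RWMutex
	data map[string]PodMeta
	next Lookup // shared cache service, with the Kube API behind it
}

func (c *localCache) Get(ip string) (PodMeta, bool) {
	c.mu.RLock()
	m, ok := c.data[ip]
	c.mu.RUnlock()
	if ok {
		return m, true // local hit: no API-server traffic at all
	}
	m, ok = c.next.Get(ip) // shared-cache (or API) lookup on miss
	if ok {
		c.mu.Lock()
		c.data[ip] = m
		c.mu.Unlock()
	}
	return m, ok
}

// staticShared stands in for the shared cache service in this sketch.
type staticShared map[string]PodMeta

func (s staticShared) Get(ip string) (PodMeta, bool) { m, ok := s[ip]; return m, ok }

func main() {
	shared := staticShared{"10.0.0.12": {Name: "web-0", Namespace: "prod", NodeName: "node-a"}}
	cache := &localCache{data: map[string]PodMeta{}, next: shared}
	if m, ok := cache.Get("10.0.0.12"); ok {
		fmt.Printf("flow 10.0.0.12 -> pod %s/%s on %s\n", m.Namespace, m.Name, m.NodeName)
	}
}
```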


@hihellobolke hihellobolke changed the title flowlogs pipeline memory consumtion large clusters. flowlogs pipeline memory consumption large clusters. Mar 24, 2024
@jotak
Member

jotak commented Mar 25, 2024

Hello @hihellobolke, thanks for reaching out.
We are indeed exploring options for a shared cache. However, before going there: we don't recommend the DaemonSet approach for deploying netobserv on large clusters, as you've seen it doesn't scale very well. It's recommended to use the Kafka deployment model instead (in FlowCollector, setting spec.deploymentModel to Kafka, as documented here). This way, FLP is deployed as a Deployment that you can scale up and down.

Would that work for you?

@jotak
Member

jotak commented Mar 25, 2024

Speaking of the proposed solution, I think this could indeed work. We have a PoC that introduces Infinispan as a distributed cache (though in that context it was for another purpose, not for Kube API caching).

Another approach could be to not use k8s informers in FLP and use k8s watches instead. A problem with k8s informers is that they cache whole GVKs rather than just the queried resources, leading to higher memory consumption and more traffic with the kube API. We already did this in the operator, with something we called "narrowcache" (netobserv/network-observability-operator#476), to cut down memory usage.

A downside affecting both options would be slower processing time on cache misses, as everything would be lazy-loaded in FLP.
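To illustrate the watch-based idea (just a sketch with made-up pod/namespace names, not the actual narrowcache code), the client-go pattern is roughly: fetch the single object on first use, then keep only that object fresh with a narrowed watch instead of an informer caching the whole GVK:

```go
// Sketch: lazy, per-object lookup with client-go. On a cache miss we Get the
// one pod we need, then open a watch restricted by field selector to that
// object so the cached entry stays up to date. "default"/"my-pod" are
// placeholders for whatever the enrichment lookup resolves.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// Lazy load: only the pod we actually need, on first use.
	pod, err := client.CoreV1().Pods("default").Get(ctx, "my-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("loaded", pod.Name, "on", pod.Spec.NodeName)

	// Keep just that object up to date: a watch narrowed by field selector,
	// rather than an informer listing and caching every pod in the cluster.
	w, err := client.CoreV1().Pods("default").Watch(ctx, metav1.ListOptions{
		FieldSelector:   fields.OneTermEqualSelector("metadata.name", "my-pod").String(),
		ResourceVersion: pod.ResourceVersion,
	})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		fmt.Println("event:", ev.Type) // refresh or evict the cached entry on MODIFIED/DELETED
	}
}
```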
