
flowlogs pipeline memory consumption large clusters. #641

Open
hihellobolke opened this issue Mar 24, 2024 · 3 comments

Comments

@hihellobolke

hihellobolke commented Mar 24, 2024

While following https://docs.openshift.com/container-platform/4.12/network_observability/configuring-operator.html

And running flowlogs-pipeline with k8s enrichment on large clusters (~20k pods), the memory consumption is huge. Since this was running as a DaemonSet, it also effectively DDoS'd the API server.

Would it be better for scaling to allow some sort of shared k8s enrichment cache for all FLP instances?


Perhaps the cache could be smarter, along the lines of Redis client-side caching: https://redis.io/docs/manual/client-side-caching/

In the end we had to build a custom gRPC server backed by a shared cache to achieve network tracing on larger clusters.
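Roughly, the idea was a small local cache in each FLP pod sitting in front of a shared lookup service, with the Kube API only as a last resort. A minimal sketch of that layering (all types and names here are illustrative, not the actual implementation):

```go
// Illustrative sketch of the shared-cache idea: each FLP pod keeps a local
// in-memory tier and delegates misses to a shared cache service (e.g. a gRPC
// or Redis-backed lookup). Invalidation (as in Redis client-side caching)
// is omitted here. None of these names exist in flowlogs-pipeline.
package main

import (
	"fmt"
	"sync"
)

// PodMeta is the enrichment data attached to a flow (illustrative subset).
type PodMeta struct {
	Name      string
	Namespace string
	NodeName  string
}

// Lookup abstracts "where do I get pod metadata for this IP?".
type Lookup interface {
	Get(ip string) (PodMeta, bool)
}

// localCache is the per-pod, in-memory tier.
type localCache struct {
	mu   sync.RWMutex
	data map[string]PodMeta
	next Lookup // shared cache service, with the Kube API behind it
}

func (c *localCache) Get(ip string) (PodMeta, bool) {
	c.mu.RLock()
	m, ok := c.data[ip]
	c.mu.RUnlock()
	if ok {
		return m, true // local hit: no API-server traffic at all
	}
	m, ok = c.next.Get(ip) // shared-cache (or API) lookup on miss
	if ok {
		c.mu.Lock()
		c.data[ip] = m
		c.mu.Unlock()
	}
	return m, ok
}

// staticShared stands in for the shared cache service in this sketch.
type staticShared map[string]PodMeta

func (s staticShared) Get(ip string) (PodMeta, bool) { m, ok := s[ip]; return m, ok }

func main() {
	shared := staticShared{"10.0.0.12": {Name: "web-0", Namespace: "prod", NodeName: "node-a"}}
	cache := &localCache{data: map[string]PodMeta{}, next: shared}
	if m, ok := cache.Get("10.0.0.12"); ok {
		fmt.Printf("flow 10.0.0.12 -> pod %s/%s on %s\n", m.Namespace, m.Name, m.NodeName)
	}
}
```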


@hihellobolke hihellobolke changed the title flowlogs pipeline memory consumtion large clusters. flowlogs pipeline memory consumption large clusters. Mar 24, 2024
@jotak
Member

jotak commented Mar 25, 2024

Hello @hihellobolke, thanks for reaching out.
We are indeed exploring options for a shared cache. However, before going there: we don't recommend the DaemonSet approach for deploying netobserv on large clusters, as you've seen it doesn't scale very well. It's recommended to use the Kafka deployment model instead (in FlowCollector, setting spec.deploymentModel to Kafka, as documented here). This way, FLP is deployed as a Deployment that you can scale up and down.

Would that work for you?

@jotak
Member

jotak commented Mar 25, 2024

Speaking of the proposed solution, I think this could indeed work. We have a PoC that introduces Infinispan as a distributed cache (though in that context it was for another purpose, not for Kube API caching).

Another approach could be to not use k8s informers in FLP and use k8s watches instead. A problem with k8s informers is that they cache whole GVKs rather than just the queried resources, leading to higher memory consumption and more traffic with the kube API. We already did this in the operator, with something we called "narrowcache" (netobserv/network-observability-operator#476), to cut down memory usage.

A downside affecting both options would be slower processing time on cache misses, as everything would be lazy-loaded in FLP.
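To illustrate the watch-based idea (just a sketch with made-up pod/namespace names, not the actual narrowcache code), the client-go pattern is roughly: fetch the single object on first use, then keep only that object fresh with a narrowed watch instead of an informer caching the whole GVK:

```go
// Sketch: lazy, per-object lookup with client-go. On a cache miss we Get the
// one pod we need, then open a watch restricted by field selector to that
// object so the cached entry stays up to date. "default"/"my-pod" are
// placeholders for whatever the enrichment lookup resolves.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// Lazy load: only the pod we actually need, on first use.
	pod, err := client.CoreV1().Pods("default").Get(ctx, "my-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("loaded", pod.Name, "on", pod.Spec.NodeName)

	// Keep just that object up to date: a watch narrowed by field selector,
	// rather than an informer listing and caching every pod in the cluster.
	w, err := client.CoreV1().Pods("default").Watch(ctx, metav1.ListOptions{
		FieldSelector:   fields.OneTermEqualSelector("metadata.name", "my-pod").String(),
		ResourceVersion: pod.ResourceVersion,
	})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		fmt.Println("event:", ev.Type) // refresh or evict the cached entry on MODIFIED/DELETED
	}
}
```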
