Constant memory growth with the file backend #6010
The memory profile you posted shows less than 6MB of actual heap usage. Can you please look at what the actual RSS memory of the NATS Server process is?
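As an aside (not from the thread itself), the process RSS can be checked from inside an Alpine-based NATS container roughly like this; treating nats-server as PID 1 is an assumption about the image:

```sh
# RSS of the NATS Server process, read straight from /proc
grep VmRSS /proc/1/status
# If nats-server is not PID 1, locate it first (pidof ships with BusyBox)
grep VmRSS /proc/$(pidof nats-server)/status
```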
@neilalexander the memory within the container is certainly less as well:
By the way, could it be because of constant subject creation, as we post messages to subjects like `stream_name.proc_name.locks.{LOCK_ID}`? As for the actual RSS, as soon as I figure out how to get it, I'll post it.
25MB is very much in the range of what we'd expect here. Inside the container, can you please run:
That was from within the container. And yes, that screenshot from Grafana shows the container memory, which is still concerning, as it keeps growing.
How about the corresponding metrics?
I do not have these metrics, unfortunately, for some reason, but maybe the info from the cgroup `memory.stat` will be useful:
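The memory.stat output itself is not reproduced here; for reference, the counters that matter for this discussion can be pulled like this (a cgroup v2 layout is assumed; cgroup v1 keeps the equivalent rss/cache fields in /sys/fs/cgroup/memory/memory.stat):

```sh
# anon ~ heap/stack of processes in the cgroup (what the server really uses)
# file ~ page cache, i.e. cached filestore blocks
grep -E '^(anon|file|active_file|inactive_file) ' /sys/fs/cgroup/memory.stat
```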
So we aren't experiencing a memory leak in the NATS Server process in this case, which is something. I've been reading kubernetes/kubernetes#43916, which is quite the rollercoaster, but the consensus seems to be that Kubernetes is incorrectly considering the page cache when deciding on memory pressure. The "fix" seems to be to set equal values for the memory request and memory limit. That should limit the kernel page cache for that cgroup to the container limit only. Then, for good measure, set:
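A sketch of the suggested change with kubectl, assuming the pods are managed by a StatefulSet named `nats`; the truncated sentence above presumably refers to `GOMEMLIMIT` (an assumption, although the reporter already sets it), and the sizes are illustrative only:

```sh
# Equal request and limit so the cgroup limit also bounds the page cache
kubectl set resources statefulset/nats --requests=memory=2Gi --limits=memory=2Gi
# Cap the Go heap somewhat below the container limit (value is illustrative)
kubectl set env statefulset/nats GOMEMLIMIT=1536MiB
```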
Thank you, I'll do a new test with these settings.
So just to be clear, in this case the NATS Server process doesn't restart and maintains a stable RSS? If so, what happens after each "see-saw" drop?
@0xAX are you running NATS on a GKE cluster? Which version?
No, it is not GKE. The version:
No, it does not restart. And that is OK for me, but the thing is that clients start to get timeout errors around the time of such drops.
Here is a drop:
and after:
Could it be because of the use case itself? I mean, we store 1 key per subject within the stream (created with the settings described above).
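The pattern being described (one message per lock subject, purged when the lock is released) looks roughly like this with the NATS CLI; the stream name `LOCKS` and the concrete subject are assumptions based on the subject scheme mentioned in the issue:

```sh
# take a lock by publishing a marker message on the per-lock subject
nats pub 'stream_name.proc_name.locks.42' 'locked'
# release it by purging just that subject from the stream
nats stream purge LOCKS --subject='stream_name.proc_name.locks.42' --force
```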
I don't think you have a usage-pattern problem here. In fact, I don't think you have a problem here at all. The continuous stream writes result in the creation of new filestore blocks (this is normal operation), and therefore the kernel page cache is growing with in-memory caches of those files. What is supposed to happen is that the operating system will evict caches/buffers when the memory needs to be reclaimed for something else.

In this case, the host system with 64GB RAM has more than enough memory to satisfy application pressure without having to evict the caches because, even with 28GB of buffers/caches, there's still 14GB that isn't being used for anything, so the buffers/caches will just keep growing to fill the free space. This isn't a bad thing, by the way: these caches are really good for improving application read performance.

The reason this looks so bad is that Kubernetes is counting the buffers/caches in with the pod's real-world memory usage, which in my opinion it shouldn't do, which means that the container reports memory usage as being much higher than the sum total of the RSS of the processes inside it. The only bit that's particularly relevant to us is the actual RSS of the NATS Server process.
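A rough way to see the distinction described above from inside the pod, assuming a cgroup v2 layout (paths differ on cgroup v1):

```sh
# Everything charged to the cgroup, page cache included
cat /sys/fs/cgroup/memory.current
# The kubelet/cAdvisor "working set" is roughly memory.current minus this counter,
# so cached filestore blocks inflate the pod's reported memory even while the
# nats-server RSS stays small
grep '^inactive_file ' /sys/fs/cgroup/memory.stat
```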
@neilalexander thank you. We will do a longer test and, if it is not killed by the OOM killer, I am pretty OK with this. I'll close the issue for now and will return if there are new findings.
Observed behavior
Hello,
We are running a 3-replica NATS cluster in a Kubernetes cluster and I am trying to use NATS streams for locking; something very similar is described in #4803. I create a few streams (not many, about 5) with:
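The original Elixir snippet is not reproduced here. An equivalent stream definition with the NATS CLI might look like the following; the stream name, subject pattern, and limits are illustrative, not the reporter's exact settings:

```sh
nats stream add LOCKS \
  --subjects 'stream_name.proc_name.locks.*' \
  --storage file \
  --replicas 3 \
  --retention limits \
  --discard old \
  --max-age 30s \
  --defaults
```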
Sorry for the Elixir syntax, but I suppose it should be clear, as the names of the stream settings are very similar to the originals. After the streams are created, multiple instances of the application start to send and purge messages on these streams (on subjects like `stream_name.proc_name.locks.{LOCK_ID}`). If the `storage` is set to `file` (with `memory` it is not reproduced), we see constant memory growth, although the streams are not overflowed, at least as far as is visible in Grafana. The memory consumption is:
The NATS cluster configuration is:
The `GOMEMLIMIT` is set to:

Resource requests/limits are not set in Kubernetes for this particular deployment, but I tried setting them and it didn't help. Information about the streams:
Some streams are empty, as you may see above, but that is OK. As I wrote above, messages are constantly purged (or deleted because of the TTL), so the streams are clearly not overflowed with data.
The information about a stream (only one here, but I suppose it is very similar for all of them) is:
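The stream details themselves are not reproduced here; they would have come from commands along these lines (stream name assumed):

```sh
nats stream report          # per-stream message/byte counts across the cluster
nats stream info LOCKS      # full configuration and state of one stream
```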
Also attached is a memory profile, taken as described in https://docs.nats.io/running-a-nats-service/nats_admin/profiling: mem.prof.zip. The profiler data was taken from the instance with the highest memory consumption (the yellow one in the Grafana screenshot).
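For reference, the attached profile can be inspected locally with Go's standard pprof tooling (assuming the Go toolchain is installed):

```sh
unzip mem.prof.zip
go tool pprof -top mem.prof         # text summary of heap allocations
go tool pprof -http=:8080 mem.prof  # interactive flame graph / call graph in the browser
```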
Expected behavior
As the number of messages within the streams does not grow, stable memory consumption is expected.
Server and client version
The NATS version is `nats:2.10.21-alpine`. The client version is the latest version of https://github.com/nats-io/nats.ex.