ipfs daemon memory usage grows over time: killed by OOM after 10~12 days of running #3532
Comments
Same here: with ipfs 0.4.3, memory grows to about 15G after about 10 days, despite only a few hundred files being pinned. The issue is replicated across 10 servers. Restarting the daemon fixes it, but memory keeps growing and the daemon has to be restarted again. UPDATE: Aha! I found the enable garbage collection flag in the documentation, so I'm trying: ipfs daemon --enable-gc
@jonnycrunch The memory leak is coming from somewhere else... Next time the memory gets out of hand, can you get me the debug info described here? https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md#beginning In particular: the heap profile, the goroutine dump, and the ipfs binary.
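A minimal Go sketch for saving the heap profile and goroutine dump requested above. It assumes the daemon's API listens on the default address 127.0.0.1:5001 and serves the standard net/http/pprof handlers under /debug/pprof/; check the linked debug guide for the exact commands your version documents. The helper names and the output file names ipfs.heap and ipfs.stacks are arbitrary choices for this sketch.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds the address of one standard pprof handler, assuming
// the daemon mounts them under /debug/pprof/ on its API address.
func profileURL(api, name string, debug int) string {
	u := fmt.Sprintf("%s/debug/pprof/%s", api, name)
	if debug > 0 {
		u += fmt.Sprintf("?debug=%d", debug)
	}
	return u
}

// saveProfile downloads url and writes the response body to path.
func saveProfile(client *http.Client, url, path string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	const api = "http://127.0.0.1:5001" // assumed default API address
	client := &http.Client{Timeout: 60 * time.Second}
	targets := []struct{ url, file string }{
		{profileURL(api, "heap", 0), "ipfs.heap"},        // binary heap profile
		{profileURL(api, "goroutine", 2), "ipfs.stacks"}, // full goroutine stacks as text
	}
	for _, t := range targets {
		if err := saveProfile(client, t.url, t.file); err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		fmt.Println("saved", t.file)
	}
}
```

The `debug=2` variant of the goroutine handler prints every goroutine's full stack in a human-readable form, which is what a "goroutine dump" usually means.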
Hi! We are using ipfs 0.4.4 at … Currently it uses 65-76% of memory on a 2GB instance. The OOM killer sometimes kills it, it starts again, and over several hours usage grows back to that level. But it looks like this level is low enough for the daemon not to be killed; maybe it uses some smart way to avoid being killed :-) While experimenting with memory limits, I saw that usage grows to fill all available memory but not beyond it (IPFS uses no swap, but other applications may run short of memory).
Also, I noticed that after running disk gc (…). I have no idea how Go works, so if you need more debug info or this is unhelpful, please feel free to ask for more details. Also, I have an ipfs node running on a 512MB DigitalOcean instance, managed by supervisord. The OOM killer kills it there pretty fast (within several hours); supervisord restarts it, it dies again, and again, but generally it works okay.
Carla Sella, from the Ubuntu community, reports that with ipfs v0.4.4 her VirtualBox VM starts to slow down after it connects to more than 70 peers. Here are her debugging files.
Maybe it is time for garbage collection to be enabled by default? @whyrusleeping @RichardLitt @diasdavid
@jonnycrunch As @whyrusleeping said, the core problem is what we call "connection closing": IPFS currently connects to almost everyone, which, combined with the stream muxer implementation we are currently using, takes a lot of memory. We are working on reducing it, but it might take a while. Connection closing is a much harder problem than we initially expected.
This is the debugging information I collected from the 1 node that was still running (2 have died): https://ipfs.io/ipfs/QmXnYzZT1EAq9pzi6snd6KHD8kNrBSDuyJqLPe7QHzUE23 It was also using 150% CPU and >80% MEM when I checked it. They are still on …
Stack dump from #61, ipfs package go-ipfs_v0.4.8_linux-amd64.tar.gz
Hey everyone, ipfs 0.4.11 should have some significant improvements here. The issue is not entirely resolved, but the leak should be mitigated. |
Still leaking memory in 0.4.13 — killed after ~12 hours. |
At the moment, the largest issue is the peerstore. We had a rather nasty bug that will be fixed in the next release (we, uh, kind of never forgot any address of any peer we had ever connected to and, worse, advertised these (sometimes ephemeral) addresses to the network...).
Does that mean that the fix is already in `master`, or is it still a work in progress?
Fixed in a dependency. PR pending: #4610
> On January 28, 2018 2:29:49 AM PST, "ᴠɪᴄᴛᴏʀ ʙᴊᴇʟᴋʜᴏʟᴍ" wrote:
> @Stebalien
> > that will be fixed in the next release
> Does that mean that the fix is already in `master` or is work in progress?
I profiled it, and surprisingly a lot of the wasted CPU seems to be in `AddAddrs` in the `AddrManager`. Reading that code, it seems hastily written and not performance-minded. I'll open a PR against go-libp2p-peerstore to optimize it with concurrent maps, which should help.
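One way to read "concurrent maps" here is a sharded map: the peer space is split across several independently locked shards, and each peer's addresses are kept in a set so duplicates are dropped with a map lookup. A minimal Go sketch under stated assumptions: this is not go-libp2p-peerstore's code, peer IDs and multiaddrs are plain strings, and the shard count, FNV hashing, and per-shard `sync.RWMutex` are choices made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16

// shard guards one slice of the peer space with its own lock, so writers
// for different peers rarely contend.
type shard struct {
	mu    sync.RWMutex
	addrs map[string]map[string]struct{} // peer ID -> set of multiaddr strings
}

// addrBook is a hypothetical sharded address store.
type addrBook struct {
	shards [shardCount]shard
}

func newAddrBook() *addrBook {
	ab := &addrBook{}
	for i := range ab.shards {
		ab.shards[i].addrs = make(map[string]map[string]struct{})
	}
	return ab
}

// shardFor picks the shard responsible for peer by hashing its ID.
func (ab *addrBook) shardFor(peer string) *shard {
	h := fnv.New32a()
	h.Write([]byte(peer))
	return &ab.shards[h.Sum32()%shardCount]
}

// AddAddrs records addrs for peer; repeated addresses are stored once.
func (ab *addrBook) AddAddrs(peer string, addrs []string) {
	s := ab.shardFor(peer)
	s.mu.Lock()
	defer s.mu.Unlock()
	set, ok := s.addrs[peer]
	if !ok {
		set = make(map[string]struct{})
		s.addrs[peer] = set
	}
	for _, a := range addrs {
		set[a] = struct{}{}
	}
}

// Count returns how many distinct addresses are known for peer.
func (ab *addrBook) Count(peer string) int {
	s := ab.shardFor(peer)
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.addrs[peer])
}

func main() {
	ab := newAddrBook()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ab.AddAddrs("QmPeer", []string{"/ip4/1.2.3.4/tcp/4001", "/ip4/1.2.3.4/udp/4001/quic"})
		}()
	}
	wg.Wait()
	fmt.Println(ab.Count("QmPeer"))
}
```

Sharding only reduces lock contention and lookup cost; it does nothing to stop the number of addresses per peer from growing.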
Unfortunately, the issue is libp2p/go-libp2p-peerstore#26 and the fact that the number of multiaddrs assigned to a peer can grow unchecked*. The peerstore actually works fine with a sane number of addresses. *The previous version of go-ipfs failed to forget observed multiaddrs for peers and, worse, would gossip these observed multiaddrs. That, combined with NATs and ephemeral ports, led to a buildup of addresses for some peers. The real solution is to sign peer address records (we should be doing this anyway), enforce a maximum number of addresses, and require that there be only one valid peer address record per peer.
Yeah, but that code is still unoptimized and generally really rough, even for a small number of addresses. Agreed, though, that there is a bigger cause, as you describe.
Still leaking memory in 0.4.18, at between 0 and 100 kB/s (averaging somewhere around 10 kB/s).
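For scale, a back-of-the-envelope conversion of the reported average rate into daily growth, assuming the rate stays constant and using decimal units (1 MB = 1000 kB):

```latex
10\,\frac{\text{kB}}{\text{s}} \times 86\,400\,\frac{\text{s}}{\text{day}} = 864\,000\,\frac{\text{kB}}{\text{day}} \approx 864\,\frac{\text{MB}}{\text{day}}
```

At the reported peak of 100 kB/s, the same arithmetic gives about 8.64 GB/day.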
@maznu Are you sure it's leaking memory? Go is a garbage-collected language, which means memory usage will appear to increase until a GC event. After a GC event, memory doesn't necessarily get released back to the OS, but the previously allocated memory will be reused internally. How are you measuring this?
https://golangcode.com/print-the-current-memory-usage/ Using this periodically, you can gather memory usage over several days. With a graphing tool like Microsoft Excel, you can then check the trend in memory usage.
Can someone with bad memory usage please grab a memory trace? |
I am experiencing this issue using … For me it takes ~2 days for the daemon to exhaust 1GB of memory and get OOM-killed.
@alexkursell I'm only seeing ~30MiB of memory usage on the heap. Unfortunately, I can't seem to download the goroutine stack traces. When you grabbed that memory dump, how much memory was go-ipfs using at that point in time?
The biggest problem I'm seeing with memory usage lately isn't that ipfs always uses a lot of memory; it's that it randomly spikes to a lot of memory, and Go will pretty much never release that memory. To debug this further, I would put a memory limit on the ipfs process (say, 1GB) so that it panics when the memory spikes, and we can then figure out what the problem is.
@Stebalien I've grabbed a new set of diagnostics, along with the output of …
I was able to run an ipfs node just fine for a while, but it has started taxing my server so much that it's impossible to keep using it. It would be fine even if it used a gigabyte, but it keeps eating more and more memory until the server simply crashes.
Go is "only" using about 300MiB of heap memory, so it looks like memory usage spiked at some point and Go never returned the memory. The largest actual memory users appear to be:
+1. I just set up a node on an Ubuntu 19.04 VPS, and it died after about a day. I'll try the latest master and see if that fixes it.
@kaysond (and others) When your nodes die from running out of memory, can you please send us the stack traces? They will help us track down what's causing the memory spikes.
@whyrusleeping After a few days, it looks like it settled at a solid 1GB of RAM. I've attached all the dumps per the debug guide.
It looks like that memory is:
@Stebalien thanks. I'll add that to my config and see how much it helps. Is there a plan to implement said "forgetting"? |
@kaysond Not yet, but it looks like we'll have to do that at some point. I've never seen that show up in a heap trace. You must have connected to ~0.5M (estimated) unique peers over the course of a few days. I've filed an issue (https://github.com/libp2p/go-libp2p-metrics/issues/17), but it's unlikely to be a priority given that most systems connecting to that many peers have quite a bit of memory (unless that was entirely DHT traffic...). That brings up a good point: if you're memory-constrained, try running the daemon with …
I set up a node mainly to serve a single website from ipfs, so the less memory it uses, the cheaper my VPS can be. I'm skeptical that the site draws that much traffic, so I guess it's just the nature of being connected to the swarm? The node isn't exactly a public gateway, so I'm not sure what caused all of those connections. I'll try it with that option and see what happens.
Probably the DHT. |
Any updates on this? |
Btw, the command to disable bandwidth metrics doesn't work anymore; the new one is … Is it even needed anymore?
With the command …
@kaysond I used that command. This + …
The remaining issue is #2848. Closing this one as it's quite old. |
Version information:
go-ipfs version: 0.4.5-dev-
Repo version: 4
System version: arm/linux
Golang version: go1.7
Type: Problem
Priority: P4
Description:
I have some Raspberry Pi 3s running the go-ipfs daemon. Right now they don't do anything: the Pis don't handle any IPFS requests; they just run the daemons. After about 10 days, ipfs gets killed on all of them because it takes too much memory.
The daemons are killed around RSS=783192
My longest running daemon (11 days) has RSS=605868
A newly started daemon has RSS=92020
A daemon running for one day has RSS=542408
Questions:
Related: #3318 and the question about running IPFS on platforms with limited resources.