Goroutine leak leads to OOM #6166
I have more details to share. After 2 minutes of uptime I see: I count unique stacks and how many times each occurs this way:
Please notice that I have
@requilence could you create a dump as described here: https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md#beginning? We have a tool called stackparse for exactly this.
@Stebalien thanks. It was challenging to capture all of them before the OOM, as it gets worse and eats 3GB in 1 min :-) 0.4.20@74d07eff35965a3f635d03aedaa43561c73679e2: I have also added
Could you post your config, minus your private keys? It looks like you're running a relay, which would explain all the peers. Note: the connection manager tries to keep the number of connections within the target range but it doesn't stop new connections from being created. That's what's killing your CPU (creating/removing connections). We definitely need better back-pressure; it looks like this is a bit of a runaway process.
@Stebalien
Actually, the main problem is that it eats 3GB of RAM while the heap profile only shows about 500MB. As far as I know, goroutines are pretty cheap (~2KB of stack each), so 200k goroutines should take around 390MB. Where could the rest come from?
It could be allocation velocity (#5530). Basically, we're allocating and deallocating really fast, so Go reserves a bunch of memory it thinks it might need. That's my best guess.
It was intentional. So I guess after introducing
Likely, yes. Basically, this is a combination of two issues:
Ideally, the connection manager and relay would actually talk to each other and the relay would stop accepting new connections at some point... (libp2p/go-libp2p-circuit#65).
@requilence has disabling relay helped?
If you want to enable relay hop, you will need to set limits in the connection manager.
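For reference, a sketch of the relevant go-ipfs 0.4.x config fields (the values are examples, not recommendations; check `docs/config.md` in the repo for the authoritative schema):

```json
{
  "Swarm": {
    "EnableRelayHop": true,
    "ConnMgr": {
      "Type": "basic",
      "LowWater": 30,
      "HighWater": 60,
      "GracePeriod": "20s"
    }
  }
}
```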
See also libp2p/go-libp2p-circuit#69
@Stebalien disabling relay doesn't help. Probably because I have already advertised my peer as a relay through the DHT and it needs some time to expire.
@vyzo sounds cool, I will try this patch on the leaking setup and come back here with results.
We have identified the goroutine buildup culprit as identify. There is a series of patches that should fix the issues:
@requilence could you try the latest master?
I think I'm hitting an issue similar to this, where at some point connection counts start climbing rapidly past the default. Should I create a separate issue? Or would it be worth uploading the debug logs (e.g., heap dump, stacks, config, …)?
@leerspace please file a new issue. Also, try disabling the DHT with
I'm going to close this issue as "solved" for now. If that's not the case, please yell and I'll reopen it.
Version information:
tried both 0.4.19 and latest master:
Type:
bug
Description:
I created a fresh repo this morning. It was working well for some time, but now every time I run
ipfs daemon
I have a huge goroutine leak that leads to an OOM in a few minutes. I set HighWater = 60, LowWater = 30 to make sure it doesn't depend on swarm size.
https://gist.github.com/requilence/8f81663a95bec7a4083e2600ff24aeda
I had the same problem a few days ago (recreated the repo after).
It is a really huge list to check manually one by one. Maybe someone has an idea where it could come from?