Leaking goroutines when adding massive amount of files #2823
I think it is a goroutine leak; there are over 9000 goroutines. Goroutine heatlist:
I will report more findings later.
Most popular full goroutine stacks:
So, from my analysis: everything is bound to this context, so it should finish within 15s, yet for some reason it doesn't.
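For context, here is a minimal sketch of the failure mode being described (hypothetical code, not the actual dht implementation): work that is nominally bound to a 15s context still leaks its goroutine if an inner channel operation doesn't also select on ctx.Done().

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// provideOne stands in for one unit of "provide" work; it is a hypothetical
// placeholder, not the real dht code.
func provideOne(ctx context.Context, n int, results chan<- int) {
	// Leaky version: if nothing is receiving on results anymore, this send
	// blocks forever, so the goroutine outlives the 15s context.
	results <- n

	// A version that actually respects the context would select on both:
	//
	//	select {
	//	case results <- n:
	//	case <-ctx.Done():
	//	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()

	results := make(chan int) // unbuffered

	for i := 0; i < 100; i++ {
		go provideOne(ctx, i, results)
	}

	<-results // consume one result, then stop reading
	time.Sleep(time.Second)
	fmt.Println("goroutines still alive:", runtime.NumGoroutine())
}
```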
@whyrusleeping, could you take a look at it? I am not the best at debugging those leaks yet.
@Kubuxu sure thing. EDIT: sorry, didn't read the original comment thoroughly enough.
I was actually just investigating this same exact issue on biham (our storage node) yesterday. I made a few small changes to the dht code which should help a little: #2841. The issue here is that providing to the dht is expensive (relatively speaking) and we haven't yet optimized the process to be smarter about batching outgoing provides. The above PR isn't going to 'solve' the problem, but it will help reduce the memory usage and the number of active goroutines during these scenarios.
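As a sketch of the batching idea mentioned above (hypothetical code, not what #2841 actually does): buffer outgoing provide keys and flush them in groups, rather than announcing each block individually.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// provideBatch stands in for the expensive DHT announcement; it only prints
// here, since this is an illustration rather than real dht code.
func provideBatch(ctx context.Context, keys []string) {
	fmt.Printf("providing %d keys in one batch\n", len(keys))
}

// batchProvides buffers incoming keys and flushes them when the batch is
// full or when maxWait elapses, instead of doing per-key work immediately.
func batchProvides(ctx context.Context, in <-chan string, batchSize int, maxWait time.Duration) {
	ticker := time.NewTicker(maxWait)
	defer ticker.Stop()

	var batch []string
	flush := func() {
		if len(batch) > 0 {
			provideBatch(ctx, batch)
			batch = nil
		}
	}

	for {
		select {
		case k, ok := <-in:
			if !ok {
				flush()
				return
			}
			batch = append(batch, k)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		case <-ctx.Done():
			return
		}
	}
}

func main() {
	ctx := context.Background()
	keys := make(chan string)
	go func() {
		for i := 0; i < 2500; i++ {
			keys <- fmt.Sprintf("key-%d", i)
		}
		close(keys)
	}()
	batchProvides(ctx, keys, 1000, 2*time.Second)
}
```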
But it looks like those goroutines are stuck (there are over 9000 of them), not just "working"; some of them are waiting on channels and reads for tens of minutes, which shouldn't happen. DHT providing is very expensive: @magik6k reported terabytes of transfer while adding 21GB.
ouch... that's way too much... I'll keep debugging this. I had a PR a while ago that batched together provide requests and saved a huge amount of bandwidth, but the math wasn't quite right and we got distracted by other efforts. I'll work on reviving that.
Let's move the discussion about bandwidth to #2828. This issue is about the memory/goroutine leak during adding files.
We seem to be hitting this situation too. How can I pull a heatlist like the one in #2823 (comment)?
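One common way to pull such a dump from a running go-ipfs daemon is to fetch Go's net/http/pprof goroutine endpoint; the sketch below assumes it is mounted on the default API address 127.0.0.1:5001, which may differ in your setup.

```go
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// debug=2 asks for full stack traces of every goroutine.
	resp, err := http.Get("http://127.0.0.1:5001/debug/pprof/goroutine?debug=2")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("goroutines.txt")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// Save the dump so it can be shared (e.g. as a gist or via ipfs add).
	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```

Counting how many goroutines share each stack in the resulting file gives the kind of heatlist shown above.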
If you need another one, here is one after adding ~300k new files (+1 million that it didn't have to publish to the network):
The add process crashed a few times before that. Looking at it right now, it seems unlikely that the process died because it ran out of memory (negligible growth in 1 hour and 30GB of memory free), but I know little about the specifics of golang.
Which version of IPFS are you using? It was a known bug in 0.4.3.
master from about a day ago |
OK, thanks. Can you also send the whole goroutine dump?
https://ipfs.io/ipfs/QmdJwjmpWqVYfnBUsxzGv3LgYyefDzhmXrcE4ysihy4ET3 This is a different, newer one, but it's still from the same process.
Wait, those are the goroutines of the server, right? Shouldn't I dump the ones of the ipfs add process?
Is the daemon running after add died? |
Yes, and I was able to start another add after that without a problem (within the bounds of that bug).
Is there any message when the ipfs add dies? |
Surprisingly, after dying one more time the same day, the add has now been running for 14 days straight, adding ~16 million files.
@hobofan that's an awfully long time... Is it still making progress?
@whyrusleeping Yes, still at it. I think the original estimate was 17 days, and it currently says it will finish in ~1d15h.
@hobofan Haha, well, thanks for sticking with it. Let me know when it completes; getting a full stack dump of the daemon after it's done will be helpful, I think.
Finished now! Full stack dump: https://ipfs.io/ipfs/QmXbzRwyTB1WsWy2z8tb2AScztZGkNmK7LMvPuicie6wux
Also getting this using the files API (0.4.3, 64-bit Linux official binary):
@mattseh I've added a code block around it. Can you send us the whole stack dump using a GitHub gist?
Running into similar issues, but with barely 15 relatively small files on a Linux AWS instance with 2GB of RAM. The go-ipfs Docker container dies after a few days of running:
Here's the traceback:
I believe the parts of this issue that haven't been fixed are covered by:
We still leak goroutines, but we should probably have individual issues for each case (and again, the main ones mentioned here have been fixed).
I'm reopening this because that goroutine dump in #2823 (comment) is a gold mine.
In libp2p, Close is assumed to be thread-safe, and we'd like to interrupt in-progress reads/writes. As a matter of fact, we're lucky this hasn't caused Close to hang: if we had tried to close the reader before closing the writer, we would have blocked on a concurrent read call. Part of ipfs/kubo#2823
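A rough sketch of that ordering concern, using hypothetical types (not libp2p's actual ones) in which the reader's Close shares a lock with Read: closing the write side first gives a blocked Read a chance to return, so closing the reader afterwards cannot hang on that lock.

```go
package stream

import (
	"io"
	"sync"
)

// lockedReader is a hypothetical reader whose Close takes the same mutex as
// Read, so Close would block for as long as a Read is in progress.
type lockedReader struct {
	mu sync.Mutex
	r  io.ReadCloser
}

func (l *lockedReader) Read(p []byte) (int, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.r.Read(p)
}

func (l *lockedReader) Close() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.r.Close()
}

// duplex wraps the write side and the locked read side of one stream.
type duplex struct {
	once sync.Once
	w    io.WriteCloser
	r    *lockedReader
}

// Close shuts the writer down first: that path is not contended by Read, and
// it gives a blocked Read a chance to return, so the reader's Close taken
// afterwards does not hang. Reversing the order risks blocking forever
// behind a concurrent Read, which is the hazard described above.
func (d *duplex) Close() error {
	var err error
	d.once.Do(func() {
		err = d.w.Close()
		if rerr := d.r.Close(); err == nil {
			err = rerr
		}
	})
	return err
}
```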
OK, there's only one additional unfixed issue: libp2p/go-msgio#8. Another issue is that
When adding some 1,000,000s of files (~10/s), I noticed that memory usage increases. This is using branch fix/mfs-caching (PR #2795).