Lotus Node consumes all available memory and won't release it #4714
Comments
Have you tried #4619?
@jennijuju No, because it was only merged to master 11 hours ago and hasn't hit a release yet, but this sounds very encouraging! Thanks for the heads-up.
Ah, of course. If you get a chance to try it out and let us know how it goes, that'd be great! Feel free to wait for it to be tagged too!
@peterVG Some people have tested the PR and confirmed that it works, so I will close this ticket for now. It will be included in an upcoming release. If you still face the issue, please reopen this ticket.
Better late than never? I forgot to come back to this issue and report that yes, this issue is resolved on my server as well. Congratulations! This was a major production blocker that is now fixed. Thanks.
Describe the bug
Lotus has been causing fatal Out Of Memory crashes for many Powergate users.
As an example: between 28 October and 31 October I uploaded 55 × 1 GiB CIDs, 5 or 10 CIDs at a time. My Lotus Node runs as a Docker container as part of a Powergate deployment on AWS with 8 vCPU / 60 GB RAM / 120 GB swap.
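(Not part of the original report, but possibly useful to others hitting the same problem: a minimal sketch of how one might cap the Lotus container's memory so a leak kills only that container instead of dragging the whole host into swap. The container name, image name, and limit values are illustrative assumptions, not the actual Powergate deployment settings.)

```bash
# Hypothetical sketch: cap the Lotus container's memory so a leak triggers an
# OOM-kill of the container rather than exhausting the host.
# Container name, image name, and limits are assumptions; adjust to your deployment.
docker run -d \
  --name lotus-node \
  --memory=48g \
  --memory-swap=56g \
  your-lotus-image:latest
```

With a hard limit like this, Docker's restart policy can bring the container back after an OOM-kill instead of the host itself running out of memory.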
As of 2 November, 19:00 UTC, I had 192 final deals and 195 pending deals for these 55 CIDs. From 28 October onward, RAM and swap usage spiked and never went down. RAM continued to max out at 59 GB. Swap hovered at 60 GB for 3 days and then went up to 80 GB on the fourth day, all without adding any new CIDs.
To see if a restart would help, I rebooted all Powergate containers, including the Lotus Node. At 6 and 19 hours after the Docker container restarts, memory was at half its previous usage and no swap was being used, so the restart did help.
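(Also not from the original report: a small shell loop, assuming the container is named lotus-node, that logs the container's memory usage every few minutes so the growth after new uploads can be tracked without watching htop.)

```bash
# Hypothetical helper: log the Lotus container's memory usage every 5 minutes.
# Assumes the container is named "lotus-node"; adjust to your deployment.
while true; do
  echo "$(date -u +%FT%TZ) $(docker stats --no-stream --format '{{.Name}} {{.MemUsage}}' lotus-node)"
  sleep 300
done >> lotus-mem.log
```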
However, I then uploaded 10 new 1 GiB files, and three hours later I was seeing maxed-out memory again:
From previous experience I know that uploading more files at this point will lead to an OOM crash of the Lotus node, with a likelihood of permanent database corruption.
To Reproduce
Expected behavior
Lotus releases memory as it finishes tasks and waits for new jobs.
Screenshots
See htop graphs above.
Version (run lotus version):
lotus version 1.1.2+git.d4cdc6d (shipped with Powergate as a Docker container)
Additional context
I have only been able to upload 65 GiB of data to Filecoin in just under a week. That is a serious scalability problem, IMHO.