v5.2.0 built from source runs out of memory #5970
Comments
Compiling with […]
Thanks for raising this! We can take a look into this. Could you try removing or increasing the value for `--state-cache-size`? I haven't tried testing using a low state cache size since v5.2.0, but with the introduction of […]
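(For reference, a minimal sketch of adjusting that flag, assuming the `lighthouse bn` subcommand and the default value of 128 shown in the database config later in this thread:)

```bash
# Hedged sketch: raise the beacon node's state cache size.
# 128 is the default reflected in the database config later in this thread.
lighthouse bn --state-cache-size 256 --datadir /data/lighthouse
```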
Ah yes, the low value for `--state-cache-size` […]
Do you have metrics tracking the memory usage of this instance? It would be helpful to see whether it's a linear or spiky increase, and at what rate. Also, please send us debug logs if possible so we can dive deeper.
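(A hedged sketch of one way to sample memory usage over time, assuming the node was started with `--metrics` so the Prometheus endpoint is available on its default port 5054:)

```bash
# Hedged sketch: sample resident memory once a minute from the metrics endpoint.
# Assumes the beacon node was started with --metrics (default metrics port 5054).
while true; do
  date
  curl -s http://localhost:5054/metrics | grep '^process_resident_memory_bytes'
  sleep 60
done
```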
I wonder if […]
I couldn't repro the OOM with LH v5.2.0 compiled from source with the release profile. I ran for about two hours and memory didn't go above 5 GB. I didn't run under Docker, though.

I'm trying a long-range sync now, as one of the other OOM reports we had was a node syncing a month of history.

No luck with the long-range sync either; memory usage didn't go above 3 GB.
We have another report by @rz4884 on Discord who is facing the same issue, and the user confirmed that it is because he doesn't include `jemalloc` […]. @michaelsproul mentions that the Dockerfile appears to not have enabled `jemalloc` […]
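(A hedged sketch of building with jemalloc explicitly enabled, using the Makefile's `FEATURES` and `PROFILE` variables that the issue description below mentions; exact defaults vary by version:)

```bash
# Hedged sketch: build from source with jemalloc explicitly enabled.
# FEATURES and PROFILE are the Makefile variables named in the issue description.
FEATURES=jemalloc PROFILE=release make
```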
The original patch was missing balance updates, and the default features/profiles led to increased memory usage (sigp/lighthouse#5970).

Closed by #5995
It is happening again with 5.3.0 when building from source. The image is built from commit d6ba8c3 using […]
@alecalve does the resulting binary show jemalloc as the allocator in `lighthouse --version`?
It does: […]
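(A hedged sketch of that check; the allocator is reported in the version output per the exchange above, though the exact wording may differ between Lighthouse versions:)

```bash
# Hedged sketch: confirm which allocator the binary was compiled with.
# Exact output wording may differ between Lighthouse versions.
lighthouse --version | grep -i allocator
```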
What's the memory usage getting to now on 5.3.0 when the OOM occurs? It must be something other than the lack of jemalloc. Things to check: […]
The node has finished reconstructing states; we do see a lot of […]
This sounds like the issue. The state cache referred to in this log is for the unfinalized portion of the chain. It shouldn't frequently miss with the default `--state-cache-size`. Can you post the output of `/lighthouse/database/info`? Can you also provide some info on what's making state queries? How many requests per second, are they made concurrently, etc.?
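(A hedged sketch of fetching the database info, assuming the beacon node's HTTP API is enabled on its default port 5052:)

```bash
# Hedged sketch: fetch database info from the beacon node's HTTP API.
# Assumes the API is enabled on the default port 5052; jq is optional pretty-printing.
curl -s http://localhost:5052/lighthouse/database/info | jq
```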
"split": {
"slot": "9760096",
"state_root": "0x19325b996b812c1c1d11728a0481f1c333c224e0fa6b10ebd9aefeddb34d9f44",
"block_root": "0x52ea319a5ff08c1ca9914952690dff649c59808028cee0e450c50274faad04dc"
} But the node is way beyond that slot:
Full output: {
"schema_version": 21,
"config": {
"slots_per_restore_point": 256,
"slots_per_restore_point_set_explicitly": true,
"block_cache_size": 5,
"state_cache_size": 128,
"historic_state_cache_size": 1,
"compact_on_init": false,
"compact_on_prune": true,
"prune_payloads": true,
"prune_blobs": false,
"epochs_per_blob_prune": 1,
"blob_prune_margin_epochs": 0
},
"split": {
"slot": "9760096",
"state_root": "0x19325b996b812c1c1d11728a0481f1c333c224e0fa6b10ebd9aefeddb34d9f44",
"block_root": "0x52ea319a5ff08c1ca9914952690dff649c59808028cee0e450c50274faad04dc"
},
"anchor": null,
"blob_info": {
"oldest_blob_slot": "9483873",
"blobs_db": true
}
} |
RPC-wise, the only users are some L2 nodes; I don't have insight into how frequently they query the node.
This is the issue: the state migration must be failing, so the split never advances and the unfinalized portion of the chain keeps growing. Do you see an error log like: […]

There was an old issue prior to v4.6.0 that could cause DB corruption similar to this: […]. But I'm guessing that, seeing as it happened in […]
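(A hedged sketch of how one might look for such errors, assuming logs are written to a file; the exact error message was lost from this thread, so the pattern below is deliberately broad:)

```bash
# Hedged sketch: scan the beacon node log for migration/database errors.
# The exact error text was lost from this thread, so match broadly.
grep -iE 'error.*(migrat|database)' /path/to/beacon-node.log | tail -n 50
```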
Description

We build a Docker image from the Lighthouse v5.2.0 `Dockerfile` with very minor changes: installing `curl` inside the image directly. The binary itself is built the same way as in this repo's `Dockerfile`, using `make` and the default values for `FEATURES`, `PROFILE`, etc. The node is run using these arguments: […]
Version

The Docker image is built from `v5.2.0`.
Present Behaviour

Once started, the application runs and then eventually runs out of memory after hitting the 64 GB limit we assigned to it.
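(For context, a hypothetical sketch of how such a cap is applied; the image name and node arguments are placeholders, since the actual run arguments were not preserved in this thread:)

```bash
# Hypothetical sketch: running the self-built image under a 64 GB memory cap.
# Image name and node arguments are placeholders; the originals were not preserved.
docker run --memory=64g my-lighthouse:v5.2.0 \
  lighthouse bn --network mainnet --datadir /data
```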
This did not happen with the same modifications applied to previous versions of Lighthouse; the last one we had tested was `v5.0.0`. Using the image you provide (`sigp/lighthouse:v5.2.0`) with the same arguments on the same datadir results in reasonable, stable memory usage.

Expected Behaviour
The application should have a stable memory footprint and not run out of memory.