
4.6.0 beacon_node memory usage issue #5227

Open
SjonHortensius opened this issue Feb 10, 2024 · 8 comments

@SjonHortensius
Contributor

Description

I realize 4.6.0 contains #4918, a fix for a previous OOM issue (which I never experienced), but ever since I upgraded I've been getting OOMs with some pretty big numbers (between 20 and 50 GiB used), making my setup highly unstable.

Version

latest stable Lighthouse v4.6.0-1be5253

Present Behaviour

I don't think my beacon node setup includes anything special, but FWIW:

/usr/bin/lighthouse -d /var/lib/lighthouse/beacon beacon_node --validator-monitor-auto --checkpoint-sync-url http://XXX:5052 --staking --port 9000 --http-port 5052 --http-address 0.0.0.0 --execution-endpoint http://127.0.0.1:8551 --execution-jwt /var/lib/lighthouse/beacon/jwtsecret --builder http://localhost:18550 --builder-profit-threshold XXX
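A hedged stop-gap sketch, assuming the beacon node runs as a systemd service (the unit name lighthouse-beacon.service is hypothetical and the limits are illustrative only, not a recommendation from the maintainers): a cgroup memory cap can at least stop a leaking process from destabilizing the rest of the host while the bug is investigated.

# Hypothetical unit name and illustrative limits; adjust for the actual service.
sudo mkdir -p /etc/systemd/system/lighthouse-beacon.service.d
printf '[Service]\nMemoryHigh=8G\nMemoryMax=12G\n' | sudo tee /etc/systemd/system/lighthouse-beacon.service.d/memory.conf
sudo systemctl daemon-reload && sudo systemctl restart lighthouse-beacon.service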

Frequent OOMs, roughly 5-10 per day with varying amounts allocated:

Out of memory: Killed process 2045154 (lighthouse) total-vm:49913604kB, anon-rss:7651908kB, file-rss:616kB, shmem-rss:0kB, UID:64470 pgtables:68484kB oom_score_adj:0
Out of memory: Killed process 2289480 (lighthouse) total-vm:38670388kB, anon-rss:7477340kB, file-rss:1576kB, shmem-rss:0kB, UID:64470 pgtables:47296kB oom_score_adj:0
Out of memory: Killed process 2309773 (lighthouse) total-vm:32929348kB, anon-rss:7625508kB, file-rss:524kB, shmem-rss:0kB, UID:64470 pgtables:37356kB oom_score_adj:0
Out of memory: Killed process 2310656 (lighthouse) total-vm:41775684kB, anon-rss:7218900kB, file-rss:2396kB, shmem-rss:0kB, UID:64470 pgtables:47284kB oom_score_adj:0
Out of memory: Killed process 2340820 (lighthouse) total-vm:35665580kB, anon-rss:7158488kB, file-rss:4144kB, shmem-rss:0kB, UID:64470 pgtables:36064kB oom_score_adj:0
Out of memory: Killed process 2345368 (lighthouse) total-vm:20267340kB, anon-rss:7031200kB, file-rss:1488kB, shmem-rss:0kB, UID:64470 pgtables:19156kB oom_score_adj:0
Out of memory: Killed process 2345709 (lighthouse) total-vm:46387020kB, anon-rss:8294852kB, file-rss:0kB, shmem-rss:0kB, UID:64470 pgtables:53688kB oom_score_adj:0
Out of memory: Killed process 2371985 (lighthouse) total-vm:42253384kB, anon-rss:7546744kB, file-rss:2524kB, shmem-rss:0kB, UID:64470 pgtables:56480kB oom_score_adj:0
Out of memory: Killed process 2414108 (lighthouse) total-vm:19549408kB, anon-rss:7417528kB, file-rss:2304kB, shmem-rss:0kB, UID:64470 pgtables:18764kB oom_score_adj:0
Out of memory: Killed process 2414413 (lighthouse) total-vm:35328152kB, anon-rss:7206648kB, file-rss:1300kB, shmem-rss:0kB, UID:64470 pgtables:43248kB oom_score_adj:0
Out of memory: Killed process 2426583 (lighthouse) total-vm:18473132kB, anon-rss:7055376kB, file-rss:0kB, shmem-rss:0kB, UID:64470 pgtables:18672kB oom_score_adj:0
Out of memory: Killed process 2426890 (lighthouse) total-vm:40653920kB, anon-rss:7911312kB, file-rss:2952kB, shmem-rss:0kB, UID:64470 pgtables:52820kB oom_score_adj:0
Out of memory: Killed process 2459040 (lighthouse) total-vm:38758536kB, anon-rss:7408472kB, file-rss:1948kB, shmem-rss:0kB, UID:64470 pgtables:48676kB oom_score_adj:0
Out of memory: Killed process 2487203 (lighthouse) total-vm:26128836kB, anon-rss:7450632kB, file-rss:632kB, shmem-rss:0kB, UID:64470 pgtables:20984kB oom_score_adj:0
Out of memory: Killed process 2487581 (lighthouse) total-vm:22814708kB, anon-rss:7217240kB, file-rss:1772kB, shmem-rss:0kB, UID:64470 pgtables:22144kB oom_score_adj:0
Out of memory: Killed process 2487874 (lighthouse) total-vm:21608876kB, anon-rss:7101656kB, file-rss:1976kB, shmem-rss:0kB, UID:64470 pgtables:21812kB oom_score_adj:0
Out of memory: Killed process 2488170 (lighthouse) total-vm:37179496kB, anon-rss:7037800kB, file-rss:2448kB, shmem-rss:0kB, UID:64470 pgtables:34700kB oom_score_adj:0
Out of memory: Killed process 2489135 (lighthouse) total-vm:50698824kB, anon-rss:7391844kB, file-rss:3028kB, shmem-rss:0kB, UID:64470 pgtables:62416kB oom_score_adj:0
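As a quick sanity check on the numbers above (a minimal sketch, assuming the lines were captured by the kernel log via journald; anon-rss is reported in kB, so dividing by 1024 twice gives GiB):

# Summarize the resident (anon-rss) figures from the OOM-killer lines as GiB.
journalctl -k | grep 'Killed process.*(lighthouse)' |
  grep -o 'anon-rss:[0-9]*kB' |
  awk -F'[:k]' '{ printf "%.1f GiB\n", $2 / 1024 / 1024 }'

That puts the actual resident usage of each killed process at roughly 7 GiB, well below the total-vm figures.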


@michaelsproul
Member

If you have debug logs from this machine during the OOM (check $datadir/beacon/logs), please DM them to me on Discord (@sproul) or email them to me ($surname@sigmaprime.io).

@michaelsproul
Member

It may be that the message dequeueing isn't happening fast enough, so #5175 will help.

@michaelsproul
Member

@SjonHortensius I've just noticed that the RSS for all of these crashes is in the 7GB range. You can ignore the higher total-vm number; that's not relevant.

I think this is probably still a bug on the Lighthouse side; we're looking into it. Logs would be great.

@SjonHortensius
Contributor Author

@michaelsproul you're right about the memory usage; I misinterpreted those numbers.

I have relevant logs, but I'm unwilling to publish them unscrubbed. I'll send some parts by email.

@luarx

luarx commented Mar 8, 2024

Execution layer: Erigon
Network: Mainnet

Lighthouse params:

"--debug-level=info",
"--datadir=/beacondata",
"--network=mainnet",
"beacon_node",
"--disable-enr-auto-update",
"--enr-address=127.0.0.1",
"--enr-tcp-port=9000",
"--enr-udp-port=9000",
"--port=9000" ,
"--discovery-port=9000",
"--eth1",
"--http",
"--http-address=0.0.0.0",
"--http-port=5052",
"--metrics",
"--metrics-address=0.0.0.0",
"--metrics-port=5054",
"--listen-address=0.0.0.0",
"--target-peers=100",
"--http-allow-sync-stalled",
"--disable-packet-filter",
"--execution-endpoint=http://localhost:9545",
"--jwt-secrets=/tmp/jwtsecret",
"--disable-deposit-contract-sync",
"--checkpoint-sync-url=https://beaconstate-mainnet.chainsafe.io"

Adding some info here from my own setup. Apart from memory spikes, I also see CPU spikes (they may be related):

  • Since we upgraded to v4.6.0, CPU/memory spikes increased a lot
    [screenshot: CPU/memory usage graph]

  • We upgraded to v5.0.0 yesterday, but the spikes are still around (the upgrade point is marked in the image)
    [screenshot: CPU/memory usage graph with the upgrade point marked]

I don't think the memory/CPU issue has been fixed yet.
cc. @michaelsproul @AgeManning
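A minimal sketch for correlating the spikes with the node's own numbers: with --metrics enabled as in the flags above, resident memory and cumulative CPU can be sampled from the Prometheus endpoint on --metrics-port 5054. The process_* metric names assume the standard Prometheus process collector, so check the endpoint's actual output.

# Sample resident memory and cumulative CPU time from the metrics endpoint (port 5054 above).
curl -s http://localhost:5054/metrics | grep -E '^process_(resident_memory_bytes|cpu_seconds_total)'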

@pawanjay176
Member

@luarx could you share some of your debug logs? Feel free to ping me on Discord.

@luarx

luarx commented Mar 8, 2024

@pawanjay176 what's your Discord username?

@pawanjay176
Member

I'm pawan#7432. You should be able to find me under the SigmaPrime role on our Discord.
