System information
Geth version:
Version: 1.1.18
Git Commit: d28bcc6
Git Commit Date: 20221202
Architecture: amd64
Go Version: go1.19.4
Operating System: linux [Gentoo]
GOPATH=
GOROOT=
Steps to reproduce the behaviour
After the node has synced a long way (e.g. to block 26 million), stop geth and delete the state database (but not the freezer) using the `geth removedb` subcommand. Start geth again.
Expected behaviour
Within 20 minutes or so, geth should pick up where it left off and continue syncing from the top block in the freezer.
Actual behaviour
geth gobbles up all RAM and does not sync even after a long time and lots of swapping.
I narrowed it down to the function `parlia.snapshot`. It makes sense that Parlia has to trace the evolution of the validator set all the way from the genesis block to see whether the current block is valid.

The function first looks for the current block's Parlia snapshot in the cache. On a miss, it looks for the parent block's snapshot and then applies the current block's changes to it. This happens recursively. If there is no Parlia cache at all, it loads all 26 million block headers into RAM, newest first, then reverses the order and applies them. There is also no progress indication while this is happening.
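To make the memory behaviour concrete, here is a minimal Go sketch of that lookup pattern. All names (`Hash`, `Header`, `Snapshot`, `chain`, `cache`, `applyAll`) are illustrative stand-ins for the real consensus/parlia code, not its actual API:

```go
package parliasketch

// Hash, Header, Snapshot, chain and cache are illustrative stand-ins for
// the real consensus/parlia types; only the shape of the algorithm matters.
type Hash [32]byte

type Header struct {
	Number     uint64
	ParentHash Hash
}

type Snapshot struct{} // stands in for the validator-set state

type chain interface {
	GetHeader(hash Hash, number uint64) *Header
}

type cache interface {
	Get(hash Hash) (*Snapshot, bool)
	Add(hash Hash, s *Snapshot)
}

// snapshot mirrors the recursive lookup described above, written as a loop.
// With no cached snapshot anywhere, every header from the head back to
// genesis accumulates in the headers slice before any of them is applied.
func snapshot(c chain, recents cache, number uint64, hash Hash) *Snapshot {
	head := hash // remember the starting hash so the result can be cached
	var (
		headers []*Header
		snap    *Snapshot
	)
	for {
		if s, ok := recents.Get(hash); ok {
			snap = s // cached snapshot found: stop walking backwards
			break
		}
		if number == 0 {
			snap = new(Snapshot) // bootstrap from the genesis validator set
			break
		}
		h := c.GetHeader(hash, number)
		headers = append(headers, h) // newest first; grows without bound
		number, hash = number-1, h.ParentHash
	}
	// Reverse into chronological order, then replay every header's
	// validator-set changes onto the snapshot.
	for i, j := 0, len(headers)-1; i < j; i, j = i+1, j-1 {
		headers[i], headers[j] = headers[j], headers[i]
	}
	snap = applyAll(snap, headers)
	recents.Add(head, snap) // cache so later lookups can stop here
	return snap
}

// applyAll folds each header's changes into the snapshot, oldest first
// (the actual epoch/validator logic is elided).
func applyAll(snap *Snapshot, headers []*Header) *Snapshot {
	for range headers {
		// apply validator-set changes from this header
	}
	return snap
}
```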
A smarter algorithm could be used here. Executing blocks in the order they happened uses constant memory. Blocks more than 90000 behind the head are stored in the freezer, so they can't be reorganized; we know the forward and backward order of those blocks and don't have to traverse the parent links in reverse. If the rebuild ran forwards up to the 90001st-newest block and stored a snapshot there, that would probably be good enough. (If we receive a block that forks off a frozen block other than the newest frozen block, we already know it's invalid and don't have to check any Parlia consensus at all.)
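A hedged sketch of that forward pass, reusing the illustrative types from the snippet above (`frozenChain`, `rebuildForward` and the logging interval are assumptions, not existing geth code):

```go
import "log"

// frozenChain models sequential, by-number access to frozen headers.
type frozenChain interface {
	GetHeaderByNumber(number uint64) *Header
}

// rebuildForward replays frozen headers oldest to newest. Only the current
// snapshot is retained, so memory stays constant however long the chain is,
// and progress is trivial to report along the way.
func rebuildForward(fc frozenChain, frozenHead uint64) *Snapshot {
	snap := new(Snapshot) // start from the genesis validator set
	for n := uint64(1); n <= frozenHead; n++ {
		h := fc.GetHeaderByNumber(n)
		snap = applyAll(snap, []*Header{h}) // fold in one header, then drop it
		if n%1000000 == 0 {
			log.Printf("rebuilding parlia snapshot: block %d of %d", n, frozenHead)
		}
	}
	// Persist snap at this point; only the <=90000 mutable blocks above
	// frozenHead still need the existing reverse walk.
	return snap
}
```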
(As a quick hack, I made it save the hash of every 10000th block going backwards, then process the chunks of 10000 blocks in forward order, using the existing reverse algorithm within each chunk. This seems to solve this part of the problem, at least; something else still gets stuck.)
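Roughly, the hack has this shape (illustrative only, not the actual patch; it reuses the sketch types above and relies on `snapshot` caching its result in `recents`, as the first snippet does):

```go
const chunkSize = 10000

// snapshotChunked bounds peak memory to one chunk of headers. Pass 1 walks
// parent links as before but only *retains* one hash per chunkSize blocks;
// pass 2 resolves those checkpoints oldest first, so each reverse walk stops
// at the snapshot cached by the previous call.
func snapshotChunked(c chain, recents cache, number uint64, hash Hash) *Snapshot {
	type checkpoint struct {
		number uint64
		hash   Hash
	}
	var cps []checkpoint
	for n, h := number, hash; ; {
		if _, ok := recents.Get(h); ok {
			break // a cached snapshot ends the backward walk early
		}
		if n%chunkSize == 0 {
			cps = append(cps, checkpoint{n, h}) // keep only chunk boundaries
		}
		if n == 0 {
			break
		}
		hdr := c.GetHeader(h, n) // read and discard; only the hash survives
		n, h = n-1, hdr.ParentHash
	}
	// Resolve checkpoints oldest first; each call caches its result, so the
	// next call's reverse walk holds at most chunkSize headers.
	for i := len(cps) - 1; i >= 0; i-- {
		snapshot(c, recents, cps[i].number, cps[i].hash)
	}
	// Finally resolve the requested block; its reverse walk now hits the
	// snapshot cached at the nearest checkpoint below it.
	return snapshot(c, recents, number, hash)
}
```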