System information
Geth version:
Version: 1.1.18
Git Commit: d28bcc6
Git Commit Date: 20221202
Architecture: amd64
Go Version: go1.19.4
Operating System: linux [Gentoo]
GOPATH=
GOROOT=
Steps to reproduce the behaviour
After the node has synced a long way (e.g. to block 26 million), stop geth and delete the state database (but not the freezer) using the `geth removedb` subcommand. Start geth again.
Expected behaviour
Within 20 minutes or so, geth should pick up where it left off and continue syncing from the top block in the freezer.
Actual behaviour
geth gobbles up all RAM and does not sync even after a long time and lots of swapping.
I narrowed it down to the function `parlia.snapshot`. It makes sense that Parlia has to trace the evolution of the validator set all the way from the genesis block to see whether the current block is valid.

The function first looks for the current block's Parlia snapshot in the cache. On a miss, it looks for the parent block's snapshot and then applies the current block's changes to it. This happens recursively. If there is no Parlia cache at all, it loads all 26 million block headers into RAM, newest first, then reverses the order and applies them. There is also no progress indication while this is happening.
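To make the memory behaviour concrete, here is a minimal Go sketch of that lookup pattern. All names (`Hash`, `Header`, `Snapshot`, `chain`, `cache`, `applyAll`) are illustrative stand-ins for the real consensus/parlia code, not its actual API:

```go
package parliasketch

// Hash, Header, Snapshot, chain and cache are illustrative stand-ins for
// the real consensus/parlia types; only the shape of the algorithm matters.
type Hash [32]byte

type Header struct {
	Number     uint64
	ParentHash Hash
}

type Snapshot struct{} // stands in for the validator-set state

type chain interface {
	GetHeader(hash Hash, number uint64) *Header
}

type cache interface {
	Get(hash Hash) (*Snapshot, bool)
	Add(hash Hash, s *Snapshot)
}

// snapshot mirrors the recursive lookup described above, written as a loop.
// With no cached snapshot anywhere, every header from the head back to
// genesis accumulates in the headers slice before any of them is applied.
func snapshot(c chain, recents cache, number uint64, hash Hash) *Snapshot {
	head := hash // remember the starting hash so the result can be cached
	var (
		headers []*Header
		snap    *Snapshot
	)
	for {
		if s, ok := recents.Get(hash); ok {
			snap = s // cached snapshot found: stop walking backwards
			break
		}
		if number == 0 {
			snap = new(Snapshot) // bootstrap from the genesis validator set
			break
		}
		h := c.GetHeader(hash, number)
		headers = append(headers, h) // newest first; grows without bound
		number, hash = number-1, h.ParentHash
	}
	// Reverse into chronological order, then replay every header's
	// validator-set changes onto the snapshot.
	for i, j := 0, len(headers)-1; i < j; i, j = i+1, j-1 {
		headers[i], headers[j] = headers[j], headers[i]
	}
	snap = applyAll(snap, headers)
	recents.Add(head, snap) // cache so later lookups can stop here
	return snap
}

// applyAll folds each header's changes into the snapshot, oldest first
// (the actual epoch/validator logic is elided).
func applyAll(snap *Snapshot, headers []*Header) *Snapshot {
	for range headers {
		// apply validator-set changes from this header
	}
	return snap
}
```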
A smarter algorithm could be used here. Executing blocks in the order they happened uses constant memory. Blocks more than 90000 behind the head are stored in the freezer, so they can't be reorganized; we know the forward and backward order of those blocks and don't have to traverse the parent links in reverse. If the rebuild ran forwards up to the 90001st-newest block and stored a snapshot there, that would probably be good enough. (If we receive a block that forks off a frozen block other than the newest frozen block, we already know it's invalid and don't have to check any Parlia consensus at all.)
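A hedged sketch of that forward pass, reusing the illustrative types from the snippet above (`frozenChain`, `rebuildForward` and the logging interval are assumptions, not existing geth code):

```go
import "log"

// frozenChain models sequential, by-number access to frozen headers.
type frozenChain interface {
	GetHeaderByNumber(number uint64) *Header
}

// rebuildForward replays frozen headers oldest to newest. Only the current
// snapshot is retained, so memory stays constant however long the chain is,
// and progress is trivial to report along the way.
func rebuildForward(fc frozenChain, frozenHead uint64) *Snapshot {
	snap := new(Snapshot) // start from the genesis validator set
	for n := uint64(1); n <= frozenHead; n++ {
		h := fc.GetHeaderByNumber(n)
		snap = applyAll(snap, []*Header{h}) // fold in one header, then drop it
		if n%1000000 == 0 {
			log.Printf("rebuilding parlia snapshot: block %d of %d", n, frozenHead)
		}
	}
	// Persist snap at this point; only the <=90000 mutable blocks above
	// frozenHead still need the existing reverse walk.
	return snap
}
```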
(As a quick hack, I made it save the hash of every 10000th block going backwards, then process the chunks of 10000 blocks in forward order, using the existing reverse algorithm within each chunk. This seems to solve this part of the problem, at least; something else still gets stuck.)
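Roughly, the hack has this shape (illustrative only, not the actual patch; it reuses the sketch types above and relies on `snapshot` caching its result in `recents`, as the first snippet does):

```go
const chunkSize = 10000

// snapshotChunked bounds peak memory to one chunk of headers. Pass 1 walks
// parent links as before but only *retains* one hash per chunkSize blocks;
// pass 2 resolves those checkpoints oldest first, so each reverse walk stops
// at the snapshot cached by the previous call.
func snapshotChunked(c chain, recents cache, number uint64, hash Hash) *Snapshot {
	type checkpoint struct {
		number uint64
		hash   Hash
	}
	var cps []checkpoint
	for n, h := number, hash; ; {
		if _, ok := recents.Get(h); ok {
			break // a cached snapshot ends the backward walk early
		}
		if n%chunkSize == 0 {
			cps = append(cps, checkpoint{n, h}) // keep only chunk boundaries
		}
		if n == 0 {
			break
		}
		hdr := c.GetHeader(h, n) // read and discard; only the hash survives
		n, h = n-1, hdr.ParentHash
	}
	// Resolve checkpoints oldest first; each call caches its result, so the
	// next call's reverse walk holds at most chunkSize headers.
	for i := len(cps) - 1; i >= 0; i-- {
		snapshot(c, recents, cps[i].number, cps[i].hash)
	}
	// Finally resolve the requested block; its reverse walk now hits the
	// snapshot cached at the nearest checkpoint below it.
	return snapshot(c, recents, number, hash)
}
```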