Include the genesis compact block for penumbra-1 in the static assets of the extension #161

Closed
2 tasks done
hdevalence opened this issue Aug 16, 2024 · 1 comment
Labels
performance priority Important to work on next

Comments

@hdevalence
Contributor

hdevalence commented Aug 16, 2024

Is your feature request related to a problem? Please describe.

Right now, we see a lot of user reports of being stuck syncing, with no progress at all. We don't see reports of users being stuck in the middle. This suggests there is some genesis-specific issue with syncing: if getting stuck could happen anywhere, we would expect to see reports of syncing getting stuck in the middle, but we only hear about it in the context of syncing the genesis CompactBlock.

This also fits with the structure of block sync, which is designed to be incremental and resumable, but only at the block level (i.e., blocks are scanned and applied one at a time, with a restart from the last synced height if an error is encountered processing the current block). The genesis compact block is significantly larger than all the others (approximately 10MB). The genesis compact block is not handled specially. Instead, it's just the first element in the stream of compact blocks to be processed.
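
To make the block-level resumability concrete, here is a minimal sketch of such a loop. The names `CompactBlock`, `SyncState`, `nextHeight`, and `syncBlocks` are hypothetical and are not taken from the actual Prax code; the only point is that progress is recorded after a whole block succeeds, so a failure while scanning the genesis block always restarts at height 0.

```typescript
// Hypothetical types for illustration; not the real Prax or Penumbra API.
interface CompactBlock {
  height: number;
  payload: Uint8Array;
}

interface SyncState {
  // undefined means nothing has been synced yet, not even genesis.
  lastSyncedHeight: number | undefined;
}

// Height a restarted sync would request first.
function nextHeight(state: SyncState): number {
  return state.lastSyncedHeight === undefined ? 0 : state.lastSyncedHeight + 1;
}

// Scans blocks one at a time. If scanBlock throws, the error propagates and
// the state still points at the last block that was fully processed.
function syncBlocks(
  state: SyncState,
  blocks: Iterable<CompactBlock>,
  scanBlock: (block: CompactBlock) => void,
): void {
  for (const block of blocks) {
    scanBlock(block);
    state.lastSyncedHeight = block.height; // recorded only after success
  }
}
```

Under this structure there is no checkpoint inside a block, so an interruption partway through the ~10MB genesis block discards all of the work done on it.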

If something was causing the processing of the genesis compact block to fail, this would be consistent with the observed user reports, as a client could be stuck in a loop attempting to process the genesis compact block in one go and continually being interrupted.

How could this get interrupted? There are two possibilities:

  1. There could be an interruption fetching the genesis compact block from the network, which would cause a retry. This also increases RPC load, so it can potentially cause cascading problems for other users. As this is a network call, there are many external factors that could cause an interruption.
  2. There could be an interruption while scanning the compact block, which would also cause a retry of both the download and the scanning. I'm not clear on the exact conditions that could cause this (can Chrome kill the service worker while it's working?), but this cause is otherwise “internal” and seems less likely to occur than a network interruption.

We can avoid (1) by bundling the compact block with the other extension assets. This has fairly minimal effects on the surrounding code, whereas addressing (2) would require restructuring all of the block scanning to support incremental intra-block scanning, which is both a much larger lift and generally unhelpful (since most blocks are small or empty). This means we should try the fix for (1) first and see whether it helps resolve the issue.

Describe the solution you'd like

  • Add the genesis compact block for penumbra-1 to the extension assets
  • Check whether the chain ID to be synced is penumbra-1, and if so, process the 0-th CompactBlock using the local asset before beginning syncing from height 1.
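
The two steps above could be sketched roughly as follows. This is only an illustration of the decision logic: `BUNDLED_GENESIS_CHAIN_ID`, `SyncPlan`, and `planSync` are hypothetical names, and the guard for an already-synced chain is an assumption added so that the bundled genesis block is not applied twice.

```typescript
// Assumption: only penumbra-1 ships a bundled genesis compact block.
const BUNDLED_GENESIS_CHAIN_ID = "penumbra-1";

// Hypothetical description of where genesis comes from and where the
// network stream of compact blocks should start.
type SyncPlan =
  | { genesisSource: "bundled-asset"; streamFromHeight: 1 }
  | { genesisSource: "network"; streamFromHeight: 0 }
  | { genesisSource: "none"; streamFromHeight: number };

function planSync(chainId: string, lastSyncedHeight: number | undefined): SyncPlan {
  if (lastSyncedHeight !== undefined) {
    // Already past genesis: resume from the next height, no genesis handling.
    return { genesisSource: "none", streamFromHeight: lastSyncedHeight + 1 };
  }
  if (chainId === BUNDLED_GENESIS_CHAIN_ID) {
    // Process the 0-th CompactBlock from the local asset, then stream from 1.
    return { genesisSource: "bundled-asset", streamFromHeight: 1 };
  }
  // Any other chain: fetch genesis over the network as before.
  return { genesisSource: "network", streamFromHeight: 0 };
}
```

In an extension, the bundled bytes would presumably be read with something like `fetch(chrome.runtime.getURL(path))`, where the asset path is whatever the build chooses; that part is left out here because it depends on the packaging setup.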

github-project-automation bot moved this to 🗄️ Backlog in Labs web Aug 16, 2024
grod220 moved this from 🗄️ Backlog to 📝 Todo in Labs web Aug 16, 2024
TalDerei self-assigned this Aug 16, 2024
@grod220
Contributor

grod220 commented Aug 16, 2024

Some additional context from external discussions:

When changing RPCs, Prax will first fetch the chain id and then look to see if there is already a local database for this chain. If so, it pulls the sync status and resumes where it left off. If it's a fresh chain id it hasn't encountered before, it'll start fetching from genesis.

The genesis state seems to be ~10 MB, so this will be the heaviest network request. If that request fails, it will retry (with exponential backoff if it keeps failing).
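
For readers unfamiliar with the pattern, a generic retry with exponential backoff looks roughly like the sketch below. This is not the actual Prax retry code: `BackoffOptions`, `backoffDelayMs`, `retryWithBackoff`, and the delay values used in the usage note are all assumptions chosen for illustration.

```typescript
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number;  // upper bound on any single delay
}

// Delay doubles with each failed attempt (attempt is 0-based) up to maxMs.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
}

// Runs operation until it succeeds or maxAttempts is reached, sleeping
// between attempts. The sleep function is injectable for testing.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  opts: BackoffOptions & { maxAttempts: number },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt + 1 < opts.maxAttempts) {
        await sleep(backoffDelayMs(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With `baseMs: 500` and `maxMs: 30000`, the waits would be 500 ms, 1000 ms, 2000 ms, and so on, capped at 30 s. Each retry of the genesis request re-downloads the full block, which is where the extra RPC load comes from.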

After successfully getting that block, it then has to decrypt all of that state. In my case, with a fast internet connection, downloading the block is fast, but decrypting is quite CPU-intensive and also thread-blocking (the extension can sometimes go unresponsive). We have work scoped out to 1) move syncing off the main thread and 2) make use of web workers to parallelize decryption (versus doing it one at a time).
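
As a rough idea of what parallelizing could involve, the items to decrypt can be split into one batch per worker. The helper below is a hypothetical sketch of that batching step only; it does not reflect the scoped design, and the worker script and message handling are omitted.

```typescript
// Splits items into at most workerCount contiguous batches of near-equal size.
function partitionForWorkers<T>(items: readonly T[], workerCount: number): T[][] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError("workerCount must be a positive integer");
  }
  const batchSize = Math.ceil(items.length / workerCount);
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}
```

Each batch would then be handed to its own Web Worker via `postMessage`, with the results merged once every worker has replied, so the thread that drives syncing stays responsive.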

Your extension does not need to be in the foreground, as the service worker runtime runs in the background and is governed by the Chrome browser (though it will eventually be put to sleep if not interacted with).
