Fix/broken storage after snap sync #7789

Merged: 10 commits merged into master from fix/maybe-broken-storage on Nov 25, 2024
Conversation


@asdacap asdacap commented Nov 22, 2024

  • During snap sync, when a large storage trie was updated to a different storage root, a refresh operation would re-fetch the new storage root, and the resulting storage trie would be incomplete and require healing. Normally this gets detected during healing because the path to the leaf at the state level is missing and needs to be updated. However, if after the refresh the storage root is updated back to its original value, the path to the leaf in the state trie is already correct (it was persisted at the beginning), so healing of the storage trie never gets triggered. This appears to be the case for some contracts on some networks.
  • This PR shares the pivot class between snap and state sync as StateSyncPivot (originally Pivot, which was specific to snap sync). When a storage heal is scheduled, the address of that storage is added to UpdatedStorages in the StateSyncPivot; this usually happens fewer than 100 times. During healing, once the root is scheduled for persist, the new storage root of each of these contracts is checked for existence. If any of the roots is missing, the missing storage root is queued for download through the standard healing code path, so the rest of the missing storage trie gets downloaded (see the sketch after this list).
  • Also fixes StateMin/MaxDistanceFromHead not accounting for FastSyncLag, which caused sync issues on networks without a CL.
  • Also adds a simple code path for the Perf/parallelize storage commit #7605 storage fix, though it limits the amount of storage that can be parallelized.
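
To make the second point concrete, here is a minimal C# sketch of the check added during healing. Only StateSyncPivot and UpdatedStorages are named in this PR; every other type and member (Address, Hash256, IStateReader, ITrieNodeStore, IHealingQueue, UpdatedStorageVerifier) is hypothetical and stands in for the corresponding Nethermind abstractions, so treat this as an illustration of the control flow rather than the actual implementation.

```csharp
// Minimal sketch, not the actual Nethermind implementation.
using System.Collections.Concurrent;

public readonly record struct Address(string Hex);
public readonly record struct Hash256(string Hex);

// Hypothetical abstractions standing in for the real state/trie/healing components.
public interface IStateReader
{
    Hash256 GetStorageRoot(Hash256 stateRoot, Address account);
}

public interface ITrieNodeStore
{
    bool HasNode(Hash256 nodeHash);
}

public interface IHealingQueue
{
    void EnqueueStorageHeal(Address account, Hash256 storageRoot);
}

public class StateSyncPivot
{
    // Accounts whose storage heal was scheduled during snap sync;
    // per the PR description this usually collects fewer than 100 entries.
    public ConcurrentBag<Address> UpdatedStorages { get; } = new();
}

public class UpdatedStorageVerifier
{
    private readonly StateSyncPivot _pivot;
    private readonly IStateReader _stateReader;
    private readonly ITrieNodeStore _nodeStore;
    private readonly IHealingQueue _healingQueue;

    public UpdatedStorageVerifier(StateSyncPivot pivot, IStateReader stateReader,
        ITrieNodeStore nodeStore, IHealingQueue healingQueue)
    {
        _pivot = pivot;
        _stateReader = stateReader;
        _nodeStore = nodeStore;
        _healingQueue = healingQueue;
    }

    // Called once the healed state root is scheduled for persist.
    public void VerifyUpdatedStorages(Hash256 stateRoot)
    {
        foreach (Address account in _pivot.UpdatedStorages)
        {
            // Look up the account's current storage root in the healed state trie.
            Hash256 storageRoot = _stateReader.GetStorageRoot(stateRoot, account);

            // If that root node was never persisted locally, the storage trie is
            // incomplete even though the state-trie path to the account looks valid.
            if (!_nodeStore.HasNode(storageRoot))
            {
                // Queue the missing root through the standard healing path so the
                // rest of the missing storage trie gets downloaded.
                _healingQueue.EnqueueStorageHeal(account, storageRoot);
            }
        }
    }
}
```

The key design point, as described above, is that the check is cheap: it runs once per refreshed account (typically under 100 of them) and reuses the existing healing download path rather than introducing a new recovery mechanism.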

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

  • Added a log message for when this situation happens, and confirmed that it does occur.

@asdacap asdacap marked this pull request as draft November 22, 2024 08:55
@asdacap asdacap force-pushed the fix/maybe-broken-storage branch from 135022d to e4b6bf2 on November 25, 2024 07:23
@asdacap asdacap marked this pull request as ready for review November 25, 2024 07:59
@asdacap asdacap changed the title from Fix/maybe broken storage to Fix/broken storage after snap sync on Nov 25, 2024
@kamilchodola kamilchodola left a comment


Imho it works much better now: far fewer TrieExceptions. I tested a bit too early and still saw some, but Ashraf later added further fixes.

I have 20 nodes running on this now, but as it takes time to reproduce, I'd rather merge this and monitor the "Sync Supported Chains" action to see if it ever happens again.

@asdacap asdacap merged commit 1a25af3 into master Nov 25, 2024
80 checks passed
@asdacap asdacap deleted the fix/maybe-broken-storage branch November 25, 2024 13:45