core/state/snapshot: update generator marker in sync with flushes #21804
This PR fixes a data corruption issue in the snapshot mechanism that can occur if the generator crashes mid-storage.
The generator works by iterating the state trie and pushing the account/storage leaf data directly to disk. To avoid thrashing the database, the generator accumulates writes into a batch and only flushes to disk after a certain threshold is exceeded. When the generator flushes to disk, its internal progress marker is also updated so that it knows where to continue from if it's interrupted.
The bug was that the marker was not written to disk at the same time as the data batch, but was instead kept in memory and only flushed on shutdown. This means that in case of a crash (or Ctrl+C without waiting for graceful shutdown) while the generator was in progress, the database would go out of sync, with the persisted marker lagging behind the data already indexed.
Most of the time this was not noticeable, as upon restart the generator just picked up at its stale position and reindexed the same data once again. The rare issue, however, arises if the chain progresses before the restart and some of the indexed (but rewound) data slots are deleted. In that case, the generator will not realize that there is junk in the database and will only add the new data, but not delete the stale junk.
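To make the failure mode concrete, here is a toy simulation of it. This is not the actual generator code: plain maps stand in for the state trie and the on-disk snapshot, and `regenerate` and `staleKeys` are hypothetical helpers invented for the illustration. The only property modeled is the one described above, namely that a resumed generator writes entries at or past its marker but never deletes anything.

```go
package main

import (
	"fmt"
	"sort"
)

// regenerate mimics a generator resuming from marker: every state entry at
// or past the marker is (re)written into the snapshot. Nothing is deleted.
func regenerate(state, snap map[string]string, marker string) {
	for k, v := range state {
		if k >= marker {
			snap[k] = v
		}
	}
}

// staleKeys lists snapshot entries that no longer exist in the state.
func staleKeys(state, snap map[string]string) []string {
	var stale []string
	for k := range snap {
		if _, ok := state[k]; !ok {
			stale = append(stale, k)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	state := map[string]string{"a": "1", "b": "2", "c": "3"}
	snap := map[string]string{}

	// The generator indexed a, b and c to disk, but then crashed while the
	// persisted marker still pointed at "b".
	regenerate(state, snap, "a")
	staleMarker := "b"

	// The chain progresses before the restart and slot "c" is deleted.
	delete(state, "c")

	// On restart the generator resumes from the stale marker and only adds.
	regenerate(state, snap, staleMarker)

	fmt.Println(staleKeys(state, snap)) // [c]
}
```

The leftover `c` entry is the junk: it lies past the marker, so nothing treats it as already-generated data that needs a delete applied to it.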
The fix is to ensure that every time new snapshot data is pushed to disk, the generator marker is also updated atomically in the same go. From a thrashing perspective this does incur some additional overhead since we need to bump the same database key over and over again on every flush, but it's a price we need to pay.
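A minimal sketch of the idea, under stated assumptions: the in-memory `memDB`, the `batch` type, the `markerKey` name, the `snap-` key prefix and the count-based `flushThreshold` are all made up for illustration and are not the real go-ethereum types or keys. The point is only that the marker is put into the same batch as the data, so a crash can lose an unflushed batch but can never leave the two out of sync.

```go
package main

import "fmt"

const (
	markerKey      = "generator-marker" // hypothetical key name
	flushThreshold = 2                  // flush after this many pending leaves
)

// memDB is a toy stand-in for the on-disk key-value store.
type memDB map[string]string

// batch buffers writes and applies them to the database in one step.
type batch struct {
	db     memDB
	writes map[string]string
}

func newBatch(db memDB) *batch {
	return &batch{db: db, writes: map[string]string{}}
}

func (b *batch) Put(k, v string) { b.writes[k] = v }

// Write applies all buffered writes together and resets the batch.
func (b *batch) Write() {
	for k, v := range b.writes {
		b.db[k] = v
	}
	b.writes = map[string]string{}
}

// generate indexes leaves in key order. If crashAfter >= 0 it returns
// abruptly before processing that leaf index, dropping any unflushed writes
// to simulate a crash. Pass -1 to run to completion.
func generate(db memDB, leaves [][2]string, crashAfter int) {
	b := newBatch(db)
	pending := 0
	for i, leaf := range leaves {
		if i == crashAfter {
			return // simulated crash: buffered writes are lost
		}
		b.Put("snap-"+leaf[0], leaf[1])
		pending++
		if pending >= flushThreshold {
			// The marker goes into the same batch as the data it covers.
			b.Put(markerKey, leaf[0])
			b.Write()
			pending = 0
		}
	}
	if pending > 0 {
		b.Put(markerKey, leaves[len(leaves)-1][0])
		b.Write()
	}
}

func main() {
	leaves := [][2]string{{"a", "1"}, {"b", "2"}, {"c", "3"}, {"d", "4"}, {"e", "5"}}
	db := memDB{}
	generate(db, leaves, 3) // crash before the fourth leaf is processed

	_, hasC := db["snap-c"]
	fmt.Println(db[markerKey], hasC) // b false
}
```

After the simulated crash the persisted marker is `b` and nothing past `b` has reached the database, so a restart resumes from a position that matches the data on disk.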