Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: more precise flat storage errors #8740

Merged
merged 6 commits into from
Mar 16, 2023

Conversation

Longarithm
Copy link
Member

@Longarithm Longarithm commented Mar 15, 2023

Making flat storage errors more clear and fix error types in several places. Fixes flat storage failure on March 14.
This PR needs a review with extra caution, because StorageInternalError in incorrect place may lead to node panic and BlockNotSupported when DB is corrupted may lead to weird behaviour of node.

Here I claim that there are two types of FlatStorageError:

  • BlockNotSupported - can occur in normal circumstances, when flat storage head was moved forward during processing some chunk.
  • StorageInternalError - means that flat storage is corrupted, we have to panic.

I looked at every single place where we return these errors, and replaced some of them:

  • get_block_changes - once it is called, we already know that delta exists in cache, otherwise it is state inconsistency.
  • update_flat_head, store_helper::get_delta_changes - it is more sane to return error, after which we panic later, on apply_transactions_with_optional_storage_proof.
  • update_flat_head, gc_height - delta exists in cache, otherwise we have inconsistency.

Trie::get_ref is fixed to properly propagate flat storage error here.

Testing

  • flat_storage_errors - check for StorageInternalError as well.
  • test_not_supported_block - check that instead of panic, Trie::get_ref returns correct error now.

@Longarithm Longarithm changed the title draft: more precise flat storage errors fix: more precise flat storage errors Mar 15, 2023
@Longarithm Longarithm marked this pull request as ready for review March 15, 2023 21:22
@Longarithm Longarithm requested a review from a team as a code owner March 15, 2023 21:22
Copy link
Contributor

@jakmeier jakmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes make sense to me and I am very happy to see a test added for this scenario.
I'll approve from my side but maybe let's give other reviewers time to take a look as well.

I also left a few comments regarding error semantics. I think the root cause why we had these errors mixed is because we are returning errors in places where we don't know why this is an error, if it is an error at all. And perhaps also because of unclear naming. Fixing that could be valuable.

But you are more familiar with the code. I'll let you decide whether you want to change this or not.

@@ -68,7 +68,7 @@ impl FlatStorageInner {
// and read single `ValueRef` from delta if it is not cached.
self.deltas
.get(block_hash)
.ok_or_else(|| self.create_block_not_supported_error(block_hash))
.ok_or(FlatStorageError::StorageInternalError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn't return an Option and convert to an error on the caller side. Either that, or make it clear in the comment on fn get_block_changes what the condition is for the block_hash parameter.

My reasoning is that this function locally can't decide whether flat state is in an inconsistent state or if the caller used a block hash outside the supported range. It only knows whether it has the deltas or not. The caller must decide whether this is a problem or not.

The same line of thinking could be applied to storage_helper::* functions, such as get_delta_changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a private function and the expectation is that is called with a proper block hash, so I prefer adding comment here to returning Option

Comment on lines +30 to +31
/// Should not result in node panic, because flat head can move during processing
/// of some chunk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is my understanding correct that we should only use this error in this specific case where flat head moved while we were processing a fork? Because BlockNotSupported is a very generic name that can easily be used in other contexts, if someone is not aware of this and just looks at the name.

Would it perhaps make sense to rename the error to describe the situation more precisely? Such that nobody will accidentally use this error in other contexts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From FlatStorage API perspective we don't know if the block is not longer supported because flat head has moved or if it is just a wrong block which was never supported. So I think BlockNotSupported is pretty precise.

core/store/src/trie/mod.rs Outdated Show resolved Hide resolved
Comment on lines +541 to +544
// The first result should be FlatStorageError, because we can't read from first chunk view anymore.
// But the node must not panic as this is normal behaviour.
// Ideally it should be tested on chain level, but there is no easy way to
// postpone applying chunks reliably.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very helpful comment, thanks for that. And for adding the test!

.get_trie_for_shard(shard_uid.shard_id(), &block_hash, state_root, true)
.unwrap();
if height == START_HEIGHT - 3 {
env.produce_block(0, START_HEIGHT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it is generally nice to avoid side effects in .map, so I would rewrite this as a loop
doesn't matter that much in tests, so feel free to keep it as it is

@nikurt nikurt removed their request for review March 16, 2023 09:10
@@ -90,7 +90,9 @@ pub enum StorageError {
/// panic in every place that produces this error.
/// We can check if db is corrupted by verifying everything in the state trie.
StorageInconsistentState(String),
/// Error from flat storage
/// Flat storage error, meaning that it doesn't support some block anymore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is a bit confusing and doesn't correspond to the name of the error.
I suggest renaming the error to FlatStorageBlockNotSupported

Copy link
Contributor

@pugachAG pugachAG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some minor comments, otherwise LGTM!

Looogarithm and others added 3 commits March 16, 2023 15:17
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
@Longarithm
Copy link
Member Author

Thank you for the quick review, it was very helpful!

@Longarithm Longarithm self-assigned this Mar 16, 2023
@pugachAG pugachAG added the A-storage Area: storage and databases label Mar 16, 2023
@near-bulldozer near-bulldozer bot merged commit 289c957 into near:master Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Area: storage and databases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants