-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: more precise flat storage errors #8740
Conversation
012d6ff
to
79d17c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes make sense to me and I am very happy to see a test added for this scenario.
I'll approve from my side but maybe let's give other reviewers time to take a look as well.
I also left a few comments regarding error semantics. I think the root cause why we had these errors mixed is because we are returning errors in places where we don't know why this is an error, if it is an error at all. And perhaps also because of unclear naming. Fixing that could be valuable.
But you are more familiar with the code. I'll let you decide whether you want to change this or not.
@@ -68,7 +68,7 @@ impl FlatStorageInner { | |||
// and read single `ValueRef` from delta if it is not cached. | |||
self.deltas | |||
.get(block_hash) | |||
.ok_or_else(|| self.create_block_not_supported_error(block_hash)) | |||
.ok_or(FlatStorageError::StorageInternalError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we shouldn't return an Option
and convert to an error on the caller side. Either that, or make it clear in the comment on fn get_block_changes
what the condition is for the block_hash
parameter.
My reasoning is that this function locally can't decide whether flat state is in an inconsistent state or if the caller used a block hash outside the supported range. It only knows whether it has the deltas or not. The caller must decide whether this is a problem or not.
The same line of thinking could be applied to storage_helper::*
functions, such as get_delta_changes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a private function and the expectation is that is called with a proper block hash, so I prefer adding comment here to returning Option
/// Should not result in node panic, because flat head can move during processing | ||
/// of some chunk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is my understanding correct that we should only use this error in this specific case where flat head moved while we were processing a fork? Because BlockNotSupported
is a very generic name that can easily be used in other contexts, if someone is not aware of this and just looks at the name.
Would it perhaps make sense to rename the error to describe the situation more precisely? Such that nobody will accidentally use this error in other contexts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From FlatStorage
API perspective we don't know if the block is not longer supported because flat head has moved or if it is just a wrong block which was never supported. So I think BlockNotSupported
is pretty precise.
// The first result should be FlatStorageError, because we can't read from first chunk view anymore. | ||
// But the node must not panic as this is normal behaviour. | ||
// Ideally it should be tested on chain level, but there is no easy way to | ||
// postpone applying chunks reliably. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very helpful comment, thanks for that. And for adding the test!
.get_trie_for_shard(shard_uid.shard_id(), &block_hash, state_root, true) | ||
.unwrap(); | ||
if height == START_HEIGHT - 3 { | ||
env.produce_block(0, START_HEIGHT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it is generally nice to avoid side effects in .map
, so I would rewrite this as a loop
doesn't matter that much in tests, so feel free to keep it as it is
@@ -90,7 +90,9 @@ pub enum StorageError { | |||
/// panic in every place that produces this error. | |||
/// We can check if db is corrupted by verifying everything in the state trie. | |||
StorageInconsistentState(String), | |||
/// Error from flat storage | |||
/// Flat storage error, meaning that it doesn't support some block anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is a bit confusing and doesn't correspond to the name of the error.
I suggest renaming the error to FlatStorageBlockNotSupported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some minor comments, otherwise LGTM!
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
Thank you for the quick review, it was very helpful! |
Making flat storage errors more clear and fix error types in several places. Fixes flat storage failure on March 14.
This PR needs a review with extra caution, because
StorageInternalError
in incorrect place may lead to node panic andBlockNotSupported
when DB is corrupted may lead to weird behaviour of node.Here I claim that there are two types of
FlatStorageError
:BlockNotSupported
- can occur in normal circumstances, when flat storage head was moved forward during processing some chunk.StorageInternalError
- means that flat storage is corrupted, we have to panic.I looked at every single place where we return these errors, and replaced some of them:
get_block_changes
- once it is called, we already know that delta exists in cache, otherwise it is state inconsistency.update_flat_head
,store_helper::get_delta_changes
- it is more sane to return error, after which we panic later, onapply_transactions_with_optional_storage_proof
.update_flat_head
,gc_height
- delta exists in cache, otherwise we have inconsistency.Trie::get_ref
is fixed to properly propagate flat storage error here.Testing
flat_storage_errors
- check forStorageInternalError
as well.test_not_supported_block
- check that instead of panic,Trie::get_ref
returns correct error now.