Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(resharding): implemented error handling and test for it #10179

Merged
merged 4 commits into from
Nov 15, 2023

Conversation

wacban
Copy link
Contributor

@wacban wacban commented Nov 15, 2023

  • added a new ReshardingStatus - Failed - and use it when resharding fails
  • made a number of delays and timeouts configurable so that it's testable
  • added a new nayduck test for checking the error handling
  • added a bunch of new debug, warn and error logs

https://nayduck.near.org/#/run/3270

@wacban wacban force-pushed the waclaw-resharding-nayduck-error branch from df1715f to ea77618 Compare November 15, 2023 13:13
@wacban wacban marked this pull request as ready for review November 15, 2023 13:30
@wacban wacban requested a review from a team as a code owner November 15, 2023 13:30
})

let state_snapshot = tries.get_state_snapshot(prev_prev_hash);
if let Err(err) = state_snapshot {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the old code too convoluted to understand?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, I just wanted to add that debug line in the middle of it

@@ -1041,7 +1041,7 @@ impl StateSync {
shard_id,
state_split_scheduler,
)?;
tracing::debug!(target: "sync", %shard_id, %sync_hash, ?me, "State sync split scheduled");
tracing::debug!(target: "sync", %shard_id, %sync_hash, ?me, "resharding scheduled");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah, how did we miss this in the past?

let flat_storage_manager = FlatStorageManager::new(store.clone());

let mut store_update = store.store_update();
// TODO(resharding) there must be a better way to get the shard uids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highly depends on what shard version we are trying to corrupt right? Could we pass in the shard_version as input here maybe and use that to get the shard_uids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a bad workaround at all, though I prefer to keep the cmdline as user friendly as possible.
It's not a top priority right now so I'll leave that in and we can clean up after all else is done.

@wacban
Copy link
Contributor Author

wacban commented Nov 15, 2023

https://nayduck.near.org/#/run/3273
fingers crossed

@wacban wacban added this pull request to the merge queue Nov 15, 2023
Merged via the queue into master with commit d782f03 Nov 15, 2023
15 of 16 checks passed
@wacban wacban deleted the waclaw-resharding-nayduck-error branch November 15, 2023 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants