fix(test_loop) - drop chunk endorsements instead of partial chunks to simulate missing chunks #12253
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #12253 +/- ##
==========================================
+ Coverage 71.62% 71.66% +0.03%
==========================================
Files 837 838 +1
Lines 167105 167441 +336
Branches 167105 167441 +336
==========================================
+ Hits 119696 119989 +293
- Misses 42180 42221 +41
- Partials 5229 5231 +2
This is surprising. I'm also having issues with the current implementation, but I don't understand why it causes issues with block processing. I want to reproduce it first.
Thank you very much for looking into this!
The original issue was that when the same account is both block producer and chunk producer, it bypasses the network when including its own chunk in its block, and then no one can process this block.
Because chunk endorsements seem to be the new norm, the new method looks better.
/// `drop_chunks_condition` result.
pub fn partial_encoded_chunks_dropper(
pub fn missing_chunks_endorsement_dropper(
I suggest renaming this to `chunk_endorsement_dropper_by_hash` and the function below to `chunk_endorsement_dropper_by_account`, because there is already an existing dropper with a different purpose.
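To make the naming suggestion concrete, here is a minimal sketch of what the two droppers could look like side by side. Only the two function names come from the comment above; the `Endorsement` struct, the `EndorsementFilter` type, the `deliver` helper, and the reading of "by account" as "by chunk producer" are assumptions for illustration and not the actual test-loop API.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a chunk endorsement sent over the test network.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Endorsement {
    /// Hash of the endorsed chunk (a plain string here, for illustration).
    pub chunk_hash: String,
    /// Account that produced the endorsed chunk (assumed meaning of "account").
    pub chunk_producer: String,
}

/// A filter returns `true` when the endorsement should be dropped.
pub type EndorsementFilter = Box<dyn Fn(&Endorsement) -> bool>;

/// Drops every endorsement whose chunk hash is in `hashes`.
pub fn chunk_endorsement_dropper_by_hash(hashes: HashSet<String>) -> EndorsementFilter {
    Box::new(move |e| hashes.contains(&e.chunk_hash))
}

/// Drops every endorsement for chunks produced by `account`.
pub fn chunk_endorsement_dropper_by_account(account: String) -> EndorsementFilter {
    Box::new(move |e| e.chunk_producer == account)
}

/// Applies a filter to a batch of endorsements and returns those that were not dropped.
pub fn deliver(endorsements: Vec<Endorsement>, filter: &EndorsementFilter) -> Vec<Endorsement> {
    endorsements.into_iter().filter(|e| !filter(e)).collect()
}
```

With names like these, the selection criterion is visible at the call site, which is the point of the rename.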
.genesis_height(10000)
.gas_prices_free()
.gas_limit_one_petagas()
Some options like these can be removed, as they are the defaults anyway.
Add resharding transition to the list of `implicit_transitions` in the state witness. The idea is that a shard split is simply a transition from one shard id to another, and the `PartialState` alone is enough to execute it. Additionally, we need to move to the previous shard id in `collect_state_transition_data` and `validate_source_receipt_proofs`. This and #12253 make it possible to uncomment the tests related to skipped chunks. Note that a state witness that includes a resharding transition is still always considered valid; this work is needed so that constructing it does not crash with the error "some shard id doesn't exist for this layout".
I tried to use the feature added in #12235 to simulate a protocol upgrade with missing chunks (to test bandwidth scheduler upgrade), but I couldn't get it to work.
I spent some time debugging, and I think that we should drop chunk endorsements instead of partial chunks.
From what I saw, dropping partial chunks without dropping chunk endorsements causes the nodes to end up in a state where they have a block with some non-missing chunks, but they don't have the parts for these chunks.
The node can't process the block, prints `Error::ChunksMissing`, and gets stuck.
I replaced the function that drops partial chunks with a function that drops endorsements, and now it works as expected.
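The difference between the two approaches can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not nearcore code: all types and functions below are hypothetical, the block producer is assumed to also be the chunk producer (so it always has the chunk locally), and it is assumed to include a chunk in its block only when endorsements for it arrive.

```rust
/// What the test network drops for one chunk (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dropped {
    Nothing,
    PartialChunkParts,
    Endorsements,
}

/// How the block producer recorded the chunk in its block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkSlot {
    Included,
    Missing,
}

/// Outcome of another node trying to process the block.
#[derive(Debug, PartialEq, Eq)]
pub enum ProcessResult {
    Ok,
    ChunksMissing,
}

/// Block producer's decision. Assumption: it has its own chunk locally, so only
/// the absence of endorsements makes it mark the chunk as missing.
pub fn produce_block(dropped: Dropped) -> ChunkSlot {
    match dropped {
        Dropped::Endorsements => ChunkSlot::Missing,
        Dropped::Nothing | Dropped::PartialChunkParts => ChunkSlot::Included,
    }
}

/// Another node processes the block: an included chunk requires its parts,
/// while a chunk marked as missing requires nothing.
pub fn process_block(slot: ChunkSlot, dropped: Dropped) -> ProcessResult {
    let has_parts = dropped != Dropped::PartialChunkParts;
    match (slot, has_parts) {
        (ChunkSlot::Included, false) => ProcessResult::ChunksMissing,
        _ => ProcessResult::Ok,
    }
}
```

In this model, dropping parts yields an included chunk that other nodes cannot reconstruct, so processing fails. Dropping endorsements yields a chunk that is marked missing in the block itself, which every node can process.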
I added a basic test which does a protocol upgrade between two specified protocol versions. Before the fix it doesn't work: the chain gets stuck on the epoch boundary. With the fix everything works fine.
It uses a hardcoded shard layout, so it isn't very useful for resharding, but I think having a generic "update from A to B" test would be useful for other purposes. For example, I'd like to test upgrading to the protocol version that has the bandwidth scheduler enabled. For that it'd be nice to have something generic that I can reuse.