Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(test_loop) - drop chunk endorsements instead of partial chunks to simulate missing chunks #12253

Merged
merged 5 commits into from
Oct 18, 2024

Conversation

jancionear
Copy link
Contributor

I tried to use the feature added in #12235 to simulate a protocol upgrade with missing chunks (to test bandwidth scheduler upgrade), but I couldn't get it to work.
I spent some time debugging, and I think that we should drop chunk endorsements instead of partial chunks.
From what I saw, dropping partial chunks without dropping chunk endorsements causes the nodes to end up in a state where they have a block with some non-missing chunks, but they don't have the parts for these chunks.
The node can't process a block, prints out Error::ChunksMissing, and gets stuck.

I replaced the function that drops partial chunks with a function that drops endorsements and now it works as expected.

I added a basic test which does a protocol upgrade between two specified protocol versions. Before the fix it doesn't work, the chain gets stuck on epoch boundary. With the fix everything works fine.
It uses a hardcoded shard layout, so isn't very useful for resharding, but I think having a generic "update from A to B" test would be useful for other purposes. For example I'd like to test upgrading to the protocol version that has bandwidth scheduler enabled. For that it'd be nice to have something generic that I can reuse.

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.66%. Comparing base (ba6c707) to head (c39700e).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12253      +/-   ##
==========================================
+ Coverage   71.62%   71.66%   +0.03%     
==========================================
  Files         837      838       +1     
  Lines      167105   167441     +336     
  Branches   167105   167441     +336     
==========================================
+ Hits       119696   119989     +293     
- Misses      42180    42221      +41     
- Partials     5229     5231       +2     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check 1.25% <0.00%> (-0.01%) ⬇️
integration-tests 38.89% <100.00%> (+<0.01%) ⬆️
linux 71.30% <100.00%> (+0.06%) ⬆️
linux-nightly 71.24% <88.88%> (+0.04%) ⬆️
macos 54.27% <0.00%> (-0.04%) ⬇️
pytests 1.56% <0.00%> (-0.01%) ⬇️
sanity-checks 1.37% <0.00%> (-0.01%) ⬇️
unittests 65.38% <0.00%> (-0.05%) ⬇️
upgradability 0.21% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Longarithm
Copy link
Member

This is surprising. I'm also having issues with current implementation, but I don't understand why it causes issues with block processing. I want to reproduce it first.

Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for looking into this!

The original issue was that the same account is both block producer and chunk producer, it bypasses the network when including its own chunk into its block, and then no one can process this block.
Because chunk endorsements seem to be the new norm, the new method looks better.

/// `drop_chunks_condition` result.
pub fn partial_encoded_chunks_dropper(
pub fn missing_chunks_endorsement_dropper(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to rename this to chunk_endorsement_dropper_by_hash and the function below to chunk_endorsement_dropper_by_account, because there is already an existing dropper with different purpose.

Comment on lines 79 to 81
.genesis_height(10000)
.gas_prices_free()
.gas_limit_one_petagas()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of options like these can be removed, as they are default anyway.

@jancionear jancionear added this pull request to the merge queue Oct 18, 2024
Merged via the queue into near:master with commit a743a85 Oct 18, 2024
28 of 29 checks passed
@jancionear jancionear deleted the testloop-missing branch October 18, 2024 12:12
github-merge-queue bot pushed a commit that referenced this pull request Oct 22, 2024
Add resharding transition to the list of `implicit_transitions` in the
state witness.
The idea is that shard split is simply a transition from one shard id to
another, and only the `PartialState` is enough to execute that.
Additionally, we need to move to previous shard id in
`collect_state_transition_data` and `validate_source_receipt_proofs`.
This and #12253 allow to uncomment tests related to skipped chunks. Note
that state witness including resharding transition is still considered
always valid; this work is needed in order not to crash during its
construction with error "some shard id doesn't exist for this layout"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants