feat: wiring for bandwidth scheduler #12234

Open · wants to merge 36 commits into master

Conversation

@jancionear (Contributor) commented Oct 16, 2024

Add the wiring needed for the bandwidth scheduler algorithm.

Changes:

  • Add a new ProtocolFeature - BandwidthScheduler; its protocol version is set to nightly
  • Add a struct that will keep the bandwidth requests generated by the shards
  • Propagate the bandwidth requests through the blockchain - put the generated bandwidth requests in the chunk headers and pass the previous bandwidth requests to the runtime
  • Add a struct that represents the bandwidth scheduler state; it's stored in the trie and modified on every scheduler invocation.
  • Add a mock implementation of the bandwidth scheduler - it takes the previous bandwidth requests and the state and mocks the scheduler algorithm. It activates the request propagation logic and breaks some tests.

Propagation of bandwidth requests

The flow of bandwidth requests looks as follows:

  • A chunk is applied and generates bandwidth requests. They are put in ApplyResult and ApplyChunkResult
  • The requests are taken from the apply result and put in ChunkExtra. ChunkExtra is persisted in the database
  • During chunk production, Client fetches ChunkExtra of the previous chunk and puts the bandwidth requests in chunk header
  • The produced chunks are included in the block
  • The new chunks are applied; their ApplyState contains bandwidth requests taken from all the chunk headers in the block that contains the applied chunks.
  • During the application, the bandwidth scheduler looks at the requests created at the previous height and grants bandwidth
  • Receipts are sent out
  • Then the chunk generates new bandwidth requests
  • and so on

The flow is very similar to the one for congestion info.
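
For orientation, here is a minimal Rust sketch of that flow. The type names (BandwidthRequests, ApplyChunkResult, ChunkExtra, ApplyState) come from the PR, but the fields and helper functions below are simplified stand-ins, not the actual nearcore definitions:

```rust
/// Requests for outgoing bandwidth produced by one chunk application.
/// Simplified stand-in: one entry per target shard; the actual request values
/// are added later in the PR.
#[derive(Clone, Debug, Default)]
pub struct BandwidthRequests {
    pub requests: Vec<u64>,
}

/// Steps 1-2: chunk application returns the requests and they get persisted in ChunkExtra.
pub struct ApplyChunkResult {
    pub bandwidth_requests: BandwidthRequests,
}

pub struct ChunkExtra {
    /// None for chunks applied before the feature was enabled.
    pub bandwidth_requests: Option<BandwidthRequests>,
}

/// Steps 3-4: chunk production reads the previous ChunkExtra and puts the requests
/// into the new chunk header, which is then included in a block.
pub struct ChunkHeader {
    pub bandwidth_requests: BandwidthRequests,
}

pub fn produce_chunk_header(prev_chunk_extra: &ChunkExtra) -> ChunkHeader {
    ChunkHeader {
        bandwidth_requests: prev_chunk_extra.bandwidth_requests.clone().unwrap_or_default(),
    }
}

/// Step 5: when the new chunks are applied, ApplyState carries the requests taken
/// from all chunk headers of the block, so the scheduler sees every shard's requests.
pub struct ApplyState {
    pub block_bandwidth_requests: Vec<BandwidthRequests>,
}

pub fn collect_block_bandwidth_requests(headers: &[ChunkHeader]) -> ApplyState {
    ApplyState {
        block_bandwidth_requests: headers.iter().map(|h| h.bandwidth_requests.clone()).collect(),
    }
}
```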

Scheduler state

Bandwidth scheduler needs to keep some persistent state. In the future it'll be something like "how much every shard was granted lately", which will be used to maintain fairness. For now it's just mock data.
Scheduler state should always be the same on all shards. All shards start with the same scheduler state, apply the scheduler at the same heights with the same inputs, and always end up with the same scheduler state.
This means that the bandwidth scheduler also needs to be run for missing chunks. Luckily that can be easily achieved thanks to the existing apply_old_chunk infrastructure (all missing chunks are applied, which counts as "implicit state transitions").
The state_root will now change at every height, even when there are no receipts to be processed. This breaks some tests which assumed that the state root wouldn't change.
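
A small self-contained sketch of that invariant, assuming a mock step that just mixes its inputs into the state deterministically; SchedulerState, ShardRequests and mock_scheduler_step are illustrative names, and std's DefaultHasher stands in for whatever stable hashing the real implementation would use:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the persisted scheduler state.
#[derive(Clone, Debug, PartialEq, Hash)]
struct SchedulerState {
    mock_data: u64,
}

/// Illustrative stand-in for one shard's bandwidth requests.
#[derive(Clone, Debug, Hash)]
struct ShardRequests {
    shard_id: u64,
    requested_to: Vec<u64>,
}

/// Every shard runs this with the same `state` and the same `all_requests`
/// (taken from the block), so every shard ends up with the same new state.
/// Note: DefaultHasher is only deterministic within one build; a real
/// implementation would use a protocol-stable hash.
fn mock_scheduler_step(state: &SchedulerState, all_requests: &[ShardRequests]) -> SchedulerState {
    let mut hasher = DefaultHasher::new();
    state.hash(&mut hasher);
    all_requests.hash(&mut hasher);
    SchedulerState { mock_data: hasher.finish() }
}

fn main() {
    let state = SchedulerState { mock_data: 0 };
    let requests = vec![ShardRequests { shard_id: 0, requested_to: vec![1, 2] }];
    // Two shards applying the same inputs reach an identical state.
    assert_eq!(
        mock_scheduler_step(&state, &requests),
        mock_scheduler_step(&state, &requests)
    );
}
```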

The pull request is meant to be reviewed commit-by-commit; I tried to make the commit history nice.

Add the structs which will be used to represent bandwidth requests
generated by a shard.
For now the BandwidthRequest doesn't have the requested values;
they will be added later.

Chunk headers will contain bandwidth requests
generated during the previous chunk application.
We will collect bandwidth requests from all the shards
and use them in the bandwidth scheduler during chunk application.

Bandwidth requests will be generated during chunk application
and then they'll be available in the ApplyResult.
The result of chunk application should keep the generated bandwidth requests.

ChunkExtra stores the results of chunk application
in a persistent way. Let's put the generated bandwidth
requests there and then fetch them when producing the
next chunk.

Collect the bandwidth requests generated by all shards
at the previous height and expose them to the runtime.
The runtime needs the requests to run the bandwidth scheduler.

Add a struct that keeps the persistent state
used by the bandwidth scheduler.

Add a mock implementation of the bandwidth scheduler algorithm.
Bandwidth scheduler takes the current state and previous bandwidth
requests and generates bandwidth grants. The mock implementation
takes the inputs and generates deterministic state changes based
on them, but it doesn't generate the bandwidth grants yet.
The mock implementation is enough to activate the logic that
propagates bandwidth requests throughout the blockchain and
break some tests.

This test assumed that the state root doesn't change when there are no receipts,
but this is no longer true. Bandwidth scheduler modifies the state on every height,
so now the state root changes every time.

state_viewer::apply_chunk has the ability to apply a chunk
when the block that contains the chunk isn't available.

Initially I passed empty bandwidth requests in the ApplyState,
as usually they're taken from the chunk headers in the block
that contains the applied chunk, and this block isn't available here.

But that breaks test_apply_chunk - a test which applies a chunk
in a normal way and compares the result with a chunk that was
applied without providing the block. It expects the state roots
to be the same, but that's not the case because the bandwidth
requests are different and bandwidth scheduler produces different state.

To deal with this we can try to fetch the original bandwidth requests
from the chunk extra of the previous chunks. It's an opportunistic
reconstruction - if the chunk extra is available it adds the requests
to the apply_state, if not it leaves them empty.
This is enough to fix the state root mismatch.
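
Conceptually, the opportunistic reconstruction looks roughly like the sketch below. The ChainStoreLike trait, the field names, and the fallback behaviour are illustrative assumptions, not the real state-viewer API:

```rust
/// Simplified stand-in for the requests persisted per shard.
#[derive(Clone, Default)]
struct BandwidthRequests {
    requests: Vec<u64>,
}

/// Simplified stand-in for the persisted result of a chunk application.
struct ChunkExtra {
    bandwidth_requests: Option<BandwidthRequests>,
}

trait ChainStoreLike {
    /// Returns the ChunkExtra of the previous chunk for a shard, if it is available locally.
    fn prev_chunk_extra(&self, shard_id: u64) -> Option<ChunkExtra>;
}

/// Best effort: use the persisted requests when the ChunkExtra is available,
/// otherwise fall back to empty requests (the block itself is not available here).
fn reconstruct_bandwidth_requests(
    store: &impl ChainStoreLike,
    shard_ids: &[u64],
) -> Vec<BandwidthRequests> {
    shard_ids
        .iter()
        .map(|&shard_id| {
            store
                .prev_chunk_extra(shard_id)
                .and_then(|extra| extra.bandwidth_requests)
                .unwrap_or_default()
        })
        .collect()
}
```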

This test creates a situation where the last few chunks
in an epoch are missing. It performs state sync, then
takes the state root of the first missing chunk on one node
and expects the state roots of all the missing chunks on the
other node to be the same as that first state root.
This breaks because bandwidth scheduler changes the state
at every height - even for missing chunks - so the state
root for later missing chunks is not the same as the state
root of the first missing chunk.
Fix the problem by comparing state roots of missing chunks
at the same heights.

This test performs state sync and then does a function call on the synced node
to test that the sync worked.
The function call at the end of the test started failing with `MissingTrieValue`.
I suspect that the function call is done with the wrong state root - it worked previously,
when all the state roots were the same, as the chunks don't have any transactions,
but broke when bandwidth scheduler started changing the state at every height.
The `MissingTrieValue` error stops occurring when the state root is taken from
the previous block.
My understanding of state sync isn't very good, but I think this theory makes sense.

Add an extra check to ensure that the scheduler state
stays the same on all shards.
@jancionear requested a review from a team as a code owner October 16, 2024 13:49

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 84.59870% with 71 lines in your changes missing coverage. Please review.

Project coverage is 71.61%. Comparing base (e0f4592) to head (42eae3b).

Files with missing lines Patch % Lines
core/primitives/src/views.rs 73.50% 31 Missing ⚠️
chain/chain/src/validate.rs 54.83% 13 Missing and 1 partial ⚠️
runtime/runtime/src/bandwidth_scheduler/mod.rs 90.00% 5 Missing and 2 partials ⚠️
core/primitives/src/bandwidth_scheduler.rs 84.00% 4 Missing ⚠️
tools/state-viewer/src/util.rs 33.33% 4 Missing ⚠️
chain/rosetta-rpc/src/adapters/transactions.rs 0.00% 3 Missing ⚠️
core/primitives/src/types.rs 93.33% 2 Missing ⚠️
tools/state-viewer/src/apply_chunk.rs 84.61% 1 Missing and 1 partial ⚠️
chain/chain-primitives/src/error.rs 0.00% 1 Missing ⚠️
..._validation/chunk_validator/orphan_witness_pool.rs 50.00% 1 Missing ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12234      +/-   ##
==========================================
+ Coverage   71.57%   71.61%   +0.04%     
==========================================
  Files         836      838       +2     
  Lines      168073   168463     +390     
  Branches   168073   168463     +390     
==========================================
+ Hits       120294   120644     +350     
- Misses      42524    42561      +37     
- Partials     5255     5258       +3     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check 1.24% <0.00%> (-0.01%) ⬇️
integration-tests 38.94% <76.57%> (+0.13%) ⬆️
linux 71.14% <47.28%> (-0.08%) ⬇️
linux-nightly 71.18% <78.52%> (+0.01%) ⬆️
macos 54.19% <44.37%> (-0.07%) ⬇️
pytests 1.55% <0.00%> (-0.01%) ⬇️
sanity-checks 1.36% <0.00%> (-0.01%) ⬇️
unittests 65.38% <82.60%> (+0.04%) ⬆️
upgradability 0.21% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


All chunks produced in the protocol version where bandwidth scheduler is enabled
should use ShardChunkHeaderInner::V4; I missed this in the previous commit.

The test iterates over all items in the trie and creates a StateRecord for each of them.
The problem is that some types of trie entries don't have a corresponding StateRecord variant.
For example outgoing buffers, yield resume data, and bandwidth scheduler state can't be made
into a StateRecord.
The test started failing because it tries to unwrap the result of `StateRecord::from_raw_key_value`
for a trie entry that represents BandwidthSchedulerState. The function returns None and the
unwrap panics.
Fix the problem by removing the unwrap and instead looking for a `Some` value. The test only
looks for one type of StateRecord, so it doesn't matter if it skips over the scheduler state.
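
The fix amounts to replacing an unconditional unwrap with a filter on `Some`. Below is a self-contained sketch of that pattern, where StateRecord and from_raw_key_value are simplified stand-ins for the real types (the real enum has many more variants and a different constructor signature):

```rust
enum StateRecord {
    Account { account_id: String },
    Data { key: Vec<u8> },
    // The real enum has more variants, and some trie columns
    // (e.g. BandwidthSchedulerState) have no variant at all.
}

impl StateRecord {
    /// Returns None for trie columns without a StateRecord representation.
    fn from_raw_key_value(key: &[u8], _value: &[u8]) -> Option<StateRecord> {
        match key.first().copied() {
            Some(0) => Some(StateRecord::Account { account_id: "alice.near".to_string() }),
            Some(1) => Some(StateRecord::Data { key: key.to_vec() }),
            _ => None,
        }
    }
}

fn find_account_records(entries: &[(Vec<u8>, Vec<u8>)]) -> Vec<String> {
    entries
        .iter()
        // Previously: .map(|(k, v)| StateRecord::from_raw_key_value(k, v).unwrap()),
        // which panics on entries like the scheduler state. Now we just skip them:
        .filter_map(|(k, v)| StateRecord::from_raw_key_value(k, v))
        .filter_map(|record| match record {
            StateRecord::Account { account_id } => Some(account_id),
            _ => None,
        })
        .collect()
}
```
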
@jancionear (Contributor, Author)

For the testloop test I need another fix to be merged: #12253
For now I cherry-picked commits from the fix (a0ea12d, ca1ebd6) to get the test running. I will revert them when the fix gets merged.

@jancionear (Contributor, Author)

> I will revert them when the fix gets merged.

The fix got merged. Reverted commits with the old version of the test, merged master and added a test using the functionality on master.

@wacban (Contributor) left a comment

I went through half of it, so far looks good to me. The only piece that is a bit concerning is the validate - please see my comment

@tayfunelmas Do you remember what was that trouble we had with congestion control initialization in genesis? Would we need to handle it similarly here?

Comment on lines +109 to +113
/// Requests for bandwidth to send receipts to other shards.
/// Will be None for protocol versions that don't have the BandwidthScheduler feature enabled.
pub bandwidth_requests: Option<BandwidthRequests>,
/// Used only for a sanity check.
pub bandwidth_scheduler_state_hash: CryptoHash,

Contributor:
mini nit: Maybe put it next to congestion control? They are very similar.

Contributor Author:
Done

Comment on lines +223 to +232
if extra_bandwidth_requests.is_none()
&& header_bandwidth_requests == Some(&BandwidthRequests::empty())
{
// This corner case happens for the first chunk that has the BandwidthScheduler feature enabled.
// The previous chunk was applied with a protocol version which doesn't have bandwidth scheduler
// enabled and because of that the bandwidth requests in ChunkExtra are None.
// The header was produced in the new protocol version, and the newer version of header always
// has some bandwidth requests, it's not an `Option`. Because of that the header requests are `Some(BandwidthRequests::empty())`.
return Ok(());
}

Contributor:
Can you do it the same way as it worked for congestion info? Basically they always have to match. I don't remember the exact reasons but it was way cleaner way to handle protocol upgrade this way.

Contributor Author:
Hmm I'm not sure if it's better, the way congestion control does it causes the protocol to accept two different chunk header versions for one protocol version :/
I don't know, I can try it in a follow up PR. I'll merge this one as is, there's enough mess here already. Let's discuss it in a different PR.

Comment on lines +198 to +201
ShardChunkHeaderInner::V1(_) => unimplemented!(),
ShardChunkHeaderInner::V2(inner) => inner.encoded_length = encoded_length,
ShardChunkHeaderInner::V3(inner) => inner.encoded_length = encoded_length,
_ => unimplemented!(),
ShardChunkHeaderInner::V4(inner) => inner.encoded_length = encoded_length,

Contributor:
nice, always better to be explicit and exhaust all variants, ❤️

@@ -227,6 +227,7 @@ pub fn create_chunk(
decoded_chunk.prev_outgoing_receipts(),
header.prev_outgoing_receipts_root(),
header.congestion_info(),
header.bandwidth_requests().cloned(),

Contributor:
Just out of curiosity why do you need cloned here for requests and not for congestion info? Is congestion info Copy or something?

Contributor Author:
Yup CongestionInfo is Copy. Bandwidth requests are not because they contain a vec.

Comment on lines 262 to +263
// TODO(#11099): Move this feature to Nightly.
ProtocolFeature::ExcludeContractCodeFromStateWitness => 146,
ProtocolFeature::ExcludeContractCodeFromStateWitness => 147,

Contributor:
lol, there is nightly and then there is after nightly xD

Comment on lines +32 to +40
HashMap::from_iter(
[
(ShardId::new(0), 0..0),
(ShardId::new(1), -2..0),
(ShardId::new(2), 0..2),
(ShardId::new(3), -2..2),
]
.into_iter(),
),

Contributor:
that's sweet

Comment on lines +52 to +53
let state_record = StateRecord::from_raw_key_value(key, value);
if let Some(StateRecord::Account { account_id, .. }) = state_record {

Contributor:
This is an actual change in semantics, is that intentional?

Contributor Author:
Intentional, explained in the commit description

Comment on lines +3714 to +3715
let chunk_extra =
env.clients[0].chain.get_chunk_extra(block.header().prev_hash(), &shard_uid).unwrap();

Contributor:
That's strange, what happened here?

Contributor Author:
Check out commit description

Comment on lines +748 to +754
let expected_height = if ProtocolFeature::BandwidthScheduler
.enabled(genesis.config.protocol_version)
{
height
} else {
last_chunk_height
};

Contributor:
Do I want to know?

Contributor Author:
I wrote about it in the commit description

Contributor:
TODO(wacban) - continue reviewing from here

Contributor Author:
Ah crap, maybe I should've mentioned that the PR is meant to be reviewed commit-by-commit. I made an effort to make the commit history nice, and added some additional comments about the changes in commit descriptions to make reviewing easier.

@jancionear (Contributor, Author)

> @tayfunelmas Do you remember what was that trouble we had with congestion control initialization in genesis? Would we need to handle it similarly here?

The good thing about bandwidth requests is that they can always be empty; it doesn't matter what the state is. AFAIK congestion info needs to be initialized based on the shard state, but for requests this is not a problem. Maybe there's some issue I'm missing, but IMO it should be fine.
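
A tiny sketch of that point, with illustrative names: at genesis or at the protocol-upgrade boundary there is no previous request data, and falling back to an empty request set is always valid, so no state-derived initialization is needed:

```rust
/// Illustrative stand-in for the requests struct; not the real nearcore type.
#[derive(Clone, Debug, Default, PartialEq)]
struct BandwidthRequests {
    requests: Vec<u64>,
}

impl BandwidthRequests {
    fn empty() -> Self {
        Self::default()
    }
}

/// The previous chunk's persisted requests are None before the feature is enabled.
fn requests_for_first_scheduler_run(prev: Option<BandwidthRequests>) -> BandwidthRequests {
    // Unlike congestion info, nothing has to be derived from the shard state here.
    prev.unwrap_or_else(BandwidthRequests::empty)
}

fn main() {
    assert_eq!(requests_for_first_scheduler_run(None), BandwidthRequests::empty());
}
```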

Clippy started complaining that these structs are too large (>1024 bytes)
and recommended putting them in a Box. I did as Clippy suggested.
Putting the structs in boxes shouldn't change the serialization format.
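
A quick way to convince yourself of that, assuming the structs are serialized with the borsh crate (1.x, with the derive feature) as nearcore types are; BigState is an illustrative stand-in for the boxed struct:

```rust
use borsh::BorshSerialize;

#[derive(BorshSerialize)]
struct BigState {
    data: [u8; 32],
    counter: u64,
}

fn main() {
    let plain = BigState { data: [7u8; 32], counter: 42 };
    let boxed: Box<BigState> = Box::new(BigState { data: [7u8; 32], counter: 42 });

    // Box<T> delegates to T's implementation, so the produced bytes are identical.
    assert_eq!(borsh::to_vec(&plain).unwrap(), borsh::to_vec(&boxed).unwrap());
}
```
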
@wacban (Contributor) left a comment

LGTM

Comment on lines 2137 to 2142
bandwidth_requests,
contract_accesses,
bandwidth_scheduler_state_hash: bandwidth_scheduler_output
.as_ref()
.map(|o| o.scheduler_state_hash)
.unwrap_or_default(),

Contributor:
nit: fix the ordering

Contributor Author:
I think it's fixed now, the master merge fiddled with it.

@@ -56,6 +56,7 @@ pub mod col {
/// backpressure on the receiving shard.
/// (`primitives::receipt::Receipt`).
pub const BUFFERED_RECEIPT: u8 = 14;
pub const BANDWIDTH_SCHEDULER_STATE: u8 = 15;

Contributor:
Ah sorry, I mixed that up with nibbles. You're right, we should be just fine. That's awesome :)

@jancionear added this pull request to the merge queue Oct 23, 2024
ShardChunkHeader::V3(header) => header.inner.bandwidth_requests(),
}
}

/// Returns whether the header is valid for given `ProtocolVersion`.
pub fn valid_for(&self, version: ProtocolVersion) -> bool {

Contributor Author:
Reminder to myself - make sure that we check this in witness validation. Didn't matter before, but might matter now.

@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 23, 2024