feat: Shadow tracking #11689

Merged
merged 5 commits into from
Jul 5, 2024

Conversation

Contributor

@staffik staffik commented Jun 28, 2024

Part of: near/near-one-project-tracking#65
Adds an option for a non-validator node to track the shards of a given validator.

During the stateful -> stateless protocol upgrade, a node will track all shards and will require a lot of RAM. After the migration, we can move the validator key to a new, smaller node that does not track all shards.
To do this with minimal downtime, the new node needs to have the appropriate shards in place and memtries loaded in memory; then we hot swap the validator key without stopping the new node.
But before that happens, the new node is not a validator, so we need a way to tell it which validator's shards it should track.
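
As a rough illustration of the idea, the sketch below models a shard tracker that delegates to the epoch manager when configured to shadow a validator. It is a simplified, self-contained model and not the code from this PR: `MockEpochManager`, `demo_tracker`, `tracks_shard`, the type aliases, and the account name `validator.near` are hypothetical, and the lookup returns a plain `bool` where the real code likely returns a `Result`. Only `TrackedConfig::ShadowValidator` and `cares_about_shard_in_epoch` are names taken from the diff.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-ins for the real nearcore types (assumptions for this sketch).
type AccountId = String;
type ShardId = u64;
type EpochId = u64;

/// Hypothetical epoch manager: maps (epoch, validator) to the shards assigned to it.
struct MockEpochManager {
    assignments: HashMap<(EpochId, AccountId), HashSet<ShardId>>,
}

impl MockEpochManager {
    /// Returns true if `account_id` is assigned `shard_id` in `epoch_id`.
    fn cares_about_shard_in_epoch(
        &self,
        epoch_id: EpochId,
        account_id: &AccountId,
        shard_id: ShardId,
    ) -> bool {
        self.assignments
            .get(&(epoch_id, account_id.clone()))
            .map_or(false, |shards| shards.contains(&shard_id))
    }
}

/// Reduced version of the tracking config; the real enum has more variants.
enum TrackedConfig {
    AllShards,
    /// Track whatever shards the named validator is assigned to.
    ShadowValidator(AccountId),
}

struct ShardTracker {
    tracked_config: TrackedConfig,
    epoch_manager: MockEpochManager,
}

impl ShardTracker {
    /// Decides whether this node should track `shard_id` in `epoch_id`.
    fn tracks_shard(&self, epoch_id: EpochId, shard_id: ShardId) -> bool {
        match &self.tracked_config {
            TrackedConfig::AllShards => true,
            TrackedConfig::ShadowValidator(account_id) => self
                .epoch_manager
                .cares_about_shard_in_epoch(epoch_id, account_id, shard_id),
        }
    }
}

/// Builds a tracker with a made-up assignment: `validator.near` has
/// shards {0, 2} in epoch 1 and shard {1} in epoch 2.
fn demo_tracker(tracked_config: TrackedConfig) -> ShardTracker {
    let mut assignments = HashMap::new();
    assignments.insert((1, "validator.near".to_string()), HashSet::from([0, 2]));
    assignments.insert((2, "validator.near".to_string()), HashSet::from([1]));
    ShardTracker {
        tracked_config,
        epoch_manager: MockEpochManager { assignments },
    }
}
```

In this model, a node shadowing `validator.near` tracks shards 0 and 2 in epoch 1 and shard 1 in epoch 2, so it already holds the right state when the validator key is swapped onto it.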

@staffik staffik added the A-stateless-validation Area: stateless validation label Jun 28, 2024
@staffik staffik requested review from wacban and tayfunelmas June 28, 2024 19:04
@staffik staffik requested a review from a team as a code owner June 28, 2024 19:04
@staffik staffik force-pushed the shadow_tracking branch 2 times, most recently from 5fca122 to e1e1095 Compare June 29, 2024 16:35

codecov bot commented Jun 30, 2024

Codecov Report

Attention: Patch coverage is 37.25490% with 32 lines in your changes missing coverage. Please review.

Project coverage is 71.78%. Comparing base (59e2b88) to head (ba96204).
Report is 27 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| chain/chain/src/test_utils/kv_runtime.rs | 0.00% | 17 Missing ⚠️ |
| chain/epoch-manager/src/adapter.rs | 0.00% | 9 Missing ⚠️ |
| chain/epoch-manager/src/shard_tracker.rs | 0.00% | 3 Missing and 1 partial ⚠️ |
| chain/epoch-manager/src/lib.rs | 88.23% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11689      +/-   ##
==========================================
- Coverage   71.81%   71.78%   -0.03%     
==========================================
  Files         790      792       +2     
  Lines      161945   162586     +641     
  Branches   161945   162586     +641     
==========================================
+ Hits       116294   116710     +416     
- Misses      40618    40824     +206     
- Partials     5033     5052      +19     
| Flag | Coverage Δ |
|---|---|
| backward-compatibility | 0.23% <0.00%> (-0.01%) ⬇️ |
| db-migration | 0.23% <0.00%> (-0.01%) ⬇️ |
| genesis-check | 1.35% <2.00%> (-0.01%) ⬇️ |
| integration-tests | 37.89% <35.29%> (-0.01%) ⬇️ |
| linux | 71.27% <37.25%> (+2.13%) ⬆️ |
| linux-nightly | 71.38% <37.25%> (+0.08%) ⬆️ |
| macos | 54.27% <37.25%> (+1.70%) ⬆️ |
| pytests | 1.58% <2.00%> (-0.01%) ⬇️ |
| sanity-checks | 1.38% <2.00%> (-0.01%) ⬇️ |
| unittests | 66.20% <37.25%> (-0.15%) ⬇️ |
| upgradability | 0.27% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


Contributor

@wacban wacban left a comment

Can you test that it works well when the shard assignment is changing and the node should state sync?

chain/chain/src/test_utils/kv_runtime.rs (resolved)
chain/epoch-manager/src/lib.rs (outdated, resolved)
chain/epoch-manager/src/shard_tracker.rs (outdated, resolved)
core/chain-configs/src/client_config.rs (outdated, resolved)
@staffik staffik requested a review from wacban July 1, 2024 12:39
Contributor Author

staffik commented Jul 1, 2024

> Can you test that it works well when the shard assignment is changing and the node should state sync?

Tested in forknet20; I will add a dedicated pytest.
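
To make the scenario under test concrete, here is a minimal sketch of the set logic involved, assuming that a node must state sync every shard the shadowed validator is assigned in the next epoch but not in the current one. The function name `shards_needing_state_sync` and the `ShardId` alias are hypothetical and do not come from this PR.

```rust
use std::collections::BTreeSet;

// Simplified shard identifier (an assumption for this sketch).
type ShardId = u64;

/// Returns, in ascending order, the shards assigned in the next epoch
/// that are not tracked in the current epoch.
fn shards_needing_state_sync(
    current: &BTreeSet<ShardId>,
    next: &BTreeSet<ShardId>,
) -> Vec<ShardId> {
    next.difference(current).copied().collect()
}
```

For example, if the shadowed validator moves from shards {0, 1} to shards {1, 2, 3}, the shadow node would need to state sync shards 2 and 3; if the assignment is unchanged, the result is empty.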

@@ -90,6 +96,9 @@ impl ShardTracker {
                let subset = &schedule[index as usize];
                Ok(subset.contains(&shard_id))
            }
            TrackedConfig::ShadowValidator(account_id) => {
                self.epoch_manager.cares_about_shard_in_epoch(*epoch_id, account_id, shard_id)
Contributor

Can we also emit a warning somewhere (potentially here) if the account ID is not eligible to be a chunk validator at all? I was thinking of doing it when loading the config, but at that point I am not sure the node will have full info about the other validators.

Contributor Author

@staffik staffik Jul 5, 2024

Logging it here is spammy: it resulted in thousands of log lines in a very short nayduck test.
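
One possible way to keep such a warning without the spam, sketched here purely as an illustration and not as what this PR does, is to remember which (epoch, account) pairs have already been reported and warn only on the first occurrence. `WarnOnce`, `should_warn`, and `warn_if_not_chunk_validator` are hypothetical names, and the `is_chunk_validator` flag stands in for whatever eligibility check the real code would perform.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical helper that remembers which (epoch, account) pairs were reported.
struct WarnOnce {
    seen: Mutex<HashSet<(u64, String)>>,
}

impl WarnOnce {
    fn new() -> Self {
        Self { seen: Mutex::new(HashSet::new()) }
    }

    /// Returns true only the first time a given (epoch, account) pair is seen,
    /// because `HashSet::insert` reports whether the value was newly added.
    fn should_warn(&self, epoch_id: u64, account_id: &str) -> bool {
        self.seen
            .lock()
            .unwrap()
            .insert((epoch_id, account_id.to_string()))
    }
}

/// Emits at most one warning per (epoch, account) pair.
/// Returns true if a warning was actually printed.
fn warn_if_not_chunk_validator(
    warn_once: &WarnOnce,
    epoch_id: u64,
    account_id: &str,
    is_chunk_validator: bool, // assumed to be computed by the caller
) -> bool {
    if !is_chunk_validator && warn_once.should_warn(epoch_id, account_id) {
        eprintln!(
            "shadow-tracked account {} is not a chunk validator in epoch {}",
            account_id, epoch_id
        );
        return true;
    }
    false
}
```

With this shape, repeated calls from a hot path such as the shard-tracking check would produce one line per epoch rather than one line per call.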

@staffik staffik added this pull request to the merge queue Jul 5, 2024
@staffik staffik removed this pull request from the merge queue due to a manual request Jul 5, 2024
@staffik staffik enabled auto-merge July 5, 2024 18:28
@staffik staffik added this pull request to the merge queue Jul 5, 2024
Merged via the queue into master with commit 03a8b5d Jul 5, 2024
28 of 30 checks passed
@staffik staffik deleted the shadow_tracking branch July 5, 2024 19:02
VanBarbascu pushed a commit that referenced this pull request Jul 6, 2024
marcelo-gonzalez added a commit that referenced this pull request Jul 8, 2024
This was added in #11634 and is required by the previous cherry-pick of #11689 (36ab9db); otherwise the build fails.
Labels
A-stateless-validation Area: stateless validation
3 participants