Add feature flag for LastIndex and Erasure duplicate proofs #34360
Codecov Report
Coverage diff, master vs. #34360:
  Coverage: 81.8% -> 81.8%
  Files:    820 -> 820
  Lines:    220869 -> 221087 (+218)
  Hits:     180791 -> 180987 (+196)
  Misses:   40078 -> 40100 (+22)
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case-by-case basis.
core/src/window_service.rs
Outdated
    feature_set: &FeatureSet,
    epoch_schedule: &EpochSchedule,
Better to send root_bank: &Bank here instead of these 2 args.
The reason I was using feature_set and epoch_schedule in the other code was that I didn't want to keep a reference to bank, in order to avoid reintroducing #33105.
How about LastIndex and Erasure duplicate proofs received from other nodes from gossip?
For example, say 50% of the cluster has upgraded to v1.17 and 50% has not, and then one node which does not have this patch starts sending LastIndex and Erasure duplicate proofs over gossip. Wouldn't that cause an issue, because 50% of the cluster recognizes that as a valid duplicate proof and the other 50% doesn't?
core/src/window_service.rs
Outdated
fn run_check_duplicate(
    cluster_info: &ClusterInfo,
    blockstore: &Blockstore,
    shred_receiver: &Receiver<PossibleDuplicateShred>,
    duplicate_slots_sender: &DuplicateSlotSender,
    bank_forks: &Arc<RwLock<BankForks>>,
&Arc is redundant. Either just & or, if you really need Arc, then just Arc.
As stated in #34292 (comment), we don't consume duplicate proofs from gossip in fork choice. #32963 is the change that accomplishes that, and it is planned to be enabled alongside the new vote tx change.
This seems to feature-gate the consensus part, but I am wondering if we should feature-gate earlier in the pipeline, in the ingestion/blockstore code which identifies these duplicates.
For example, the new code wouldn't store the shreds. Wouldn't that cause issues if the duplicate is confirmed and the node has to go with it anyway?
core/src/window_service.rs
Outdated
@@ -137,17 +141,50 @@ impl WindowServiceMetrics {
    }
}

fn should_send_index_and_erasure_conflicts(shred_slot: Slot, root_bank: &Arc<Bank>) -> bool {
This should be just &Bank, not &Arc<Bank>.
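A minimal sketch of why `&Bank` is enough here. The `Bank` struct and the `bank_slot` and `demo` helpers below are hypothetical stand-ins, not the real runtime types; the point is only that a caller holding an `Arc<Bank>` can pass `&arc` to a function taking `&Bank`, because deref coercion converts `&Arc<Bank>` to `&Bank` at the call site.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the real `Bank`; only the field this sketch needs.
struct Bank {
    slot: u64,
}

// Taking `&Bank` keeps the function independent of how the caller owns the bank.
fn bank_slot(bank: &Bank) -> u64 {
    bank.slot
}

// Both a shared `Arc<Bank>` and a plain `Bank` can be passed by reference.
fn demo() -> (u64, u64) {
    let shared = Arc::new(Bank { slot: 42 });
    let owned = Bank { slot: 7 };
    // `&shared` is `&Arc<Bank>`; deref coercion turns it into `&Bank`.
    (bank_slot(&shared), bank_slot(&owned))
}
```

Accepting `&Arc<Bank>` would force every caller to own the bank through an `Arc`, without giving the function anything it can't already do with `&Bank`.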
core/src/window_service.rs
Outdated
    match root_bank
        .feature_set
        .activated_slot(&feature_set::index_erasure_conflict_duplicate_proofs::id())
    {
        None => false,
        Some(feature_slot) => {
            let epoch_schedule = root_bank.epoch_schedule();
            let feature_epoch = epoch_schedule.get_epoch(feature_slot);
            let shred_epoch = epoch_schedule.get_epoch(shred_slot);
            // Has a 1 epoch delay, as we don't have enough information
            // on the epoch boundary of the feature activation
            feature_epoch < shred_epoch
        }
    }
}
You can make this function:
https://github.com/solana-labs/solana/blob/2971e84ec/turbine/src/cluster_nodes.rs#L514-L526
and reuse it here.
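To illustrate the one-epoch delay in the check above, here is a self-contained sketch. The `EpochSchedule` below is a simplified, hypothetical stand-in with fixed-length epochs (the real one also handles warmup), and `is_active_for_slot` is an illustrative name, not the helper in cluster_nodes.rs. The `activated_slot` argument stands in for the result of the feature set lookup.

```rust
type Slot = u64;
type Epoch = u64;

// Hypothetical, simplified epoch schedule: fixed-length epochs, no warmup.
struct EpochSchedule {
    slots_per_epoch: u64,
}

impl EpochSchedule {
    fn get_epoch(&self, slot: Slot) -> Epoch {
        slot / self.slots_per_epoch
    }
}

// Returns true only when the shred's epoch is strictly after the epoch in
// which the feature was activated, i.e. with a one-epoch delay.
fn is_active_for_slot(
    activated_slot: Option<Slot>,
    shred_slot: Slot,
    epoch_schedule: &EpochSchedule,
) -> bool {
    match activated_slot {
        // Feature not activated yet: never gate in.
        None => false,
        Some(feature_slot) => {
            let feature_epoch = epoch_schedule.get_epoch(feature_slot);
            let shred_epoch = epoch_schedule.get_epoch(shred_slot);
            feature_epoch < shred_epoch
        }
    }
}
```

With 100 slots per epoch and activation at slot 150, a shred in slot 199 is still treated as pre-activation, while a shred in slot 200 is not.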
core/src/window_service.rs
Outdated
) -> Result<()> {
    let mut root_bank = bank_forks.read().unwrap().root_bank().clone();
The clone is redundant:
https://github.com/solana-labs/solana/blob/2971e84ec/runtime/src/bank_forks.rs#L166
core/src/window_service.rs
Outdated
    if last_updated.elapsed().as_millis() as u64 > DEFAULT_MS_PER_SLOT {
        // Grabs bank forks lock once a slot
        last_updated = Instant::now();
        root_bank = bank_forks.read().unwrap().root_bank().clone();
clone is redundant
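A sketch of the refresh-once-per-slot pattern without the extra clone. It assumes, as the linked line suggests, that `root_bank()` already returns an owned `Arc<Bank>`. The `Bank`, `BankForks`, and `CachedRootBank` types are hypothetical stand-ins, and the 400 ms interval is an assumed value for `DEFAULT_MS_PER_SLOT`.

```rust
use std::sync::{Arc, RwLock};
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the real runtime types.
struct Bank {
    slot: u64,
}

struct BankForks {
    root: Arc<Bank>,
}

impl BankForks {
    // Assumed to mirror the linked accessor: returns an owned `Arc<Bank>`,
    // so callers do not need to call `.clone()` on the result.
    fn root_bank(&self) -> Arc<Bank> {
        Arc::clone(&self.root)
    }
}

// Assumption: one slot is about 400 ms.
const REFRESH_INTERVAL: Duration = Duration::from_millis(400);

struct CachedRootBank {
    bank: Arc<Bank>,
    last_updated: Instant,
}

impl CachedRootBank {
    fn new(bank_forks: &RwLock<BankForks>) -> Self {
        Self {
            bank: bank_forks.read().unwrap().root_bank(),
            last_updated: Instant::now(),
        }
    }

    // Takes the bank forks read lock at most once per interval.
    fn get(&mut self, bank_forks: &RwLock<BankForks>, now: Instant) -> &Bank {
        if now.duration_since(self.last_updated) > REFRESH_INTERVAL {
            self.last_updated = now;
            self.bank = bank_forks.read().unwrap().root_bank();
        }
        &self.bank
    }
}
```

Passing `now` in explicitly is a choice made for this sketch so the refresh logic can be exercised without sleeping; the code under review reads the clock inline.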
    if send_index_and_erasure_conflicts {
        (shred, conflict)
    } else {
        return Ok(());
    }
Shouldn't we actually feature-gate earlier in the blockstore code which identifies these conflicts?
One other issue is that I believe if blockstore already has a duplicate/conflict for a slot, future duplicates/conflicts are not processed the same way as before (for example, they are not inserted into blockstore anymore; I am not sure whether they are still channeled to replay).
From a consensus perspective, it doesn't matter which scenario a duplicate block proof is constructed for. They all result in the slot being marked as invalid in fork choice.
This is correct: if there is already a duplicate proof, no further action will be taken. Replay/gossip will not be notified.
I believe #29227 should have been feature gated for the scenario you're describing: receive duplicate proof from gossip -> store in blockstore -> no notification to replay. It seems #29227 was already in 1.16, so I believe it is too late to feature gate it.
Does this mean that in current v1.16 code, if it sees a last index or erasure conflict, it will not mark the slot as dead but also will no longer recognize any kind of duplicates in the block? Separately, didn't you submit a change which would extend erasure conflict detection?
Yes, that's correct. We will add the proof to the blockstore column but not invoke the code at solana/ledger/src/blockstore.rs, lines 1212 to 1214 in 85e3058.
However, because we have stored it in the column (solana/ledger/src/blockstore.rs, lines 1537 to 1540 in 85e3058), for any future simple duplicate shred case (2 shreds with the same index) we will no longer be able to take any action, as there is already a proof in the column: solana/core/src/window_service.rs, lines 142 to 157 in 85e3058.
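The v1.16 behaviour described above can be modelled roughly as follows. This is a toy sketch under stated assumptions, not the blockstore code: `ProofColumn`, `ConflictKind`, and `handle_conflict` are hypothetical names, and the "column" is just a map keyed by slot. The return value says whether replay would be notified.

```rust
use std::collections::HashMap;

type Slot = u64;

// Hypothetical classification of the conflict a proof was built from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConflictKind {
    SameIndex, // 2 shreds with the same index
    LastIndex,
    Erasure,
}

// Toy model of the duplicate proof column: at most one proof per slot.
#[derive(Default)]
struct ProofColumn {
    proofs: HashMap<Slot, ConflictKind>,
}

impl ProofColumn {
    // Stores the first proof seen for a slot and reports whether replay
    // would be notified. Once a proof exists, later conflicts are ignored.
    fn handle_conflict(
        &mut self,
        slot: Slot,
        kind: ConflictKind,
        notify_for_index_and_erasure: bool,
    ) -> bool {
        if self.proofs.contains_key(&slot) {
            return false; // already have a proof: no further action
        }
        self.proofs.insert(slot, kind);
        match kind {
            ConflictKind::SameIndex => true,
            ConflictKind::LastIndex | ConflictKind::Erasure => notify_for_index_and_erasure,
        }
    }
}
```

In this model, a LastIndex proof stored without notification for slot 5 means a later same-index duplicate for slot 5 is silently dropped, which is the gap discussed in this thread.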
If you're referring to #33037, it is still in development.
* Add feature flag for LastIndex and Erasure duplicate proofs
* pr feedback: use root bank instead of 2 params
* pr feedback: & instead of &Arc
* pr feedback: reuse fn, remove redundant clones
* rebase: fix feature set conflict
(cherry picked from commit def3bc4)
# Conflicts:
# sdk/src/feature_set.rs
…ackport of #34360) (#34541)
* Add feature flag for LastIndex and Erasure duplicate proofs (#34360)
* Add feature flag for LastIndex and Erasure duplicate proofs
* pr feedback: use root bank instead of 2 params
* pr feedback: & instead of &Arc
* pr feedback: reuse fn, remove redundant clones
* rebase: fix feature set conflict
(cherry picked from commit def3bc4)
# Conflicts:
# sdk/src/feature_set.rs
* fix feature set conflict
---------
Co-authored-by: Ashwin Sekar <ashwin@solana.com>
Problem
These cases, introduced in 1.17 by #32965, need to be feature flagged, as solana/core/src/window_service.rs (lines 172 to 173 in 74c54a7) will update fork choice.
Summary of Changes
Add a feature flag to ensure that fork choice is consistent across the cluster when upgrading to v1.17.