fix: prune oldest slots first in aggregator tracker #6668
Conversation
Codecov Report

Additional details and impacted files

@@            Coverage Diff             @@
##           unstable    #6668    +/-   ##
============================================
- Coverage     61.81%   61.80%   -0.02%
============================================
  Files           556      556
  Lines         59050    59073      +23
  Branches       1898     1898
============================================
+ Hits          36502    36508       +6
- Misses        22505    22522      +17
  Partials         43       43
Performance Report

✔️ no performance regression detected
🚀🚀 Significant benchmark improvement detected

Full benchmark results
It might be nice to have unit tests to ensure no regressions are introduced to these kinds of expectations.
pruneSetToMax(this.subnetAggregatorsBySlot, MAX_SLOTS_CACHED);
pruneSetToMax(
  this.subnetAggregatorsBySlot,
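A minimal sketch of the oldest-first pruning idea being reviewed here; pruneOldestSlotsFirst is a hypothetical name for illustration, not the actual pruneSetToMax change:

```ts
// Sketch of oldest-first pruning, assuming a Map keyed by slot number.
// pruneOldestSlotsFirst is an illustrative name, not a Lodestar API.
const MAX_SLOTS_CACHED = 64;

function pruneOldestSlotsFirst<V>(bySlot: Map<number, V>, maxSlots = MAX_SLOTS_CACHED): void {
  if (bySlot.size <= maxSlots) return;
  // Sort slot keys ascending so the lowest (oldest) slots are removed first,
  // instead of relying on Map insertion order (FIFO).
  const sortedSlots = [...bySlot.keys()].sort((a, b) => a - b);
  for (const slot of sortedSlots.slice(0, bySlot.size - maxSlots)) {
    bySlot.delete(slot);
  }
}
```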
The alternative could be to keep subnetAggregatorsBySlot sorted at insertion time? We might not have the necessary data structure yet.
I had three options in mind:
- sort keys when pruning (as implemented)
- switch to pruneBySlot logic; I am not sure if this strategy is correct here, although it could keep the cache size lower, maybe @tuyennhv has some thoughts (a sketch of the idea follows below)
- as you suggested @jeluard, always have keys sorted; I think we wanna keep a map for quick lookup times, although it might not matter much since the array would have max 64 items
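A rough sketch of what slot-based pruning could look like; this is an assumption about the general idea, not the existing pruneBySlot helper:

```ts
// Sketch: instead of capping the map at a fixed size, drop every tracked slot
// that is already behind the current clock slot. pruneOutdatedSlots is a
// hypothetical name for illustration, not a Lodestar function.
function pruneOutdatedSlots<V>(bySlot: Map<number, V>, clockSlot: number): void {
  for (const slot of bySlot.keys()) {
    if (slot < clockSlot) {
      bySlot.delete(slot);
    }
  }
}
```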
Current approach is probably fine given the sizes involved, probably better not to over-engineer yet :)
I overlooked this yesterday, we have an ordered map:
export class OrderedMap<K, V> {
This one keeps track of the keys, which is the tradeoff I guess.
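For reference, a minimal sketch of the general shape of such a map, keeping a separate key array alongside the backing Map; this is an assumption for illustration, not the actual OrderedMap implementation:

```ts
// Sketch of a map that also tracks its keys in a separate array
// (the extra bookkeeping is the tradeoff mentioned above).
class SketchOrderedMap<K, V> {
  private readonly map = new Map<K, V>();
  private readonly orderedKeys: K[] = [];

  set(key: K, value: V): void {
    // Only record the key the first time it is inserted.
    if (!this.map.has(key)) {
      this.orderedKeys.push(key);
    }
    this.map.set(key, value);
  }

  get(key: K): V | undefined {
    return this.map.get(key);
  }

  delete(key: K): boolean {
    const idx = this.orderedKeys.indexOf(key);
    if (idx !== -1) {
      this.orderedKeys.splice(idx, 1);
    }
    return this.map.delete(key);
  }

  keys(): readonly K[] {
    return this.orderedKeys;
  }

  get size(): number {
    return this.map.size;
  }
}
```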
Alternative using an ordered map: unstable...nflaig/ordered-map-aggregator-tracker
turns out our "ordered" map isn't actually ordered... added unit tests now
> Current approach is probably fine given the sizes involved, probably better not to over-engineer yet :)

I agree with this considering the cache size is just 64, may want to add a note explaining why we chose this approach.
Could add unit tests for aggregator tracker but the logic is pretty simple; the problem with this issue is that it involves multiple parts of the code base, so the only good way to test this is in an e2e or, even better, a sim test.
@tuyennhv sorry I misclicked and re-requested your review
Keeping as draft for now, happy to elaborate on different approaches, and it's probably a good idea to have a minimal set of unit tests on the aggregator tracker.
🎉 This PR is included in v1.18.0 🎉
Motivation
Because we prune based on FIFO order, we might prune a slot that is in the future and should not yet be pruned.
This is completely random, as it depends on the order in which the validator client submits the subscriptions (current epoch vs. next epoch first) and the order of the items themselves. For example, at the start of epoch 0 the client submits epoch 1 first, then epoch 0; if there is an aggregator for each slot (likely with a lot of keys), the cache is already full (64 items). Then at the start of epoch 1, some clients only submit for epoch 2 (the next epoch), which means we start pruning items of epoch 1, but that is the current epoch and should not yet be pruned.
While Lodestar always calls prepareBeaconCommitteeSubnet for the current and next epoch, other clients like Nimbus, Teku, and Vouch do not, which can lead to Lodestar not preparing an aggregated attestation (#6631, #6419), because by the time the attestations are submitted via submitPoolAttestations the aggregator check returns false.
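A minimal sketch of the scenario above; the map contents and values are simplified assumptions for illustration, not the actual aggregator tracker code:

```ts
// Subscriptions for epoch 1 (slots 32..63) arrive before epoch 0 (slots 0..31),
// so the 64-slot cache is full once epoch 0 has been added.
const bySlot = new Map<number, string>();
for (let slot = 32; slot < 64; slot++) bySlot.set(slot, "epoch 1 aggregator");
for (let slot = 0; slot < 32; slot++) bySlot.set(slot, "epoch 0 aggregator");

// FIFO pruning evicts the first inserted key: slot 32, part of the current epoch
// at the start of epoch 1, which should not yet be pruned.
const fifoVictim = bySlot.keys().next().value; // 32

// Oldest-first pruning evicts the lowest slot: slot 0, which is already in the past.
const oldestVictim = Math.min(...bySlot.keys()); // 0

console.log({fifoVictim, oldestVictim});
```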
Description
Prune oldest slots first in aggregator tracker to ensure we only prune slots that are in the past.
Alternative solution
We could also increase the maximum number of cached slots to 3 epochs.
Closes #6631
Closes #6419