use LRU strategy for shuffling/epoch caches #4196
Conversation
When EL `newPayload` is slow (e.g., Raspberry Pi with Besu), the epoch and shuffling caches tend to fill up with multiple copies per epoch while processing gossip and performing validator duties close to the wall slot. The old strategy of evicting the oldest epoch led to the same item being evicted over and over, causing blocking of over 5 minutes in extreme cases where alternate epochs/shufflings got loaded repeatedly. Changing the cache eviction strategy to least-recently-used seems to improve the situation drastically. A simple implementation based on a singly linked list without a hash table was selected.
Example dump, listing all active
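To make the policy change concrete, here is a minimal sketch of the two eviction rules; `CacheEntry` and the helper names are made up for illustration, not identifiers from the codebase:

```nim
# Hedged sketch of the eviction-policy change described above.
type
  CacheEntry = object
    epoch: uint64      # epoch the cached item belongs to
    lastUsed: uint32   # logical timestamp, bumped on every access

func evictIndexByOldestEpoch(entries: openArray[CacheEntry]): int =
  ## Old policy: evict the entry with the lowest epoch, regardless of how
  ## recently it was used - under heavy forking the same still-needed item
  ## keeps getting evicted and reloaded.
  for i in 1 ..< entries.len:
    if entries[i].epoch < entries[result].epoch:
      result = i

func evictIndexByLRU(entries: openArray[CacheEntry]): int =
  ## New policy: evict whichever entry has gone unused the longest.
  for i in 1 ..< entries.len:
    if entries[i].lastUsed < entries[result].lastUsed:
      result = i
```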
A simpler implementation would put a timestamp or (shared) counter in each entry, which simplifies things significantly - a hash table indeed wouldn't make sense for so few entries.
also - which of the two is it, the shuffling or the epoch cache? shuffling caches should be a lot more stable given the 1-epoch lag in selecting them - epoch caches are more fickle indeed
I encountered this with the epoch cache on my Pi; not sure how much the shuffling cache is affected as well. However, spending the 17 extra bytes is cheaper than risking running into a pessimistic situation, also under heavy forking. Note that the previous solution is also O(n) to access, so there's not much to gain from the previous logic compared to an LRU scheme (the rationale for ShufflingRef was copy/pasted from the EpochRef implementation). Options considered:
- Logical timestamp solution:
- Regular timestamps:
- Single linked-list:
- Double linked-list:
Considering the size and access frequency, I think the logical timestamp solution may indeed be preferred. It has the advantage that it has also been tested for years already ^^ Which one would you prefer?
logical is fine - as it iterates through the items it can just collect the highest "timestamp" so far and +1, or something similar - I agree that full LRU is overkill, but I also consider the half-assed singly-linked-list approach overkill (in terms of maintenance complexity).. a full LRU with hash and all would perhaps be generally useful (and maybe motivate the maintenance), and if we had one we would perhaps use it here too, but as is, the dumb logical timestamp approach feels more right.
this is important: the epoch cache is cheap per instance, the shuffling cache is not - also, there may potentially be many EpochRefs that reuse the same shuffling, so it might be that we should bump the EpochRef cache to 64, for example
Will add a small debug-level log whenever an item is evicted due to capacity limits. This allows analysis in adverse forking situations. So far, with 32 / 16 unchanged and only switching to LRU, I managed to get Nimbus+Besu synced on mainnet for the first time in 3 days.
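For reference, a minimal sketch of the logical-timestamp flavour of LRU being discussed here - the `LRUCache` shape, the `Option`-valued slots, and the helper names are assumptions for illustration, not the PR's actual code:

```nim
import std/options

type
  LRUCache[I: static[int]; T] = object
    # `I` is the compile-time capacity (the 32 / 16 mentioned above).
    entries: array[I, tuple[value: Option[T], lastUsed: uint32]]
    timestamp: uint32  # logical clock; lastUsed == 0 marks an unused slot

func nextTimestamp[I, T](cache: var LRUCache[I, T]): uint32 =
  # Sketch only: a real uint32 clock also has to renumber the lastUsed values
  # once it wraps around (or simply use uint64, as suggested in the review).
  cache.timestamp += 1
  cache.timestamp

func get[I, T](cache: var LRUCache[I, T],
               match: proc (v: T): bool {.noSideEffect.}): Option[T] =
  # A hit refreshes the entry's timestamp - that is all it takes to make
  # capacity-based eviction least-recently-used.
  result = none(T)
  for entry in cache.entries.mitems:
    if entry.value.isSome and match(entry.value.get):
      entry.lastUsed = cache.nextTimestamp()
      return entry.value
```

With only a few dozen entries, the linear scan is no worse than the previous scheme's O(n) access, which is what makes the hash-table-free variant reasonable here.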
@@ -265,6 +265,55 @@ func atSlot*(dag: ChainDAGRef, bid: BlockId, slot: Slot): Opt[BlockSlotId] =
  else:
    dag.getBlockIdAtSlot(slot)

func nextTimestamp[I, T](cache: var LRUCache[I, T]): uint32 =
  if cache.timestamp == uint32.high:
most of us lazy people would have gone with a uint64 and no reset loop :)
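For illustration, the suggested shortcut would boil down to something like this (a sketch assuming the cache's `timestamp` field were declared as uint64, which is not what the diff above does):

```nim
func nextTimestamp[I, T](cache: var LRUCache[I, T]): uint64 =
  # A uint64 logical clock cannot realistically wrap around, so the
  # uint32.high special case (and any renumbering of lastUsed values)
  # is no longer needed.
  cache.timestamp += 1
  cache.timestamp
```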
        break
  if min != 0:
    {.noSideEffect.}:
      debug "Cache full - evicting LRU", cache = typeof(T).name, capacity = I
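Read together with the sketch further above (same assumed `LRUCache` type and `nextTimestamp` helper), the eviction path emitting this log might look roughly like the following. This is a hedged reconstruction, not the actual diff: it takes `min` to be the smallest `lastUsed` value seen, with 0 meaning a free slot was found and nothing needs to be evicted.

```nim
import std/[options, typetraits]
import chronicles  # logging library used by the `debug` call in the fragment above

func put[I, T](cache: var LRUCache[I, T], value: T) =
  # Pick the slot with the smallest logical timestamp; lastUsed == 0 means
  # "never used", so the search can stop early and no eviction happens.
  var
    lru = 0
    min = uint32.high
  for i in 0 ..< I:
    if cache.entries[i].lastUsed < min:
      min = cache.entries[i].lastUsed
      lru = i
      if min == 0:
        break
  if min != 0:
    # Logging has side effects; the pragma block keeps this usable as a `func`.
    {.noSideEffect.}:
      debug "Cache full - evicting LRU", cache = name(T), capacity = I
  cache.entries[lru] = (value: some(value), lastUsed: cache.nextTimestamp())
```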
Seeing it for both ShufflingRef and EpochRef on the Raspi, around ~10 minutes behind wall clock.
weird, means there are deep forks happening at a high rate or we're creating shufflings / states unnecessarily .. is this mainnet? I don't remember seeing that much churn
ie might be worth looking into why we're even creating these shufflings / epochrefs
Mainnet with in-BN validators and Besu (slow processing / pessimistic sync)
yeah, but why the shufflings? until we're synced, we shouldn't be making shufflings and after, we should only make them for legit forks
"We're synced" depends on syncHorizon, so there are 3 parallel tasks going on:
- Catching up, this is around ~10 minutes behind when full caches are observed
- Gossip, validating those triggers new EpochRef to get proposerkey, and as part of epochref init it triggers shuffling
- Validator duties, based on however far it is caught up, but forward propagated to current slot
> Catching up, this is around ~10 minutes behind when full caches are observed
This should certainly not trigger epochref starvation - ie when we're syncing it's a single history where the time-based eviction should work just fine.
> Gossip, validating those triggers new EpochRef to get proposerkey, and as part of epochref init it triggers shuffling
does this trigger before we've validated the parent? because if we've validated the parent, we shouldn't need a new shuffling except once per epoch.
Ditto attestations: we can't gossip-check those until the block being voted for has been synced, so head-compatible votes shouldn't cause epochrefs.
> Validator duties, based on however far it is caught up, but forward propagated to current slot
this one is uglier indeed - computing proposers while syncing is messy but necessary, but why isn't the epoch-based cache working out here? it should be roughly equivalent to LRU...
I did 3 syncs from ~10 minutes behind on Mainnet using the Pi with validators and Besu 22.7.4 EL + a restart script on incorrect
The patch is good in general (LRU makes more sense for sure) - but it still sounds weird to me that we're generating that many epochrefs on an ordinary sync - something else is wrong also if this is happening on mainnet, or the various params are not in tune with each other (ie too few caches vs sync horizon etc)
Per discussion elsewhere, and for future reference - a significant contributing factor to the poor eviction performance of the pre-LRU design is the usage of ...; that said, LRU should work in a similar way.
Good catch, looks unrelated to this PR though, so still fine to go ahead with this one. aarch64 is the one without Geth, so it should be the least complex one of the finalization tests.