PVF: prioritise execution depending on context #4632
I'm not perfectly happy about disputes being higher priority than approvals, depending upon what approvals means: If we delay approval announcements then we risk adversaries breaking soundness when honest nodes get bogged down. If we delay approval checks, then we risk unnecessary escalations. If we add some "I'm here but overworked" vote then we incur that code complexity. We do not necessarily have a choice but to prioritize disputes of course, since disputes mess everything up anyways. We could discuss "soft" options like a dispute appearing on-chain turning off backing & inclusion for a few blocks. It's tricky picking how long though: A dispute could resolve fast, but not a timeout dispute, although maybe polkavm fixes that.
This is what I had in mind, and it will indeed cause escalations, but that is fine: it back-pressures backing PVF execution (lowest priority), which means less approval work for the next block, allowing the network to recover.
I totally agree, we will have to prioritise disputes, but maybe we should not starve approvals like we would do with backing, and still allow 10% approval PVF execution vs 90% disputes or something like that.
Alright, delaying the approval checks is secure. It'll just cause more slowdowns, which'll ultimately hit backing. If approvals remain at 100%, but backing stops completely for one block, then we've made time to process one dispute, well except for the extra reconstruction overhead. We might have more than one dispute of course, but someone is being billed for them. We cannot stop backing completely either, but.. Also, if approvals run at maybe 60% or something, then we typically still finish approval checks before we no-show, which avoids escalations. I think the first question here is: At the time disputes occur, how hard do we back-pressure our local backing? We could skip 2-3 backing slots or something every time a dispute occurs, even if we do not feel the pressure ourselves?
My current thinking is to still allow 10% backing. Getting at least 10% approvals is not going to have any impact on finality if it is lagging because of disputes, but it will reduce the number of no-shows, which is beneficial. However, because disputes are something that happens very rarely, I don't think it is worth doing. To get a clear answer we should run some dispute load tests on Versi with both scenarios.
We're discussing situations where disputes have occurred across many cores, or in many time slots for only a few parachains. If we've only a few parachains causing many disputes, then maybe they could be handled in sequence, but not sure doing so warrants extra complexity. We might however sequence disputes anyways, like maybe oldest first so reversions turn the rest into remote disputes. In principle, we could do remote disputes more lazily, but we've no consensus upon which disputes are remote, unless grandpa finalizes something remote from the dispute's perspective. Again not sure if we need code for this: Yes, this could save considerable CPU during a storm of correct disputes of invalid blocks, but that's even more rare. Imho, incorrect storms of disputes of valid blocks sound way more likely. We should likely keep that part simple for now, but just discussing the design space there. Anyways.. Approvals should imho never halt completely, not sure how much reserve, but 10% sounds reasonable.
Approvals might never be empty, so not sure this works per se. In theory, I'd halt backing completely if we've really exhausted all other resources. We'd maybe want some exceptions though, like system parachains doing elections, or DKGs, or maybe other governance. If we'd want such special reserved system parachains, then we might start by simply never halting backing completely, and then later, if necessary, add per-parachain configuration there. Approvals should always have more reserved capacity, as otherwise approvals might never catch up vs backing. At what points should we trigger this back pressure on backing?
That's like 5 possible overworked metrics. Any thoughts on what's simplest? If we only do one, then queue fullness sounds good, because maybe our queue fills up because others no-show. If we do two, then maybe add network no-shows or finality. I think beyond this we should more carefully assess.
With validator disabling, these things are looking very good now. I don't think we should optimize for disputes, and in this current proposal they have max priority.
Sure, we just want some simple prioritization to better control system load. Further optimizations should be done later, only if needed.
What is the reasoning for this 10%? Do you believe that reducing no-shows by a small margin improves the overall outcome of a dispute storm scenario?
Yes, that might be true to some extent, so we need to reserve some % resources for backing.
Yes, we should reserve capacity for the system parachains, giving something like 10% to backing no matter the load should not make the situation worse.
Yeah, they have 90% if no disputes. We can consider lowering the backing reservation to just 5%, so 1 in 20 executions is guaranteed to be backing. If these are light blocks it should still work fine. However, if blocks are 2s, it will take 40s for the candidate to be backed, by which time its relay parent has already expired.
This is hard to measure in practice, but the prioritization proposed here actually achieves this without measuring the load directly.
I think it is a good idea; this can happen for example because availability recovery is slow because of the network. In this case, there wouldn't be any actual CPU load, so the prioritization doesn't help at all.
The prioritization here should delay backing if the queue is full, in the sense that there is more work than we can sustain in a reasonable time. The fullness depends on how much time the execution takes. You could have 10 executions queued and each one will take 2s, but you could also have 30 executions of 100ms each. Hard to know how full the queue is until you execute it. We could optimize and add a TTL for the backing executions to drop them if the relay parent has expired.
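A minimal sketch of that TTL idea, assuming a simple queue of jobs stamped with their enqueue time; `ExecuteJob`, `BACKING_JOB_TTL`, and the 24-second value are illustrative names and numbers, not the actual polkadot-sdk types:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Illustrative job type; a real queue entry would carry the candidate,
// PVF artifact, result sender, etc.
struct ExecuteJob {
    queued_at: Instant,
}

// Assumed TTL: once the relay parent can no longer be built upon,
// running the backing job is wasted work.
const BACKING_JOB_TTL: Duration = Duration::from_secs(24);

// Drop backing jobs that have waited longer than the TTL.
fn prune_expired_backing_jobs(backing_queue: &mut VecDeque<ExecuteJob>) {
    let now = Instant::now();
    backing_queue.retain(|job| now.duration_since(job.queued_at) < BACKING_JOB_TTL);
}
```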
This must be done in the runtime. We just need to put in finality proofs and we can select how many get backed.
No-shows typically raise the CPU/network load as more tranches are needed to cover, so we would slow down backing a bit later anyway. Doing it earlier as you propose can probably improve the situation faster.
We want to implement the prioritization first. Then we do some glutton testing and calibrate the backing percentage. We should also test 10% approval reservation.
I'd actually prefer a higher minimum for approvals, like max 70% for disputes, minimum 25% for approvals, and minimum 5% for critical system parachains, but no backing reservation for other parachains, or even non-critical system parachains really. We could likely go below this 25%, but then we should think about critical vs non-critical system parachains. We could ignore critical vs non-critical system parachains and simply reserve more approval time for them. Or maybe we'll have so few system parachains that 10% is already enough and we can ignore this distinction?
Yeah, every no-show adds 2.25 more checkers.
Alright yeah let's not directly measure CPU load. :)
Interesting, the queue could be empty but slow recovery might be causing no-shows. At what times do we expect this? At least one scenario goes: Some validators get elected who have network disruptions to many other validators, like NATs or BGP attacks. Or maybe some validators try earning themselves extra points in availability by voting early that they have their chunk, assuming they'll get it eventually. If enough do so, then this hurts availability. I guess back pressure makes sense here, but it's clearly secondary to the queue.
Can we just assume the worst case here? We'd tune based on gluttons I guess, so maybe that's easiest anyways?
Why? I'd think the opposite: We know what our grandpa is voting on, so if it's too old then apply back pressure.
Alright so we've boiled down my criteria list to:
Yeah of course, one thing at a time. We can talk later about whether we really need all of a, b, c. :) I'd missed one factor above: We need back pressure not only in our own backing like discussed here, but also in the relay chain block production: If dishonest or confused backers produce backing statements while being back-pressured, then an honest relay chain block producer should skip their statements too. We need not rush into doing this, since it only matters if nodes are being dishonest, connecting poorly, etc. Anyways, we'd eventually wire some of those three criteria into (x) back-pressuring producing & gossiping backing statements, and (y) placing backing statements into relay chain blocks.
In my proposal we have 3 levels of priority. So we'll have that mapping:
I'd start with a 70/20/10 split and then we can tweak it as we test.
Are those numbers CPU time "priorities" for those tasks when back pressure gets applied? Or are they overall allocations at all times? If a distinguished state, then any idea how quickly we can enter and leave the "back pressured" state? I presume those priorities fall through, so if there are no disputes then the 100% is divided among the remaining categories? Approvals & system parachains backing should not be given the same weight. We do not notice this problem right now, but if someone does something silly like give longer runtimes (JAM) then this'll break. Approvals should be given much more CPU time, which under good conditions resembles the parameter choices in #640. In other words, if there are no problems then approvals should've roughly … At a more intuitive level, two backing statements create work for like 35 other nodes, but possibly 1000 other nodes if we're experiencing network disruptions, so approvals should be like 14 times more CPU than all backing, even if zero no-shows occur, not so different from the … Anyways.. It sounds like you do not have a distinguished back pressure state yet, so then maybe the number should be:
If there are no disputes then we'd choose backing over approvals 6.7% of the time, which sounds reasonable. It's still possible this change results in more delayed backing votes, but that likely signals some problems elsewhere. Any thoughts @alexggh ? Also.. We can separate system and other parachains once we eventually add some distinguished back pressure state, but the simplest breakdown under back pressure would be
In this, worst case includes several meanings:
Yes, we definitely need to implement it like that, otherwise we waste capacity.
The way I see it, backing is more time sensitive than approving, but approving is more important than backing, so our prioritisation mechanism needs to reflect that. I don't think it is a problem if we delay backing when we are overloaded, since we need to apply back-pressure to not create more approval work, but it is an issue if we delay backing because we have a steady stream of approvals and always prioritise approvals over backing, or if we add a significant delay to backing when we would actually be better served by doing the backing first and then doing all the other approvals (since we've got time for them). I worry that allocating just 2% to backing might add that extra delay in situations where we are not overloaded. My simple heuristic would be something like this: if the execution queue is under NEW_WORK_PER_BLOCK (MODULO_SAMPLES + 1), backing should be allocated first; if not, use this statistical prioritisation (a sketch follows below). Bear in mind that we have at least 2 (maybe 4 in the future) parallel execution streams, so approval votes wouldn't be starved, because we should have no more than one block to back.
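A rough sketch of that heuristic, assuming NEW_WORK_PER_BLOCK is simply MODULO_SAMPLES + 1 and that the percentage-based scheme only kicks in once the queue falls behind; all names and values here are illustrative assumptions:

```rust
// Assumed values: 6 VRF modulo samples, plus one backing job per relay-chain block.
const MODULO_SAMPLES: usize = 6;
const NEW_WORK_PER_BLOCK: usize = MODULO_SAMPLES + 1;

// If we are keeping up with the expected per-block workload, serve the
// time-sensitive backing job immediately; otherwise fall back to the
// statistical (percentage-based) prioritisation.
fn back_first(queue_len: usize, backing_job_pending: bool) -> bool {
    backing_job_pending && queue_len < NEW_WORK_PER_BLOCK
}
```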
That 2% becomes 6.7% if there are no disputes. What is NEW_WORK_PER_BLOCK?
This makes sense only if we detect that the system needs back pressure, and then change the rules. All those backing votes create more work for the whole system. I launched off into discussing detection above, but afaik the current discussion is really only about back pressure without explicit detection, aka only using disputes as the trigger in a simplistic way.
It is how many parachain blocks we expect normally to execute/verify per relay-chain block; it should be around
Yeah, simplistic is what I'm thinking as well, but this is not only for disputes. I'm more concerned about the situation where sharding breaks because of no-shows and nodes have to check a lot more candidates than the available CPU resources allow, so we need to slow down so we can catch up. I'm not too opinionated about this, but I think any solution/split we end up with needs to make sure it doesn't affect the base case where the node is not overloaded.
Ain't clear "expect" is well defined there. We've the current queue length, which depends upon our assignments from parameters and no-shows, and the unknowable workload they represent. I'll suggest roughly this logic: If there exists a dispute that's pending, aka not yet being run, then we probabilistically run the dispute 70-80% of the time, or fall back to the non-dispute system 30-20% of the time. In our non-dispute system..
Again it's possible this happy queue test messes things up, because now we do not back unless we're caught up on our workload from the previous relay chain block, so this'll require some care and revision. We could set … If you need a constant then …
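A minimal sketch of the probabilistic dispute-first selection proposed above, assuming a fixed 75% dispute weight (the middle of the 70-80% range) and a placeholder for the non-dispute policy, which the comment leaves open:

```rust
use rand::Rng; // rand crate

#[derive(Debug)]
enum JobKind {
    Dispute,
    Approval,
    Backing,
}

// Pick the next PVF execution: if a dispute is pending, run it ~75% of the
// time; otherwise (or on the remaining 25%) defer to the non-dispute policy.
fn pick_next_job<R: Rng>(rng: &mut R, dispute_pending: bool) -> JobKind {
    if dispute_pending && rng.gen_bool(0.75) {
        JobKind::Dispute
    } else {
        pick_non_dispute_job(rng)
    }
}

// Placeholder: the actual non-dispute split between approvals and backing is
// what the rest of the discussion is about; here we just prefer approvals.
fn pick_non_dispute_job<R: Rng>(_rng: &mut R) -> JobKind {
    JobKind::Approval
}
```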
Resolves #4632

The new logic optimizes the distribution of execution jobs for disputes, approvals, and backings. Testing shows improved finality lag and candidate checking times, especially under heavy network load.

### Approach

This update adds prioritization to the PVF execution queue. The logic partially implements the suggestions from #4632 (comment). We use thresholds to determine how much a current priority can "steal" from lower ones:

- Disputes: 70%
- Approvals: 80%
- Backing System Parachains: 100%
- Backing: 100%

A threshold indicates the portion of the current priority that can be allocated from lower priorities. For example:

- Disputes take 70%, leaving 30% for approvals and all backings.
- 80% of the remaining goes to approvals, which is 30% * 80% = 24% of the original 100%.
- If we used parts of the original 100%, approvals couldn't take more than 24%, even if there are no disputes.

Assuming a maximum of 12 executions per block, with a 6-second window, 2 CPU cores, and a 2-second run time, we get these distributions:

- With disputes: 8 disputes, 3 approvals, 1 backing
- Without disputes: 9 approvals, 3 backings

It's worth noting that when there are no disputes, if there's only one backing job, we continue processing approvals regardless of their fulfillment status.

### Versi Testing 40/20

Testing showed a slight difference in finality lag and candidate checking time between this pull request and its base on the master branch. The more loaded the network, the greater the observed difference.

Testing Parameters:

- 40 validators (4 malicious)
- 20 gluttons with 2 seconds of PVF execution time
- 6 VRF modulo samples
- 12 required approvals

![Pasted Graphic 3](https://github.com/user-attachments/assets/8b6163a4-a1c9-44c2-bdba-ce1ef4b1eba7)
![Pasted Graphic 4](https://github.com/user-attachments/assets/9f016647-7727-42e8-afe9-04f303e6c862)

### Versi Testing 80/40

For this test, we compared the master branch with the branch from #5616. The second branch is based on the current one but removes backing jobs that have exceeded their time limits. We excluded malicious nodes to reduce noise from disputing and banning validators. The results show that, under the same load, nodes experience less finality lag and reduced recovery and check time. Even parachains are functioning with a shorter block time, although it remains over 6 seconds.

Testing Parameters:

- 80 validators (0 malicious)
- 40 gluttons with 2 seconds of PVF execution time
- 6 VRF modulo samples
- 30 required approvals

![image](https://github.com/user-attachments/assets/42bcc845-9115-4ae3-9910-286b77a60bbf)

---

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
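A counter-based sketch of the threshold scheme described in the PR text above. The `Priority` variants mirror the four classes, but the bookkeeping (counting recent executions and comparing against each threshold share) is only one assumed way to realise the idea, not the actual polkadot-sdk implementation:

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Priority {
    Dispute,
    Approval,
    BackingSystemParas,
    Backing,
}

const ORDERED: [Priority; 4] = [
    Priority::Dispute,
    Priority::Approval,
    Priority::BackingSystemParas,
    Priority::Backing,
];

// Share (in percent) of recent executions a priority may "steal" from lower ones.
fn threshold(p: Priority) -> u32 {
    match p {
        Priority::Dispute => 70,
        Priority::Approval => 80,
        Priority::BackingSystemParas => 100,
        Priority::Backing => 100,
    }
}

// Given counters of recently scheduled executions per priority, pick the next
// priority to serve: the highest-priority class with pending work that has not
// yet exhausted its threshold share, else the highest-priority class with work.
fn pick_priority(
    recent: &BTreeMap<Priority, u32>,
    has_work: impl Fn(Priority) -> bool,
) -> Option<Priority> {
    let total: u32 = recent.values().sum::<u32>().max(1);
    for p in ORDERED {
        if has_work(p) {
            let used = recent.get(&p).copied().unwrap_or(0);
            if used * 100 < threshold(p) * total {
                return Some(p);
            }
        }
    }
    // Every class with work is over budget: fall back to the highest priority.
    ORDERED.into_iter().find(|&p| has_work(p))
}
```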
From the node perspective, the goal is to trade off parachain liveness for finality. When finality lag hits a threshold, the mechanism should kick in and enforce the following priority:
In case of node overload, this change adds back pressure to candidate backing but does not drop backed candidates. Thanks to async backing, delayed candidate backing still allows candidates to be backed in the next relay chain block if the relay parent does not expire.
Some more details on the subject can be found in this discussion
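A tiny sketch of the finality-lag trigger mentioned above, with an assumed threshold constant and block-height inputs; the real mechanism and its calibrated threshold would live elsewhere in the node:

```rust
// Assumed threshold in relay-chain blocks; the actual value would be calibrated.
const FINALITY_LAG_THRESHOLD: u32 = 8;

// Apply back pressure to backing once the unfinalized chain grows too long.
fn should_backpressure_backing(best_number: u32, finalized_number: u32) -> bool {
    best_number.saturating_sub(finalized_number) > FINALITY_LAG_THRESHOLD
}
```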