-
Notifications
You must be signed in to change notification settings - Fork 772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase backing and approval timeouts for async backing #1366
Comments
We'd ideally pseudo-remove the gap between backer timeouts and approval checker timesout once we charge backers fr time overruns: #742 In practice there is inherently some gap where the backer earns something for oversized blocks, but in expectation they risk only a minor fee for the time overrun. We discussed the fee curve shape at the team retreat. I'd suggested time overrun fees should be curve fit to the costs of no shows, but you maybe had something more nuanced, or just different. I've forgotten now, maybe in paritytech/polkadot#7239 At present I've no thoughts on doing something different short-term, but yeah maybe a hacky setting makes sense if we want to deploy something faster. |
In my opinion we should start from the amount of work backers should do and work backwards to what collators should use as a timeout. There's no reason to maintain the 4x difference in any case, as the 0.5s timeout on collators pre-async-backing is primarily determined by the relay chain block time and p2p network latency, not execution time. |
I doubt "timeout" is even well defined for collators. An optimized collator would often run quite different code from validiator, but optimized collators become a parachain logic specific problem. We're worried about the gap between backers vs approval checkers here. Alfonso proposed a 6x gap in paritytech/polkadot#1656 We never used his 6x but we ended up with at least the 4x gap @Sophia-Gold mentions. In fact I thought we're currently running some larger gap, like 6x or 8x. We'll pseduo-remove the gap in #742 like I say, but tight now the question is: Can we deploy async backing without #742 but with a smaller gap? I've long forgotten how the developers selected such a large gap, so we'll just need to ask whoever put the gap into the code (maybe you) where it came from, and try to reason from that point.
Is this really the only reason we have the 4x gap? If we only have a 1x gap then we should often trigger validity disputes due to timeouts, under the current scheme. If we've high variance in validator hardware then even a 2x gap could be insufficient, even assuming 100% honesty, although maybe single core performance in not such high variance anymore. Did we not just have testnet failures which required a larger gap? |
We should really separate these concerns and be clear about what we're discussing
This issue was mostly made in the context of (2). We previously had limitations on block building in Cumulus by default which were 0.5s of authoring. However, this timeout was not set with the goal of fulfilling (2) and was majority constrained by other factors. With async backing (2) becomes a direct concern. That's all. (1) is orthogonal - except if we want to make sure that 2s of authoring on collators always falls within the backing limit, in which case the timeouts may need to be adjusted upwards. This is probably the wrong order of operations, IMO.
It's the only reason we have 4x gap between the 0.5s allocated to collators block building and the 2s backing timeout, yes. Building, backing, and propagating within a 6s window required a very short block-building timeout mostly due to network and relay chain block propagation overhead. This is not related to the 6x gap between the 2s allocated to backing and the 12s allocated to approval, as mentioned. I suspect that honest collators building a block out of an on-disk DB for 2s would likely always stay within the 2s execution limit for backers executing out of memory. Collators building out of an in-memory DB or a RAMdisk would need to be more conservative as a larger proportion of their authoring time is spent in RAM or the CPU. |
So, I guess we can close this then. We might end up wanting to increase backing timeout slightly to something like 2.5s at some point, but for now we should be good. |
Since async backing increases execution time for collators from 500ms to 2s, we should also increase the execution time for backers and approval checkers beyond 2s to avoid larger parachain blocks not being backed due to small resource differences, nondeterminism, or cache misses.
The question is by how much. It's unclear whether we need to retain the 4x difference as 8s is quite long. This warrants testing.
The text was updated successfully, but these errors were encountered: