-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic QBFT Timeout #37
base: main
Are you sure you want to change the base?
Conversation
|
||
*Proposals* | ||
|
||
No changes from [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md). Doesn't rely on the slot/duty start time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we want to change this?
The only issue I see is that we have baseDuration=0
andf preconsensus stage might be slow, but not sure if it is so bad...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the issue is that if precon finished at a late time the first RC message should be for a later round?
Same issue with Aggregator duty (but very unlikely)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds to me like we do need to do something with proposals. , a dynamic timeout will solve an issue where some nodes might receive the block at different times (different beacon nodes) and then start and end the rounds on different times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We start the timer at the duty start time
- We need to configure the timeout parameters to something reasonable
- If you are not the leader of the first round then prepare a full block for the next rounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@moshe-blox
The issue is for selecting reasonable timeout parameters, if the leader of the first round is bad and timeout is too slow we risk proposing the block late
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proposals won't be considered in this SIP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Work!
sips/dynamic_qbft_timeout.md
Outdated
|
||
```go | ||
var ( | ||
quickTimeoutThreshold = Round(8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually not sure why this is so high...
On Round 6 the slot already ended
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so maybe it should be 6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6 is a more logical choice for more reasons even:
- For 4 nodes committee we would have 2 nodes doing quick-timeouts twice
- For 7 nodes committee, 6 nodes would do quick timeouts and the last node a slow one. But we managed to circle all the nodes.
- For 13 nodes committee, 7 nodes would be more than half the committee...
Also later, not in this SIP, we may decide that there is no point to keep on doing consensus after the end of the slot, so this would be a preparation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GalRogozinski
agreed, and if this change would be a fork, then i'd suggest to also reduce CutoffRound
down to 1 slot for proposal and sync committee duties to reduce RC spam (afaik, they can't be retried after the slot has ended)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally looks good, I think proposals need a little bit more thinking, I remember @olegshmuelov had something to say about it.
|
||
*Proposals* | ||
|
||
No changes from [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md). Doesn't rely on the slot/duty start time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds to me like we do need to do something with proposals. , a dynamic timeout will solve an issue where some nodes might receive the block at different times (different beacon nodes) and then start and end the rounds on different times.
We will take care of proposals in a different place |
Please lets mention in this SIP that it doesn't take into consideration proposals |
The current reference is not enough? |
IMO No. should be part of the SIP itself. |
Changes to the constant timeouts we defined in [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md) that should improve performance. | ||
|
||
**Rational** | ||
Currently for all duties, we have a `quickTimeout` of 2 seconds before sending a `RoundChange` message and a `slowTimeout` of 2 minutes starting from the 8th round.The timer starts when the QBFT instance starts. This approach has 2 issues: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify btw why there is a need for slowTimeout
? Is it just purely for qbft or do we have anything in SSV node that would make use of such long timeout (I think most/all duties must finish execution much faster or just will be missed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iurii-ssv It was introduced in https://github.com/ssvlabs/SIPs/pull/22/files, attester duties may be submitted within 32 slots/6.4m
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is simply wait for a very long time if after 8 rounds you didn't reach consensus. This is because we are probably in a very high latency environment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is simply wait for a very long time if after 8 rounds you didn't reach consensus. This is because we are probably in a very high latency environment
right, I understand the intent, just trying to figure out how exactly it applies/fits to duties,
in particular I've also left a related comment here
@iurii-ssv It was introduced in https://github.com/ssvlabs/SIPs/pull/22/files, attester duties may be submitted within 32 slots/6.4m
yup, true
at any rate, I always thought the spec should define an interface (not an implementation) everybody wants to conform to and implement it as they see fit for their use-case/application, meaning:
- QBFT shouldn't have to know anything about duty(s), it should be the other way around - duty configures QBFT Instance to work as it needs to; concretely it means, for example, that
qbft/timeout.go
file shouldn't import (know about) any ethereum/ssv-related stuff - currently it does import it to get the access to func(s) that tell it how long a slot lasts etc. (in fact, ideally it doesn't even need to know what a slot is - such parameters should bear names from QBFT domain and duty processor would fill these in as needed) - different duty types should have different
total round count
,round 1 timeout
,round 2 timeout
, ... (seems like these values should not be constants, but rather vars - so it doesn't send the wrong message that "these values are uniform for everyone")
It's not really an issue to worry about, just trying to get some consistency (and knowledge) if possible.
Hey @GalRogozinski, I've been looking into QBFT and round-timeout related workflows/issues recently and have some thoughts / developer feedback on this you might find useful,
not sure what's the up-to-date state/status of this proposal, it does seem like the general approach this proposal takes is sound - but we probably don't want to leave Ethereum block proposer duty out of the loop (because it might inform the overall design), so I recommend we find path(s) we want to take for resolving ssvlabs/ssv#1825 and ssvlabs/ssv#1829 before we commit to changes in this proposal, thoughts? |
// quickTimeoutThreshold is the round after which the timeout duration increases to 2 minutes | ||
quickTimeoutThreshold = Round(6) | ||
// quickTimeout is the timeout in seconds for the first 6 rounds | ||
quickTimeout int64 = 2 // 2 seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO time.Duration
is more readable
quickTimeout int64 = 2 // 2 seconds | |
quickTimeout time.Duration = 2 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to move away from using go libs in SIPs
The idea is that the spec shouldn't be for helping implement a node in any language
Also for future SIPs I want to start using python like pseudo-code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my 2 cents,
I would say while we still use Golang might as well do it as Golang prefers to do it (as Nikita mentioned), but the idea to use pseudo-code is the right one ofc (using Python is probably not ideal, but probably better than golang for this kind of thing, as long as we keep away from using advanced Python-specific syntax),
the even more important thing though is that spec "provides interface and not implementation" - I elaborate a bit on it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long as we keep away from using advanced Python-specific syntax
That's why I said "python like pseudo code" :-)
spec "provides interface and not implementation"
Currently spec has too many implementation details including full fledged consensus impl.
This is due to legacy reasons and how the project was inherited. We are planning a Markdown spec.
All implementations that we will have will just be used for creating tests.
Hey @iurii-ssv and cc @y0sher So this SIP was supposed to simply document the changes that the SSV team did to the protocol without talking with the spec team (well I was on vacation). It didn't get merged due to shifting priorities. Since it wasn't merged, we can do proposal solutions on this SIP maybe. Btw, there is a PR for your idea of doing an asynchronous call ssvlabs/ssv-spec#371. It wasn't merged due to shifting priorities. Note that RC changes may degrade performance temporarily if not done in a coordinated fork |
Description
New timeout rules.
From the slot start time after
baseDuration
delay wait 2 seconds before emitting aRoundChange
message until the 8th round (inclusive). Afterwards, wait 2 minutes.