Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic QBFT Timeout #37

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Dynamic QBFT Timeout #37

wants to merge 5 commits into from

Conversation

GalRogozinski
Copy link
Contributor

Description

New timeout rules.

From the slot start time after baseDuration delay wait 2 seconds before emitting a RoundChange message until the 8th round (inclusive). Afterwards, wait 2 minutes.

@GalRogozinski GalRogozinski changed the title New-qbft-timeout Dynamic QBFT Timeout Dec 25, 2023

*Proposals*

No changes from [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md). Doesn't rely on the slot/duty start time.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to change this?
The only issue I see is that we have baseDuration=0 andf preconsensus stage might be slow, but not sure if it is so bad...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the issue is that if precon finished at a late time the first RC message should be for a later round?

Same issue with Aggregator duty (but very unlikely)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds to me like we do need to do something with proposals. , a dynamic timeout will solve an issue where some nodes might receive the block at different times (different beacon nodes) and then start and end the rounds on different times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moshe-blox

  1. We start the timer at the duty start time
  2. We need to configure the timeout parameters to something reasonable
  3. If you are not the leader of the first round then prepare a full block for the next rounds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moshe-blox
The issue is for selecting reasonable timeout parameters, if the leader of the first round is bad and timeout is too slow we risk proposing the block late

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proposals won't be considered in this SIP

Copy link
Contributor

@MatheusFranco99 MatheusFranco99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Work!


```go
var (
quickTimeoutThreshold = Round(8)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure why this is so high...
On Round 6 the slot already ended

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so maybe it should be 6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 is a more logical choice for more reasons even:

  1. For 4 nodes committee we would have 2 nodes doing quick-timeouts twice
  2. For 7 nodes committee, 6 nodes would do quick timeouts and the last node a slow one. But we managed to circle all the nodes.
  3. For 13 nodes committee, 7 nodes would be more than half the committee...

Also later, not in this SIP, we may decide that there is no point to keep on doing consensus after the end of the slot, so this would be a preparation...

Copy link
Contributor

@moshe-blox moshe-blox Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GalRogozinski
agreed, and if this change would be a fork, then i'd suggest to also reduce CutoffRound down to 1 slot for proposal and sync committee duties to reduce RC spam (afaik, they can't be retried after the slot has ended)

Copy link
Contributor

@MatheusFranco99 MatheusFranco99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well changed

Copy link
Contributor

@y0sher y0sher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, I think proposals need a little bit more thinking, I remember @olegshmuelov had something to say about it.


*Proposals*

No changes from [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md). Doesn't rely on the slot/duty start time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds to me like we do need to do something with proposals. , a dynamic timeout will solve an issue where some nodes might receive the block at different times (different beacon nodes) and then start and end the rounds on different times.

@GalRogozinski
Copy link
Contributor Author

@y0sher

We will take care of proposals in a different place

@y0sher
Copy link
Contributor

y0sher commented Mar 24, 2024

@y0sher

We will take care of proposals in a different place

Please lets mention in this SIP that it doesn't take into consideration proposals

@GalRogozinski
Copy link
Contributor Author

@y0sher
Copy link
Contributor

y0sher commented Mar 24, 2024

@y0sher

The current reference is not enough? https://github.com/bloxapp/SIPs/pull/37/files#diff-cd9cb4d346903b051ff59aa32693b26216634cfcbface288021be06b912f2e82R66-R68

IMO No. should be part of the SIP itself.

Changes to the constant timeouts we defined in [SIP#6](https://github.com/bloxapp/SIPs/blob/main/sips/constant_qbft_timeout.md) that should improve performance.

**Rational**
Currently for all duties, we have a `quickTimeout` of 2 seconds before sending a `RoundChange` message and a `slowTimeout` of 2 minutes starting from the 8th round.The timer starts when the QBFT instance starts. This approach has 2 issues:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify btw why there is a need for slowTimeout ? Is it just purely for qbft or do we have anything in SSV node that would make use of such long timeout (I think most/all duties must finish execution much faster or just will be missed)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iurii-ssv It was introduced in https://github.com/ssvlabs/SIPs/pull/22/files, attester duties may be submitted within 32 slots/6.4m

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is simply wait for a very long time if after 8 rounds you didn't reach consensus. This is because we are probably in a very high latency environment

Copy link

@iurii-ssv iurii-ssv Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is simply wait for a very long time if after 8 rounds you didn't reach consensus. This is because we are probably in a very high latency environment

right, I understand the intent, just trying to figure out how exactly it applies/fits to duties,

in particular I've also left a related comment here

@iurii-ssv It was introduced in https://github.com/ssvlabs/SIPs/pull/22/files, attester duties may be submitted within 32 slots/6.4m

yup, true

at any rate, I always thought the spec should define an interface (not an implementation) everybody wants to conform to and implement it as they see fit for their use-case/application, meaning:

  • QBFT shouldn't have to know anything about duty(s), it should be the other way around - duty configures QBFT Instance to work as it needs to; concretely it means, for example, that qbft/timeout.go file shouldn't import (know about) any ethereum/ssv-related stuff - currently it does import it to get the access to func(s) that tell it how long a slot lasts etc. (in fact, ideally it doesn't even need to know what a slot is - such parameters should bear names from QBFT domain and duty processor would fill these in as needed)
  • different duty types should have different total round count, round 1 timeout, round 2 timeout, ... (seems like these values should not be constants, but rather vars - so it doesn't send the wrong message that "these values are uniform for everyone")

It's not really an issue to worry about, just trying to get some consistency (and knowledge) if possible.

@iurii-ssv
Copy link

iurii-ssv commented Oct 29, 2024

Hey @GalRogozinski, I've been looking into QBFT and round-timeout related workflows/issues recently and have some thoughts / developer feedback on this you might find useful,

  • it does seem like different duties need different timeout values
  • but also, different rounds need different timeout values
  • needless to say - all operators must have same configuration of those timeout values, but also (like you mentioned above) we very much want to start all operator QBFT instances "at the same time" (if possible) which in practice can be done by anchoring to some deterministic function - I expand on it here while mentioning how it relates to Ethereum block proposer duty

not sure what's the up-to-date state/status of this proposal, it does seem like the general approach this proposal takes is sound - but we probably don't want to leave Ethereum block proposer duty out of the loop (because it might inform the overall design), so I recommend we find path(s) we want to take for resolving ssvlabs/ssv#1825 and ssvlabs/ssv#1829 before we commit to changes in this proposal, thoughts?

// quickTimeoutThreshold is the round after which the timeout duration increases to 2 minutes
quickTimeoutThreshold = Round(6)
// quickTimeout is the timeout in seconds for the first 6 rounds
quickTimeout int64 = 2 // 2 seconds

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO time.Duration is more readable

Suggested change
quickTimeout int64 = 2 // 2 seconds
quickTimeout time.Duration = 2 * time.Second

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to move away from using go libs in SIPs
The idea is that the spec shouldn't be for helping implement a node in any language

Also for future SIPs I want to start using python like pseudo-code

Copy link

@iurii-ssv iurii-ssv Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2 cents,

I would say while we still use Golang might as well do it as Golang prefers to do it (as Nikita mentioned), but the idea to use pseudo-code is the right one ofc (using Python is probably not ideal, but probably better than golang for this kind of thing, as long as we keep away from using advanced Python-specific syntax),

the even more important thing though is that spec "provides interface and not implementation" - I elaborate a bit on it here

Copy link
Contributor Author

@GalRogozinski GalRogozinski Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as long as we keep away from using advanced Python-specific syntax

That's why I said "python like pseudo code" :-)

spec "provides interface and not implementation"

Currently spec has too many implementation details including full fledged consensus impl.
This is due to legacy reasons and how the project was inherited. We are planning a Markdown spec.
All implementations that we will have will just be used for creating tests.

@GalRogozinski
Copy link
Contributor Author

GalRogozinski commented Oct 30, 2024

Hey @iurii-ssv and cc @y0sher

So this SIP was supposed to simply document the changes that the SSV team did to the protocol without talking with the spec team (well I was on vacation). It didn't get merged due to shifting priorities.

Since it wasn't merged, we can do proposal solutions on this SIP maybe.
But currently, I prefer to merge it as it is along with the accompanying spec PR ssvlabs/ssv-spec#352. This is because it is hanging for a long time.

Btw, there is a PR for your idea of doing an asynchronous call ssvlabs/ssv-spec#371. It wasn't merged due to shifting priorities.

Note that RC changes may degrade performance temporarily if not done in a coordinated fork

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants