fix(lib/babe): add --babe-lead flag, update epoch handling logic #1895

Merged: 55 commits, Oct 21, 2021

Commits
f9d3d0c
implement tipSyncer.hasCurrentWorker
noot Oct 11, 2021
f11edb8
don't handle transaction messages when syncing
noot Oct 11, 2021
af49037
fix core tests
noot Oct 11, 2021
da7f10e
add unit test
noot Oct 11, 2021
56136ea
fix grandpa >2/3 check
noot Oct 11, 2021
3449217
lint
noot Oct 11, 2021
04fc7b0
fix grandpa unit tests
noot Oct 12, 2021
701dc09
fix grandpa skipped tests
noot Oct 12, 2021
d839ed9
fix get grandpa-ghost
noot Oct 12, 2021
a58ef35
update tracker, update network to cache/not re-gossip consensus msgs
noot Oct 12, 2021
e411560
merge w other branch
noot Oct 12, 2021
cd71260
re-gossip consensus msgs again
noot Oct 12, 2021
eb87877
Merge branch 'development' into noot/sync-cont
noot Oct 12, 2021
2a7eb15
update logs
noot Oct 13, 2021
3c23837
address comments
noot Oct 13, 2021
e267b53
Merge branch 'noot/sync-cont' of github.com:ChainSafe/gossamer into n…
noot Oct 13, 2021
2cc41c2
Merge branch 'development' of github.com:ChainSafe/gossamer into noot…
noot Oct 13, 2021
7aa48d0
Merge branch 'noot/sync-cont' of github.com:ChainSafe/gossamer into n…
noot Oct 13, 2021
fde6e68
add epoch timeout in invokeBlockAuthoring, add lead bool to BABE, rem…
noot Oct 13, 2021
ea75c91
Merge branch 'development' of github.com:ChainSafe/gossamer into noot…
noot Oct 13, 2021
d13d5fd
fix waiting for first block
noot Oct 13, 2021
70bd312
lint
noot Oct 13, 2021
9345b0e
lint
noot Oct 13, 2021
2294703
Merge branch 'development' of github.com:ChainSafe/gossamer into noot…
noot Oct 13, 2021
2f59f64
cleanup
noot Oct 13, 2021
059e122
Merge branch 'noot/sync-finality' of github.com:ChainSafe/gossamer in…
noot Oct 13, 2021
7539451
add --babe-lead flag
noot Oct 13, 2021
21d2fb3
update log
noot Oct 14, 2021
1bbc646
Merge branch 'noot/sync-finality' of github.com:ChainSafe/gossamer in…
noot Oct 14, 2021
ec5fba9
address comments
noot Oct 15, 2021
674af10
update waitForFirstBlock to return err
noot Oct 15, 2021
67809ae
lint
noot Oct 15, 2021
6c3f134
Merge branch 'noot/sync-finality' of github.com:ChainSafe/gossamer in…
noot Oct 15, 2021
7a7d68c
update dev cfg to have babe-lead=true
noot Oct 15, 2021
edfee2c
Merge branch 'development' into noot/sync-finality
noot Oct 15, 2021
bae7217
merge w development
noot Oct 15, 2021
4fccb85
Merge branch 'noot/sync-finality' of github.com:ChainSafe/gossamer in…
noot Oct 15, 2021
5ce9c8e
fix grandpa round test
noot Oct 15, 2021
05e8e88
merge w base
noot Oct 15, 2021
dc8c892
Merge branch 'development' of github.com:ChainSafe/gossamer into noot…
noot Oct 18, 2021
9719415
fix unit tests
noot Oct 18, 2021
fceafe3
fix integration tests
noot Oct 18, 2021
b0ee396
fix unit tests
noot Oct 19, 2021
9f2e5e7
address comments
noot Oct 19, 2021
c2dce86
lint
noot Oct 19, 2021
bc41726
fix log
noot Oct 19, 2021
7cd77c1
Merge branch 'development' of github.com:ChainSafe/gossamer into noot…
noot Oct 19, 2021
911a7fc
fix tests
noot Oct 19, 2021
398eab2
fix deadlock
noot Oct 19, 2021
57717f4
fix stress tests
noot Oct 19, 2021
486001d
address comments
noot Oct 19, 2021
5b53ca0
address comments
noot Oct 20, 2021
b62dd86
lint
noot Oct 20, 2021
8d9dd14
Merge branch 'development' into noot/sync-babe-fix
noot Oct 20, 2021
bec16d8
Merge branch 'development' into noot/sync-babe-fix
noot Oct 21, 2021
1 change: 1 addition & 0 deletions chain/dev/config.toml
@@ -24,6 +24,7 @@ unlock = ""
roles = 4
babe-authority = true
grandpa-authority = true
babe-lead = true

[network]
port = 7001
1 change: 1 addition & 0 deletions chain/gssmr/config.toml
@@ -30,6 +30,7 @@ port = 7001
nobootstrap = false
nomdns = false
discovery-interval = 10
min-peers = 1

[rpc]
enabled = false
2 changes: 2 additions & 0 deletions chain/gssmr/defaults.go
@@ -81,6 +81,8 @@ var (
DefaultNoBootstrap = false
// DefaultNoMDNS disables mDNS discovery
DefaultNoMDNS = false
// DefaultMinPeers is the default minimum desired peer count
DefaultMinPeers = 1

// DefaultDiscoveryInterval is the default interval for searching for DHT peers
DefaultDiscoveryInterval = time.Second * 10
1 change: 1 addition & 0 deletions cmd/gossamer/config.go
@@ -555,6 +555,7 @@ func setDotCoreConfig(ctx *cli.Context, tomlCfg ctoml.CoreConfig, cfg *dot.CoreC
cfg.Roles = tomlCfg.Roles
cfg.BabeAuthority = tomlCfg.Roles == types.AuthorityRole
cfg.GrandpaAuthority = tomlCfg.Roles == types.AuthorityRole
cfg.BABELead = ctx.GlobalBool(BABELeadFlag.Name)

// check --roles flag and update node configuration
if roles := ctx.GlobalString(RolesFlag.Name); roles != "" {
11 changes: 11 additions & 0 deletions cmd/gossamer/flags.go
@@ -316,6 +316,14 @@ var (
}
)

// BABE flags
var (
BABELeadFlag = cli.BoolFlag{
Name: "babe-lead",
Usage: `specify whether node should build block 1 of the network. only used when starting a new network`,
}
)

// flag sets that are shared by multiple commands
var (
// GlobalFlags are flags that are valid for use with the root command and all subcommands
@@ -366,6 +374,9 @@ var (

// telemetry flags
NoTelemetryFlag,

// BABE flags
BABELeadFlag,
}
)
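The diff declares `--babe-lead` as a `cli.BoolFlag` and reads it with `ctx.GlobalBool(BABELeadFlag.Name)`. As a rough illustration of the same on/off semantics (absent means `false`, present means `true`), here is a minimal sketch using only the standard library `flag` package; `parseBABELead` is a hypothetical helper and is not how Gossamer parses its CLI.

```go
package main

import (
	"flag"
	"fmt"
)

// parseBABELead is a hypothetical stand-in for the cli.BoolFlag above, written
// with the standard library flag package. It reports whether --babe-lead was set.
func parseBABELead(args []string) (bool, error) {
	fs := flag.NewFlagSet("gossamer", flag.ContinueOnError)
	lead := fs.Bool("babe-lead", false,
		"specify whether node should build block 1 of the network; only used when starting a new network")
	// The flag package accepts both -babe-lead and --babe-lead.
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *lead, nil
}

func main() {
	lead, err := parseBABELead([]string{"--babe-lead"})
	if err != nil {
		panic(err)
	}
	fmt.Println("babe-lead:", lead) // babe-lead: true
}
```

Because the default is `false`, every node that is not explicitly started with the flag behaves as a follower and waits for block 1.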

2 changes: 2 additions & 0 deletions dot/config.go
@@ -98,6 +98,7 @@ type NetworkConfig struct {
type CoreConfig struct {
Roles byte
BabeAuthority bool
BABELead bool
GrandpaAuthority bool
WasmInterpreter string
}
@@ -183,6 +184,7 @@ func GssmrConfig() *Config {
NoBootstrap: gssmr.DefaultNoBootstrap,
NoMDNS: gssmr.DefaultNoMDNS,
DiscoveryInterval: gssmr.DefaultDiscoveryInterval,
MinPeers: gssmr.DefaultMinPeers,
},
RPC: RPCConfig{
Port: gssmr.DefaultRPCHTTPPort,
1 change: 1 addition & 0 deletions dot/config/toml/config.go
@@ -82,6 +82,7 @@ type CoreConfig struct {
SlotDuration uint64 `toml:"slot-duration,omitempty"`
EpochLength uint64 `toml:"epoch-length,omitempty"`
WasmInterpreter string `toml:"wasm-interpreter,omitempty"`
BabeLead bool `toml:"babe-lead,omitempty"`
}

// RPCConfig is to marshal/unmarshal toml RPC config vars
29 changes: 0 additions & 29 deletions dot/core/service.go
@@ -248,12 +248,6 @@ func (s *Service) handleBlock(block *types.Block, state *rtstorage.TrieState) er
return err
}

// check if block production epoch transitioned
if err := s.handleCurrentSlot(&block.Header); err != nil {
logger.Warn("failed to handle epoch for block", "block", block.Header.Hash(), "error", err)
return err
}

go func() {
s.Lock()
defer s.Unlock()
@@ -313,29 +307,6 @@ func (s *Service) handleCodeSubstitution(hash common.Hash, state *rtstorage.Trie
return nil
}

func (s *Service) handleCurrentSlot(header *types.Header) error {
head := s.blockState.BestBlockHash()
if header.Hash() != head {
return nil
}

epoch, err := s.epochState.GetEpochForBlock(header)
if err != nil {
return err
}

currEpoch, err := s.epochState.GetCurrentEpoch()
if err != nil {
return err
}

if currEpoch == epoch {
return nil
}

return s.epochState.SetCurrentEpoch(epoch)
}

// handleBlocksAsync handles a block asynchronously; the handling performed by this function
// does not need to be completed before the next block can be imported.
func (s *Service) handleBlocksAsync() {
1 change: 1 addition & 0 deletions dot/services.go
@@ -199,6 +199,7 @@ func createBABEService(cfg *Config, st *state.Service, ks keystore.Keystore, cs
BlockImportHandler: cs,
Authority: cfg.Core.BabeAuthority,
IsDev: cfg.Global.ID == "dev",
Lead: cfg.Core.BABELead,
}

if cfg.Core.BabeAuthority {
82 changes: 72 additions & 10 deletions lib/babe/babe.go
@@ -28,13 +28,14 @@ import (
"github.com/ChainSafe/gossamer/dot/types"
"github.com/ChainSafe/gossamer/lib/crypto/sr25519"
"github.com/ChainSafe/gossamer/lib/runtime"

log "github.com/ChainSafe/log15"
ethmetrics "github.com/ethereum/go-ethereum/metrics"
)

var (
logger log.Logger
initialWaitTime time.Duration
logger log.Logger
firstBlockTimeout = time.Minute
)

// Service contains the VRF keys for the validator, as well as BABE configuration data
@@ -43,6 +44,10 @@ type Service struct {
cancel context.CancelFunc
authority bool
dev bool
// lead is used when setting up a new network from genesis.
// the "lead" node is the node that is designated to build block 1, after which the rest of the nodes
// will sync it and determine the first slot of the network based on that block
lead bool

// Storage interfaces
blockState BlockState
@@ -79,6 +84,7 @@ type ServiceConfig struct {
AuthData []types.Authority
IsDev bool
Authority bool
Lead bool
}

// NewService returns a new Babe Service using the provided VRF keys and runtime
@@ -119,20 +125,18 @@ func NewService(cfg *ServiceConfig) (*Service, error) {
authority: cfg.Authority,
dev: cfg.IsDev,
blockImportHandler: cfg.BlockImportHandler,
lead: cfg.Lead,
}

epoch, err := cfg.EpochState.GetCurrentEpoch()
if err != nil {
return nil, err
}

err = babeService.setupParameters(cfg)
if err != nil {
if err = babeService.setupParameters(cfg); err != nil {
return nil, err
}

initialWaitTime = babeService.slotDuration * 5

logger.Debug("created service",
"epoch", epoch,
"block producer", cfg.Authority,
@@ -143,6 +147,11 @@ func NewService(cfg *ServiceConfig) (*Service, error) {
"threshold", babeService.epochData.threshold,
"randomness", babeService.epochData.randomness,
)

if cfg.Lead {
logger.Debug("node designated to build block 1")
}

return babeService, nil
}

@@ -191,13 +200,41 @@ func (b *Service) Start() error {
return nil
}

// wait a bit to check if we need to sync before initiating
<-time.NewTimer(initialWaitTime).C
go func() {
// if we aren't the lead node, wait for the first block
if !b.lead {
if err := b.waitForFirstBlock(); err != nil {
logger.Crit("failed to start BABE", "error", err)
}
}

b.initiate()
}()

go b.initiate()
return nil
}

func (b *Service) waitForFirstBlock() error {
ch := b.blockState.GetImportedBlockNotifierChannel()
defer b.blockState.FreeImportedBlockNotifierChannel(ch)

timeout := time.After(firstBlockTimeout)

// loop until block 1
for {
select {
case block := <-ch:
if block != nil && block.Header.Number.Int64() > 0 {
return nil
}
case <-timeout:
return errFirstBlockTimeout
case <-b.ctx.Done():
return nil
}
}
}

// SlotDuration returns the current service slot duration in milliseconds
func (b *Service) SlotDuration() uint64 {
return uint64(b.slotDuration.Milliseconds())
@@ -327,6 +364,12 @@ func (b *Service) invokeBlockAuthoring() error {
return err
}

logger.Debug("initiated epoch",
"threshold", b.epochData.threshold,
"randomness", b.epochData.randomness,
"authorities", b.epochData.authorities,
)

epochStartSlot, err := b.waitForEpochStart(epoch)
if err != nil {
logger.Error("failed to wait for epoch start", "epoch", epoch, "error", err)
@@ -355,12 +398,25 @@ func (b *Service) invokeBlockAuthoring() error {

logger.Info("current epoch", "epoch", epoch, "slots into epoch", intoEpoch)

// get start slot for next epoch
nextEpochStart, err := b.epochState.GetStartSlotForEpoch(epoch + 1)
if err != nil {
logger.Error("failed to get start slot for next epoch", "epoch", epoch, "error", err)
return err
}

nextEpochStartTime := getSlotStartTime(nextEpochStart, b.slotDuration)
epochDoneCtx, cancel := context.WithDeadline(context.Background(), nextEpochStartTime)
defer cancel()

slotDone := make([]<-chan time.Time, b.epochLength-intoEpoch)
for i := 0; i < int(b.epochLength-intoEpoch); i++ {
slotDone[i] = time.After(b.getSlotDuration() * time.Duration(i))
}

for i := 0; i < int(b.epochLength-intoEpoch); i++ {
done := false

select {
case <-b.ctx.Done():
return nil
@@ -380,6 +436,12 @@ func (b *Service) invokeBlockAuthoring() error {
logger.Warn("failed to handle slot", "slot", slotNum, "error", err)
continue
}
case <-epochDoneCtx.Done():
done = true
Review comment (Member): why not just break when case <-epochDoneCtx.Done() is achieved?
Reply (Contributor): It would break the select, not the for loop.
Reply (Member): oh, right haha
}

if done {
break
}
}

@@ -489,7 +551,7 @@ func (b *Service) handleSlot(epoch, slotNum uint64) error {
"epoch", epoch,
"slot", slotNum,
)
logger.Debug("built block",
logger.Trace("built block",
"header", block.Header,
"body", block.Body,
"parent", parent.Hash(),
1 change: 1 addition & 0 deletions lib/babe/babe_test.go
@@ -323,6 +323,7 @@ func TestService_SlotDuration(t *testing.T) {

func TestService_ProducesBlocks(t *testing.T) {
babeService := createTestService(t, nil)
babeService.lead = true

babeService.epochData.authorityIndex = 0
babeService.epochData.authorities = []types.Authority{
2 changes: 1 addition & 1 deletion lib/babe/epoch.go
@@ -83,7 +83,7 @@ func (b *Service) initiateEpoch(epoch uint64) error {
randomness: data.Randomness,
authorities: data.Authorities,
authorityIndex: idx,
threshold: b.epochData.threshold,
threshold: b.epochData.threshold, // TODO: threshold might change if authority count changes
}
}

1 change: 1 addition & 0 deletions lib/babe/errors.go
@@ -59,6 +59,7 @@ var (
errNilEpochState = errors.New("cannot have nil EpochState")
errInvalidResult = errors.New("invalid error value")
errNoEpochData = errors.New("no epoch data found for upcoming epoch")
errFirstBlockTimeout = errors.New("timed out waiting for first block")

other Other
invalidCustom InvalidCustom
2 changes: 2 additions & 0 deletions lib/babe/state.go
@@ -48,6 +48,8 @@ type BlockState interface {
NumberIsFinalised(num *big.Int) (bool, error)
GetRuntime(*common.Hash) (runtime.Instance, error)
StoreRuntime(common.Hash, runtime.Instance)
GetImportedBlockNotifierChannel() chan *types.Block
FreeImportedBlockNotifierChannel(ch chan *types.Block)
}
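BABE's `BlockState` interface now requires a get/free pair for imported-block notifier channels, and `waitForFirstBlock` defers the free call so the subscription is always released. The sketch below is a hypothetical in-memory implementation of that pattern, not Gossamer's `dot/state` code: each subscriber gets a buffered channel, sends are non-blocking so a slow subscriber cannot stall block import (it may miss blocks instead), and freeing removes and closes the channel under the same lock that guards sends.

```go
package main

import (
	"fmt"
	"sync"
)

// block is a hypothetical stand-in for types.Block.
type block struct{ number int64 }

// notifier is a hypothetical implementation of the
// GetImportedBlockNotifierChannel / FreeImportedBlockNotifierChannel pair.
type notifier struct {
	mu   sync.Mutex
	subs map[chan *block]struct{}
}

func newNotifier() *notifier {
	return &notifier{subs: make(map[chan *block]struct{})}
}

// GetImportedBlockNotifierChannel registers and returns a new buffered subscriber channel.
func (n *notifier) GetImportedBlockNotifierChannel() chan *block {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch := make(chan *block, 8)
	n.subs[ch] = struct{}{}
	return ch
}

// FreeImportedBlockNotifierChannel unregisters and closes ch; freeing twice is a no-op.
func (n *notifier) FreeImportedBlockNotifierChannel(ch chan *block) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if _, ok := n.subs[ch]; ok {
		delete(n.subs, ch)
		close(ch)
	}
}

// notify sends b to every subscriber without blocking; a full buffer drops b for that subscriber.
func (n *notifier) notify(b *block) {
	n.mu.Lock()
	defer n.mu.Unlock()
	for ch := range n.subs {
		select {
		case ch <- b:
		default:
		}
	}
}

func main() {
	n := newNotifier()
	ch := n.GetImportedBlockNotifierChannel()
	n.notify(&block{number: 1})
	fmt.Println((<-ch).number) // 1
	n.FreeImportedBlockNotifierChannel(ch)
}
```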

// StorageState interface for storage state methods
2 changes: 1 addition & 1 deletion tests/rpc/rpc_03-chain_test.go
@@ -75,7 +75,7 @@ func TestChainRPC(t *testing.T) {
nodes, err := utils.InitializeAndStartNodes(t, 1, utils.GenesisDev, utils.ConfigDefault)
require.Nil(t, err)

time.Sleep(time.Second) // give server a second to start
time.Sleep(time.Second * 5) // give server a few seconds to start

chainBlockHeaderHash := ""
for _, test := range testCases {
10 changes: 10 additions & 0 deletions tests/utils/gossamer_utils.go
@@ -91,6 +91,7 @@ type Node struct {
basePath string
config string
WSPort string
BabeLead bool
}

// InitGossamer initialises given node number and returns node reference
@@ -132,6 +133,10 @@ func StartGossamer(t *testing.T, node *Node, websocket bool) error {
"--rpc",
"--log", "info"}

if node.BabeLead {
params = append(params, "--babe-lead")
}

if node.Idx >= len(KeyList) {
params = append(params, "--roles", "1")
} else {
@@ -222,6 +227,10 @@ func RunGossamer(t *testing.T, idx int, basepath, genesis, config string, websoc
os.Exit(1)
}

if idx == 0 {
node.BabeLead = true
}

err = StartGossamer(t, node, websocket)
if err != nil {
logger.Crit("could not start gossamer", "error", err)
@@ -501,6 +510,7 @@ func CreateConfigNoBabe() {
func generateConfigNoGrandpa() *ctoml.Config {
cfg := generateDefaultConfig()
cfg.Core.GrandpaAuthority = false
cfg.Core.BabeLead = true
return cfg
}
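In the test utilities, `RunGossamer` marks only node 0 as `BabeLead`, and `StartGossamer` turns that field into the `--babe-lead` argument, so exactly one node in a multi-node test network builds block 1. The sketch below isolates that wiring; `node`, `newNode`, and `startParams` are hypothetical trimmed-down names, the base arguments are the ones visible in the diff, and the `--roles` branch (whose `else` side is cut off in the diff) is omitted.

```go
package main

import "fmt"

// node is a hypothetical, trimmed-down version of the test utils' Node type.
type node struct {
	Idx      int
	BabeLead bool
}

// newNode mirrors RunGossamer: only node 0 is designated as the BABE lead.
func newNode(idx int) *node {
	return &node{Idx: idx, BabeLead: idx == 0}
}

// startParams mirrors how StartGossamer appends --babe-lead to the base arguments.
func startParams(n *node) []string {
	params := []string{"--rpc", "--log", "info"}
	if n.BabeLead {
		params = append(params, "--babe-lead")
	}
	return params
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(i, startParams(newNode(i)))
	}
}
```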
