Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(lib/babe): fix timing for transition between epochs #1636

Merged
merged 20 commits into from
Jun 15, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 14 additions & 20 deletions lib/babe/babe.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,6 @@ var logger log.Logger
type Service struct {
ctx context.Context
cancel context.CancelFunc
paused bool
authority bool
dev bool

Expand Down Expand Up @@ -228,13 +227,7 @@ func (b *Service) EpochLength() uint64 {
func (b *Service) Pause() error {
b.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock is not required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there might be a chance that Pause() is called when the b.pause channel is already close, thus causing a close of closed channel panic

defer b.Unlock()

if b.paused {
return errors.New("service already paused")
}

b.paused = true
b.pause <- struct{}{}
close(b.pause)
return nil
}

Expand All @@ -243,22 +236,33 @@ func (b *Service) Resume() error {
b.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acquire lock after checking b.IsPaused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function should be atomic in the case that Pause or Resume are called concurrently by multiple processes, process A might see it's not paused, acquire the lock, then do b.initiate() but in between seeing it's not paused an acquiring the lock another process might also see the same thing, thus causing b.initiate() to be called twice which would be bad

defer b.Unlock()

if !b.paused {
if !b.IsPaused() {
return nil
}

b.pause = make(chan struct{})

epoch, err := b.epochState.GetCurrentEpoch()
if err != nil {
logger.Error("failed to get current epoch", "error", err)
return err
}

b.paused = false
go b.initiate(epoch)
logger.Info("service resumed", "epoch", epoch)
return nil
}

// IsPaused returns if the service is paused or not (ie. producing blocks)
func (b *Service) IsPaused() bool {
select {
case <-b.pause:
return true
default:
return false
}
}

// Stop stops the service. If stop is called, it cannot be resumed.
func (b *Service) Stop() error {
b.Lock()
Expand Down Expand Up @@ -301,13 +305,6 @@ func (b *Service) IsStopped() bool {
return b.ctx.Err() != nil
}

// IsPaused returns if the service is paused or not (ie. producing blocks)
func (b *Service) IsPaused() bool {
b.RLock()
defer b.RUnlock()
return b.paused
}

func (b *Service) safeSend(msg types.Block) error {
b.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to take lock here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return errors.New("service has been stopped")

S -> s
Error starts with smallcase letters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for your first comment, the lock is here so there isn't a send on a closed channel. maybe there is a better way to implement that? either way, I think I will be removing this section anyways and doing a refactor

defer b.Unlock()
Expand Down Expand Up @@ -413,9 +410,6 @@ func (b *Service) invokeBlockAuthoring(epoch uint64) error {
// resume it when ready
if b.epochLength <= intoEpoch && !b.dev {
logger.Debug("pausing BABE, need to sync", "slots into epoch", intoEpoch, "startSlot", startSlot, "epochStart", epochStart)
go func() {
<-b.pause
}()
return b.Pause()
}

Expand Down