Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop retrying event tracker on context cancellation and timeout #1705

Merged
merged 2 commits into from
Jul 13, 2023

Conversation

sergerad
Copy link
Contributor

@sergerad sergerad commented Jul 6, 2023

Changes

  • Check for context completion in retry forever loops (in case anonymous func doesn't stop on context done)
  • Check for context completion in anonymous funcs passed into retry forever loops where we want to avoid printing an error log in such cases

Background

When we shutdown, the loops that retry forever treat context cancellation as a retryable error. Instead, these loops should exit.

We also log these as ERRORs when they should not be logged at all.

These are the relevant log lines:

message:2023-07-05T23:40:04.443Z [ERROR] polygon.server.polybft.state-sync-manager.event_tracker: failed to sync: error="context canceled"

and

message:2023-07-05T23:39:49.465Z [ERROR] polygon.server.polybft.state-sync-manager.event_tracker: failed to start blocktracker: error="context canceled"

Local Tests

Running with fix, shutdown looks like:

[SIGNAL] Caught signal: terminated
Gracefully shutting down client...
...

Without fix (v1.0.1):

[SIGNAL] Caught signal: terminated
Gracefully shutting down client...
...
2023-07-06T21:34:05.747+1200 [ERROR] polygon.server.polybft.state-sync-manager.event_tracker: failed to start blocktracker: error="context canceled"
2023-07-06T21:34:05.747+1200 [ERROR] polygon.server.polybft.state-sync-manager.event_tracker: failed to sync: error="context canceled"

@sergerad sergerad marked this pull request as draft July 6, 2023 06:48
@sergerad sergerad force-pushed the rm-shutdown-error-log branch 4 times, most recently from c479709 to 60d5ac0 Compare July 6, 2023 09:42
@sergerad sergerad marked this pull request as ready for review July 6, 2023 09:42
@sergerad
Copy link
Contributor Author

sergerad commented Jul 6, 2023

@Stefan-Ethernal could you please review? Fix is not critical but the false ERROR log is a bit annoying for ops. Thanks

@igorcrevar
Copy link
Contributor

I have approved the pr. However, I have also found that currently we have a bug if tt.Sync fails, next time there will be panic because of closing of already closed channel. Please add line (tt.ReadyCh = make(chan struct{})) in code before merge

// Sync concurrently, retrying indefinitely
go common.RetryForever(ctx, time.Second, func(ctx context.Context) error {
        tt.ReadyCh = make(chan struct{}) // re-initialize channel
       
        if err := tt.Sync(ctx); err != nil {
             if common.IsContextDone(err) {

@igorcrevar igorcrevar merged commit affd397 into 0xPolygon:develop Jul 13, 2023
5 of 6 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants