Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(server): reduce the screaming heartbeat logs #836

Merged
merged 1 commit into from
Jun 8, 2014

Conversation

philips
Copy link
Contributor

@philips philips commented Jun 7, 2014

Currently the only way we know that a peer isn't getting a heartbeat is an
edge triggered event from go raft on every missed heartbeat. This means that
we need to do some book keeping in order to do exponential backoff.

The upside is that instead of screaming thousands of log lines before a
machine hits the default removal of 30 minutes it is only ~100.

@philips
Copy link
Contributor Author

philips commented Jun 7, 2014

@unihorn @xiangli-cmu

@@ -97,6 +109,7 @@ func NewPeerServer(psConfig PeerServerConfig, client *Client, registry *Registry
serverStats: serverStats,

timeoutThresholdChan: make(chan interface{}, 1),
logBackoffs: make(map[string]*logBackoff),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go fmt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@philips
Copy link
Contributor Author

philips commented Jun 7, 2014

@unihorn Please review 6dc583a.

/cc @xiangli-cmu

@philips
Copy link
Contributor Author

philips commented Jun 7, 2014

@unihorn I am sorry b71282c. I fixed a tight loop by putting the leader check inside the ticker.

/cc @xiangli-cmu

@philips
Copy link
Contributor Author

philips commented Jun 7, 2014

Gah, 23cc856. Need a cup of tea before doing PRs.

@yichengq
Copy link
Contributor

yichengq commented Jun 7, 2014

Good fix.

@yichengq
Copy link
Contributor

yichengq commented Jun 7, 2014

coffee

@philips
Copy link
Contributor Author

philips commented Jun 7, 2014

@unihorn Fixed the unknown thing.

}
log.Infof("%s: warning: heartbeat timed out: '%v'", s.Config.Name, name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philips Can we check the lastActivity here. If the lastActivity is smaller than the next backoff and greater than the previous backoff, we should reset the backoff. So we probably do not need to monitor routine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. That is a better idea.
On Jun 7, 2014 10:37 AM, "Xiang Li" notifications@github.com wrote:

In server/peer_server.go:

    }
  •   log.Infof("%s: warning: heartbeat timed out: '%v'", s.Config.Name, name)
    

@philips https://github.com/philips Can we check the lastActivity here.
If the lastActivity is smaller than the next backoff and greater than the
previous backoff, we should reset the backoff. So we probably do not need
to monitor routine?


Reply to this email directly or view it on GitHub
https://github.com/coreos/etcd/pull/836/files#r13520506.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiangli-cmu fixed in 56a0e6a can you review?

@xiang90
Copy link
Contributor

xiang90 commented Jun 7, 2014

@philips lgtm if it works.

Currently the only way we know that a peer isn't getting a heartbeat is
an edge triggered event from go raft on every missed heartbeat. This
means that we need to do some book keeping in order to do exponential
backoff.

The upside is that instead of screaming thousands of log lines before a
machine hits the default removal of 30 minutes it is only ~100.
philips added a commit that referenced this pull request Jun 8, 2014
fix(server): reduce the screaming heartbeat logs
@philips philips merged commit 868b7f7 into etcd-io:master Jun 8, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants