Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[agent] debug logs for session, node events on dispatcher, heartbeats #2486

Merged
merged 1 commit into from
Jan 24, 2018

Conversation

anshulpundir
Copy link
Contributor

Added info logs to the agent to track the manager its connecting to, timeouts, heartbeat from the dispatcher.

Signed-off-by: Anshul Pundir anshul.pundir@docker.com

agent/agent.go Outdated
backoff = initialSessionFailureBackoff + 2*backoff
if backoff > maxSessionFailureBackoff {
backoff = maxSessionFailureBackoff
}
log.G(ctx).WithError(err).Errorf("agent: session failed. Backoff period: %d", backoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a field here: log.G(ctx).WithError(err).WithField("backoff", backoff).Errorf("agent: session failed").

Note this is the maximum backoff range for a random backoff. Actual backoff is rand([0, backoff)). ;)

@@ -1094,6 +1094,7 @@ func (d *Dispatcher) Heartbeat(ctx context.Context, r *api.HeartbeatRequest) (*a
}

period, err := d.nodes.Heartbeat(nodeInfo.NodeID, r.SessionID)
log.G(ctx).WithField("dispatcher", "heartbeat").Infof("agent heartbeat period %v", period)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use the module here, rather than a field.

@codecov
Copy link

codecov bot commented Jan 17, 2018

Codecov Report

Merging #2486 into master will increase coverage by 0.19%.
The diff coverage is 72.41%.

@@            Coverage Diff             @@
##           master    #2486      +/-   ##
==========================================
+ Coverage   61.23%   61.42%   +0.19%     
==========================================
  Files          49      129      +80     
  Lines        6890    21313   +14423     
==========================================
+ Hits         4219    13092    +8873     
- Misses       2241     6812    +4571     
- Partials      430     1409     +979

@@ -58,6 +60,8 @@ func (b *Broker) Select(dialOpts ...grpc.DialOption) (*Conn, error) {
// connection.
func (b *Broker) SelectRemote(dialOpts ...grpc.DialOption) (*Conn, error) {
peer, err := b.remotes.Select()
log.G(context.Background()).Infof("Manager selected by agent for session: %v", peer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all messages start with lowercase.

@anshulpundir anshulpundir force-pushed the worker_logs branch 2 times, most recently from 5bf93bc to 8f1bb1b Compare January 17, 2018 22:17
@anshulpundir anshulpundir changed the title [agent] Debug logs [agent] debug logs for session, node events on dispatcher, heartbeats Jan 17, 2018
@anshulpundir anshulpundir force-pushed the worker_logs branch 3 times, most recently from e0e280f to f822e10 Compare January 23, 2018 00:58
@anshulpundir
Copy link
Contributor Author

ping @stevvooe @dperny for review

@nishanttotla
Copy link
Contributor

@anshulpundir CI is failing with a bunch of data races.

@anshulpundir
Copy link
Contributor Author

Will check it out tomorrow @nishanttotla

agent/session.go Outdated
resp, err := client.Heartbeat(heartbeatCtx, &api.HeartbeatRequest{
SessionID: s.sessionID,
})
cancel()
if err != nil {
if grpc.Code(err) == codes.NotFound {
log.G(ctx).WithFields(fields).WithError(err).Errorf("heartbeat failed")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to add here again the manager details to make it easier to identify it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

@@ -1094,6 +1108,8 @@ func (d *Dispatcher) Heartbeat(ctx context.Context, r *api.HeartbeatRequest) (*a
}

period, err := d.nodes.Heartbeat(nodeInfo.NodeID, r.SessionID)

log.G(ctx).WithField("method", "(*Dispatcher).Heartbeat").Infof("received heartbeat from worker %v, expect next heartbeat in %v", nodeInfo, period)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this one debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its every 5 seconds, I thought thats not too frequent for logging ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's per node in the cluster, so can easily be hundred every 5 sec :D

Copy link
Contributor Author

@anshulpundir anshulpundir Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes, I mistook this for the log on the agent.

@anshulpundir anshulpundir force-pushed the worker_logs branch 3 times, most recently from f8af5f4 to fc5a878 Compare January 23, 2018 23:39
agent/session.go Outdated
resp, err := client.Heartbeat(heartbeatCtx, &api.HeartbeatRequest{
SessionID: s.sessionID,
})
cancel()
if err != nil {
if grpc.Code(err) == codes.NotFound {
log.G(ctx).WithFields(fields).WithError(err).Errorf("heartbeat to manager %v failed", s.conn.Peer())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to move this one line above, so it's printed no matter from the grpc code?

Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants