Log flood when node down #595

Closed
Soulou opened this issue Feb 23, 2014 · 7 comments

Comments

@Soulou

Soulou commented Feb 23, 2014

After setting up a cluster of three or more nodes, if one of the nodes goes down, the leader is flooded with log messages (~20-30/sec):

[etcd] Feb 23 16:31:02.232 INFO      | node2: warning: heartbeat timed out: 'node1'

I don't think these messages are useful; at the very least, they should be logged less often. How long does it take for a cluster to consider a node completely down and no longer part of the cluster?

@xiang90
Contributor

xiang90 commented Feb 23, 2014

@Soulou For now, we do not remove a node automatically, but I do think we should clean up the logs.
We can apply back-off to the heartbeat probing too.

@Soulou
Author

Soulou commented Feb 23, 2014

Yes, because it can be annoying: hundreds of MB of logs per day once a node is considered down is too much.

So far, is the only way to remove a node to rebuild the cluster without it? That's a bit heavy oO

@xiang90
Contributor

xiang90 commented Feb 23, 2014

@Soulou You can remove it by sending an HTTP DELETE request to http://leader:7001/remove/[nodename]. We do not recommend this, since there is a race issue we still need to fix.
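
A minimal sketch of the removal request described above, using only the Python standard library. The port (7001) and the `/remove/[nodename]` path come from the comment; the function names, the default timeout, and the host and node names in the usage note are illustrative assumptions, and the race issue mentioned above still applies.

```python
import urllib.parse
import urllib.request


def build_remove_request(leader_host: str, node_name: str,
                         port: int = 7001) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for removing a node.

    The port and path follow the comment above; everything else here
    is a hypothetical helper, not part of etcd itself.
    """
    # Quote the node name so unusual characters cannot alter the URL path.
    path = "/remove/" + urllib.parse.quote(node_name, safe="")
    url = f"http://{leader_host}:{port}{path}"
    return urllib.request.Request(url, method="DELETE")


def remove_node(leader_host: str, node_name: str, timeout: float = 5.0) -> int:
    """Send the DELETE request to the leader and return the HTTP status code."""
    request = build_remove_request(leader_host, node_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `remove_node("leader", "node1")` would send the request to the current leader; building the request separately makes it possible to inspect the URL before anything is sent.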

@philips
Contributor

philips commented Feb 23, 2014

We need to add exponential back off on this log entry.
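
A minimal sketch of what exponential back-off on a repeated log entry could look like. This is not etcd's implementation (etcd is written in Go, and the actual change is in the pull request referenced at the end of this thread); the class name, the default intervals, and the injectable clock are all assumptions made for illustration.

```python
import time


class BackoffLogger:
    """Decide whether a repeated warning should be emitted, doubling the
    quiet period after each emission up to a maximum (hypothetical helper)."""

    def __init__(self, initial=1.0, factor=2.0, maximum=300.0,
                 clock=time.monotonic):
        self.initial = initial    # seconds of silence after the first message
        self.factor = factor      # multiplier applied after each emitted message
        self.maximum = maximum    # upper bound on the quiet period
        self.clock = clock        # injectable for testing
        self._state = {}          # key -> (next_allowed_time, next_interval)

    def should_log(self, key):
        """Return True if a message for `key` may be emitted now."""
        now = self.clock()
        next_allowed, interval = self._state.get(key, (now, self.initial))
        if now < next_allowed:
            return False
        # Emit now, then stay quiet for `interval` and grow it for next time.
        self._state[key] = (now + interval,
                            min(interval * self.factor, self.maximum))
        return True

    def reset(self, key):
        """Forget the back-off state, e.g. when the peer responds again."""
        self._state.pop(key, None)
```

With one key per peer (for example `"node1"`), the heartbeat loop would call `should_log("node1")` before writing the timeout warning and `reset("node1")` once a heartbeat succeeds, so a recovered node starts from the initial interval again.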

@Soulou
Author

Soulou commented Feb 23, 2014

Great, thank you for the information. I'll keep an eye on how this develops!

@Asmod4n
Contributor

Asmod4n commented Apr 7, 2014

Any news on this? I was planning to use a laptop as a cluster node.

@yichengq
Contributor

Fixed in #836.
