3-node etcd cluster with two nodes at high CPU usage #11012
Logs from the etcd instances would be helpful here.
At about 2019-08-09 10:50, the cluster became unstable and the leader started changing frequently; see the log details below. We enabled debug logging on etcd1:
2019-08-09 11:54:06.674422 [pkg_logger.go:147] D | etcdserver/api/v2http: [GET] /metrics remote:prometheus:40558
--------error log on etcd2: lots of goroutine errors---------
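Since Prometheus is already scraping /metrics, the leader churn and goroutine growth described above should also be visible there. A minimal shell sketch, assuming the standard etcd 3.3 metric names etcd_server_has_leader and etcd_server_leader_changes_seen_total plus the Go runtime's go_goroutines; the helper name summarize_etcd_metrics and the URL in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text-format metrics on stdin and keep
# only the series that track leader stability and goroutine count.
summarize_etcd_metrics() {
  # Anchored at line start, so "# HELP" / "# TYPE" comment lines are skipped.
  grep -E '^(etcd_server_has_leader|etcd_server_leader_changes_seen_total|go_goroutines)[ {]'
}
```

Usage: `curl -s http://etcd1:2379/metrics | summarize_etcd_metrics`. Running it a few minutes apart on each member shows whether leader changes and the goroutine count keep climbing.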
The logs may be a symptom rather than the cause. You can run perf top -p <pid> to find out where the CPU time is going.
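To follow that suggestion non-interactively, a minimal shell sketch, assuming Linux perf is installed, the process is named exactly etcd, and you have permission to profile it (usually root). The helper name profile_etcd, the 30-second default and the output file name are hypothetical; perf record is used instead of perf top so the call graph can be saved and attached to the issue.

```shell
#!/bin/sh
# Hypothetical helper: sample on-CPU call stacks of the oldest process
# named exactly "etcd" and save them to a perf data file.
profile_etcd() {
  local seconds="${1:-30}"           # sampling window in seconds
  local out="${2:-etcd-perf.data}"   # perf output file
  local pid
  pid="$(pgrep -xo etcd)" || { echo "no etcd process found" >&2; return 1; }
  echo "profiling etcd pid ${pid} for ${seconds}s"
  # -g records call graphs; "sleep" bounds how long perf stays attached.
  perf record -g -p "${pid}" -o "${out}" -- sleep "${seconds}"
}
```

After `profile_etcd 30`, `perf report -i etcd-perf.data --stdio` prints the hottest call chains; `perf top -g -p "$(pgrep -xo etcd)"` is the live equivalent.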
Can somebody help with this issue?
$ perf top -g
As a sanity check, make sure all 3 etcd servers are up and running and connected to each other.
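One way to run that check is with the v3 etcdctl. A sketch under these assumptions: ETCDCTL_API=3 is set explicitly because etcd 3.3's etcdctl still defaults to the v2 API, the three URLs in ENDPOINTS are placeholders, check_etcd_cluster is a hypothetical helper, and the 3.3/3.4 output format (each healthy endpoint line contains "is healthy") is relied on for counting.

```shell
#!/bin/sh
# Placeholder client URLs; replace with the real members.
ENDPOINTS="${ENDPOINTS:-http://etcd1:2379,http://etcd2:2379,http://etcd3:2379}"

# Hypothetical helper: succeed only if the expected number of endpoints
# report "is healthy" (i.e. each can commit a proposal).
check_etcd_cluster() {
  local expected="${1:-3}" healthy
  # Merge stderr into the pipe and count only the healthy lines.
  healthy="$(ETCDCTL_API=3 etcdctl --endpoints="${ENDPOINTS}" endpoint health 2>&1 \
    | grep -c ' is healthy')"
  echo "${healthy}/${expected} endpoints healthy"
  # Show membership and per-member status (leader, raft term, DB size).
  ETCDCTL_API=3 etcdctl --endpoints="${ENDPOINTS}" member list -w table
  ETCDCTL_API=3 etcdctl --endpoints="${ENDPOINTS}" endpoint status -w table
  [ "${healthy}" -eq "${expected}" ]
}
```

If all three members appear in member list but fewer than three are healthy, or the IS LEADER column keeps moving between runs, that matches the leader churn described in this thread.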
Every time we restart the cluster, each node is up and running and connected to the others. But several days later, CPU overload causes the nodes to miss each other's heartbeats. The only thing we can do is restart the cluster, and even that does not bring the high CPU and virtual memory usage down.
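To see whether CPU and memory really creep up over the days between restarts, it may help to record them periodically instead of relying on a single top snapshot. A minimal shell sketch, assuming procps ps; sample_etcd_usage is a hypothetical helper, and the interval and log file name in the usage line are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped line with CPU%, RSS and VSZ
# (both in KiB, as reported by ps) for the given PID.
sample_etcd_usage() {
  local pid="$1" line
  # An empty header ("=") after each field suppresses the header row.
  line="$(ps -o %cpu= -o rss= -o vsz= -p "${pid}")" || return 1
  set -- ${line}   # intentional word splitting into $1 $2 $3
  printf '%s cpu=%s%% rss_kib=%s vsz_kib=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3"
}
```

Usage: `while sleep 300; do sample_etcd_usage "$(pgrep -xo etcd)"; done >> etcd-usage.log`. Lining this log up with the time the leader starts flapping shows whether the overload builds gradually or appears suddenly.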
This bug was fixed in #10218.
Thanks, this problem has not reappeared since we updated our etcd cluster.
@weopm Do you know whether etcd 3.3.12 still has this issue? We have a similar high CPU usage issue on etcd 3.3.12. Thanks!
This bug still exists in etcd 3.3.18. You can apply the patch from https://github.com/etcd-io/etcd/pull/10218/files#diff-a5a4bca15b031f18356513fe1382c3c7L560 or update to etcd 4.4.3.
@weopm Thanks for the quick reply. To double-check: is it 3.4.3 that has the fix?
This bug is fixed in etcd 4.4.3, not 3.4.3. You can review and confirm the code yourself: https://github.com/etcd-io/etcd/pull/10218/files#diff-a5a4bca15b031f18356513fe1382c3c7L560
@weopm We're seeing this issue in production workloads on 3.3.12. I noticed that there was a cherry-pick commit for 3.4, which was merged, and a cherry-pick for 3.3, which was closed without being merged. Is it possible to get this fix into 3.3 and cut a new patch release? I do see it's in the v3.4.3 tag as well: https://github.com/etcd-io/etcd/blob/v3.4.3/etcdserver/v3_server.go#L607
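To confirm locally which release contains the change, one option is to read etcdserver/v3_server.go straight from each release tag with git show, without checking anything out. A sketch, assuming a local clone of etcd-io/etcd with tags fetched; the helper name show_v3_server_lines is hypothetical, and the default window around line 607 is taken from the v3.4.3 link, so the same numbers will not line up in other tags.

```shell
#!/bin/sh
# Hypothetical helper: print a line range of etcdserver/v3_server.go as it
# exists at a given git ref (tag, branch or commit) in the clone at $REPO.
show_v3_server_lines() {
  local ref="$1" from="${2:-595}" to="${3:-620}"
  git -C "${REPO:-.}" show "${ref}:etcdserver/v3_server.go" | sed -n "${from},${to}p"
}
```

Usage: `REPO=~/src/etcd show_v3_server_lines v3.4.3`, then compare with the 3.3 line using `git -C ~/src/etcd diff v3.3.18 v3.4.3 -- etcdserver/v3_server.go`.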
PR to cherry-pick into 3.3: #11378 |
System: AWS EC2, Linux version 4.14.128-112.105 x86_64
etcd version: 3.3.11
Startup configuration:
top output:
This happened the day after I changed the root password through the gateway:
echo "newpass" | etcdctl user passwd root --user="root:root" --interactive=false --endpoints="http://gateway:23790" (though this may not be the direct cause)