
The etcd(v3.5.2) process often causes the server's iowait to become high #13879

Closed
dotbalo opened this issue Apr 4, 2022 · 6 comments

dotbalo commented Apr 4, 2022

What happened?

I deployed a Kubernetes cluster with three master nodes and 15 worker nodes. The cluster runs about 367 nginx pods, just for testing. Three etcd nodes form a cluster, and etcd and kube-apiserver are deployed on the same nodes. Recently I have often found that iowait on a master node becomes high, and it only returns to normal after restarting etcd. If I don't restart etcd, iowait never comes down.
I tried to find the root cause with iostat, top, iotop, and lsof, but none of them revealed it, and the problem appears irregularly.

What did you expect to happen?

I hope to find the root cause of the iowait spikes and fix it.

How can we reproduce it (as minimally and precisely as possible)?

My deployed environment information is as follows:

# uname -a
Linux xxx- k8s-master02 4.20.13-1.el7.elrepo.x86_64 #1 SMP Wed Feb 27 10:02:05 EST 2019 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release 
CentOS Linux release 7.9.2009 (Core)

# containerd -v
containerd containerd.io 1.5.11 3df54a852345ae127d1fa3092b95168e4a88e2f8

# etcd --version
etcd Version: 3.5.2
Git SHA: 99018a77b
Go Version: go1.16.3
Go OS/Arch: linux/amd64

# kube-apiserver --version
Kubernetes v1.22.6

# free  -m
              total        used        free      shared  buff/cache   available
Mem:          32164        1785       28140         482        2238       29487

# cat /proc/cpuinfo | grep process
processor: 0
processor: 1
processor: 2
processor: 3
processor: 4
processor: 5
processor: 6
processor: 7

I ran a simple dd test in the etcd data directory:
#  dd bs=1M count=2048 if=/dev/zero of=test conv=fdatasync
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 23.5204 s, 91.3 MB/s
From the monitoring point of view this disk is not under any pressure, but the IOPS of the system "/" partition are maxed out.

iowait becomes high from time to time, not often. But whenever it does, the etcd process must be restarted before it comes back down.
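For comparison, dd with conv=fdatasync mostly measures sequential throughput, while etcd's WAL issues many small synchronous writes. Below is a minimal sketch of an fsync-latency check, assuming fio is available and that /var/lib/etcd-fio-test is a scratch directory created on the same SSD (both are assumptions, not from the report above); the parameters follow the commonly cited etcd disk benchmark:

# fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd-fio-test --size=22m --bs=2300 --name=etcd-fsync-check

The fdatasync latency percentiles in fio's output are closer to what etcd's WAL actually experiences than MB/s; a common rule of thumb is that the 99th percentile should stay well under 10ms.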

Anything else we need to know?

Here are some of the checks I ran:

iostat -x 1
(screenshot: iostat -x 1 output)
The r/s of sda is saturated, but the etcd data directory is /var/lib/etcd, which is on a separate SSD.
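To narrow this down, one option (a sketch, not from the original report) is to confirm which block device actually backs the etcd data-dir and wal-dir, and which processes are generating the reads on sda:

# df /var/lib/etcd /var/lib/etcd/wal      # which filesystem/device backs data-dir and wal-dir
# lsblk -o NAME,TYPE,MOUNTPOINT           # map sda / dm-* devices to mount points
# pidstat -d 1                            # per-process kB_rd/s and kB_wr/s (from the sysstat package)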

iotop --only
(screenshot: iotop --only output)
Although kube-apiserver and kube-controller-manager also show high I/O at this time, they return to normal after restarting etcd.

top
(screenshot: top output)

lsof (1338 is the main process PID of etcd)
(screenshot: lsof output)
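As a complementary check (a sketch using the PID 1338 mentioned above), the kernel's per-process I/O counters show whether the reads are attributed to etcd itself, and which etcd files are open:

# cat /proc/1338/io                            # read_bytes / write_bytes are cumulative bytes hitting the block layer
# ls -l /proc/1338/fd | grep /var/lib/etcd     # open file descriptors under the etcd data directory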

Check iotop again after restarting etcd
(screenshot: iotop output after restarting etcd)

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.2
Git SHA: 99018a77b
Go Version: go1.16.3
Go OS/Arch: linux/amd64

$ etcdctl version

etcdctl is not installed

Etcd configuration (command line flags or environment variables)


cat /usr/lib/systemd/system/etcd.service

[Unit]
Description=Etcd Service
Documentation=https://coreos.com/etcd/docs/latest/
After=network.target

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd --config-file=/etc/etcd/etcd.config.yml
Restart=on-failure
RestartSec=10
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
Alias=etcd3.service

cat /etc/etcd/etcd.config.yml

name: 'xxx-k8s-master02-168-0-22'
data-dir: /var/lib/etcd
wal-dir: /var/lib/etcd/wal
snapshot-count: 5000
heartbeat-interval: 100
election-timeout: 1000
quota-backend-bytes: 0
listen-peer-urls: 'https://192.168.0.22:2380'
listen-client-urls: 'https://192.168.0.22:2379,http://127.0.0.1:2379'
max-snapshots: 3
max-wals: 5
cors:
initial-advertise-peer-urls: 'https://192.168.0.22:2380'
advertise-client-urls: 'https://192.168.0.22:2379'
discovery:
discovery-fallback: 'proxy'
discovery-proxy:
discovery-srv:
initial-cluster: 'xxx-k8s-master01-168-0-21=https://10.97.43.21:2380,xxx-k8s-master02-168-0-22=https://192.168.0.22:2380,servicecloud-k8s-master03-168-0-23=https://10.97.43.23:2380'
initial-cluster-token: 'etcd-k8s-cluster'
initial-cluster-state: 'existing'
strict-reconfig-check: false
enable-v2: true
enable-pprof: true
proxy: 'off'
proxy-failure-wait: 5000
proxy-refresh-interval: 30000
proxy-dial-timeout: 1000
proxy-write-timeout: 5000
proxy-read-timeout: 0
client-transport-security:
cert-file: '/etc/kubernetes/pki/etcd/etcd.pem'
key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem'
client-cert-auth: true
trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
auto-tls: true
peer-transport-security:
cert-file: '/etc/kubernetes/pki/etcd/etcd.pem'
key-file: '/etc/kubernetes/pki/etcd/etcd-key.pem'
peer-client-cert-auth: true
trusted-ca-file: '/etc/kubernetes/pki/etcd/etcd-ca.pem'
auto-tls: true
debug: false
log-package-levels:
log-outputs: [default]
force-new-cluster: false

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
+------------------+---------+---------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |           NAME            |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+---------------------------+---------------------------+---------------------------+
| 58b6a7ae8b3ccc08 | started | xxx-k8s-master03-168-0-23 | https://192.168.0.23:2380 | https://192.168.0.23:2379 |
| a72d40042d3dcfc3 | started | xxx-k8s-master01-168-0-21 | https://192.168.0.21:2380 | https://192.168.0.21:2379 |
| fed575f46661e44c | started | xxx-k8s-master02-168-0-22 | https://192.168.0.22:2380 | https://192.168.0.22:2379 |
+------------------+---------+---------------------------+---------------------------+---------------------------+

$ etcdctl --endpoints=<member list> endpoint status -w table
+-------------------+------------------+---------+---------+-----------+-----------+------------+
|      ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-------------------+------------------+---------+---------+-----------+-----------+------------+
| 192.168.0.21:2379 | a72d40042d3dcfc3 |   3.5.2 |  131 MB |      true |        91 |   19828175 |
| 192.168.0.22:2379 | fed575f46661e44c |   3.5.2 |  131 MB |     false |        91 |   19828175 |
| 192.168.0.23:2379 | 58b6a7ae8b3ccc08 |   3.5.2 |  131 MB |     false |        91 |   19828175 |
+-------------------+------------------+---------+---------+-----------+-----------+------------+

Relevant log output

An excerpt from the log:

Apr  4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.744+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192920,"retry-timeout":"500ms"}

Apr  4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"747.319986ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/calico/resources/v3/projectcalico.org/nodes/xxx-k8s-master03-168-0-23\" ","response":"range_response_count:1 size:996"}

Apr  4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:41.053+0800","time spent":"700.63208ms","remote":"192.168.0.23:33062","response type":"/etcdserverpb.KV/Txn","request count":1,"request size":117,"response count":0,"response size":43,"request content":"compare:<target:MOD key:\"/registry/masterleases/10.97.43.23\" mod_revision:16993013 > success:<request_put:<key:\"/registry/masterleases/192.168.0.23\" value_size:67 lease:5747578185647075638 >> failure:<request_range:<key:\"/registry/masterleases/192.168.0.23\" > >"}
Apr  4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"699.269271ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/masterleases/\" range_end:\"/registry/masterleases0\" ","response":"range_response_count:3 size:399"}

Apr  4 08:38:41 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:41.754+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:41.077+0800","time spent":"676.532394ms","remote":"192.168.0.21:41074","response type":"/etcdserverpb.KV/Txn","request count":1,"request size":520,"response count":0,"response size":43,"request content":"compare:<target:MOD key:\"/registry/leases/kube-system/kube-controller-manager\" mod_revision:16993058 > success:<request_put:<key:\"/registry/leases/kube-system/kube-controller-manager\" value_size:460 >> failure:<request_range:<key:\"/registry/leases/kube-system/kube-controller-manager\" > >"}
Apr  4 08:38:42 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:42.255+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192921,"retry-timeout":"500ms"}
Apr  4 08:38:42 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:42.755+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192921,"retry-timeout":"500ms"}
Apr  4 08:38:42 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:42.828+0800","caller":"etcdserver/v3_server.go:792","msg":"ignored out-of-date read index response; local node read indexes queueing up and waiting to be in sync with leader","sent-request-id":16450664162270192921,"received-request-id":16450664162270192920}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.255+0800","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":16450664162270192921,"retry-timeout":"500ms"}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.371+0800","caller":"traceutil/trace.go:171","msg":"trace[51766535] linearizableReadLoop","detail":"{readStateIndex:19779122; appliedIndex:19779122; }","duration":"1.617109134s","start":"2022-04-04T08:38:41.754+0800","end":"2022-04-04T08:38:43.371+0800","steps":["trace[51766535] 'read index received'  (duration: 1.61709629s)","trace[51766535] 'applied index is now lower than readState.Index'  (duration: 11.126µs)"],"step_count":2}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.371+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.036405202s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/management.cattle.io/tokens/\" range_end:\"/registry/management.cattle.io/tokens0\" count_only:true ","response":"range_response_count:0 size:7"}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.371+0800","caller":"traceutil/trace.go:171","msg":"trace[250405174] range","detail":"{range_begin:/registry/management.cattle.io/tokens/; range_end:/registry/management.cattle.io/tokens0; response_count:0; response_revision:16993077; }","duration":"2.036476756s","start":"2022-04-04T08:38:41.334+0800","end":"2022-04-04T08:38:43.371+0800","steps":["trace[250405174] 'agreement among raft nodes before linearized reading'  (duration: 2.036372691s)"],"step_count":1}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.371+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.05153119s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/prioritylevelconfigurations/node-high\" ","response":"range_response_count:1 size:696"}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.371+0800","caller":"traceutil/trace.go:171","msg":"trace[1123299582] range","detail":"{range_begin:/registry/prioritylevelconfigurations/node-high; range_end:; response_count:1; response_revision:16993077; }","duration":"2.051557477s","start":"2022-04-04T08:38:41.319+0800","end":"2022-04-04T08:38:43.371+0800","steps":["trace[1123299582] 'agreement among raft nodes before linearized reading'  (duration: 2.05149392s)"],"step_count":1}

Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"490.353508ms","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/management.cattle.io/groupmembers/\" range_end:\"/registry/management.cattle.io/groupmembers0\" count_only:true ","response":"range_response_count:0 size:7"}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"info","ts":"2022-04-04T08:38:43.858+0800","caller":"traceutil/trace.go:171","msg":"trace[346636311] range","detail":"{range_begin:/registry/management.cattle.io/groupmembers/; range_end:/registry/management.cattle.io/groupmembers0; response_count:0; response_revision:16993083; }","duration":"490.375769ms","start":"2022-04-04T08:38:43.368+0800","end":"2022-04-04T08:38:43.858+0800","steps":["trace[346636311] 'agreement among raft nodes before linearized reading'  (duration: 490.341115ms)"],"step_count":1}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:42.991+0800","time spent":"867.248022ms","remote":"192.168.0.21:41106","response type":"/etcdserverpb.KV/Range","request count":0,"request size":42,"response count":0,"response size":30,"request content":"key:\"/registry/ingress/\" range_end:\"/registry/ingress0\" count_only:true "}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"2.102507536s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/endpointslices/default/kubernetes\" ","response":"range_response_count:1 size:524"}
Apr  4 08:38:43 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:38:43.858+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:38:43.368+0800","time spent":"490.412641ms","remote":"192.168.0.22:49194","response type":"/etcdserverpb.KV/Range","request count":0,"request size":94,"response count":0,"response size":30,"request content":"key:\"/registry/management.cattle.io/groupmembers/\" range_end:\"/registry/management.cattle.io/groupmembers0\" count_only:true "}

Apr  4 08:39:10 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:39:10.989+0800","caller":"v3rpc/interceptor.go:197","msg":"request stats","start time":"2022-04-04T08:39:09.808+0800","time spent":"1.180906139s","remote":"192.168.0.35:42840","response type":"/etcdserverpb.KV/Range","request count":0,"request size":81,"response count":1,"response size":1019,"request content":"key:\"/calico/resources/v3/projectcalico.org/nodes/servicecloud-k8s-master03-168-0-23\" "}
Apr  4 08:39:11 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:39:11.360+0800","caller":"etcdserver/v3_server.go:792","msg":"ignored out-of-date read index response; local node read indexes queueing up and waiting to be in sync with leader","sent-request-id":16450664162270193073,"received-request-id":16450664162270193066}

Apr  4 08:43:05 xxx-k8s-master02-168-0-22 etcd: {"level":"warn","ts":"2022-04-04T08:43:05.776+0800","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"18.702242124s","expected-duration":"100ms","prefix":"","request":"header:<ID:14702141606942672209 username:\"etcd\" auth_revision:1 > txn:<compare:<target:MOD key:\"/registry/minions/xxx-k8s-node08-168-0-31\" mod_revision:16993950 > success:<request_put:<key:\"/registry/minions/xxx-k8s-node08-168-0-31\" value_size:4861 >> failure:<>>","response":"size:20"}

ptabor commented Apr 11, 2022

I'm having trouble understanding the full setup. Is the etcd dataDir on dm-1?

I don't see how etcd could cause kube-scheduler or kube-controller-manager to read X MB/s from disk...
Aren't these components in a restart loop, repeatedly loading their binaries? Could you please check what they are doing using strace?
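For example (a minimal sketch; the PIDs are placeholders to fill in):

# strace -c -f -p <kube-scheduler-pid>     # summarize syscalls; press Ctrl-C to print the counts
# strace -f -e trace=openat,read,pread64,fsync,fdatasync -p <etcd-pid> -o /tmp/etcd.strace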


stale bot commented Jul 12, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 12, 2022
@stale stale bot removed the stale label Sep 7, 2022

stale bot commented Dec 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 31, 2022
@stale stale bot closed this as completed Apr 2, 2023
@MohamedShaj

@dotbalo have you found a workaround for this?


dotbalo commented Jan 15, 2024

@MohamedShaj It seems this is not an etcd problem but a disk performance problem. We replaced the disks with the highest-performance SSDs we could get, increased the storage space, and now run all the processes on the high-performance disks. Since then, iowait has rarely been high.
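Since the config above also exposes a plain HTTP client endpoint on localhost (listen-client-urls includes http://127.0.0.1:2379), one way to confirm whether the disk is keeping up is to look at etcd's own disk-latency histograms; a sketch:

# curl -s http://127.0.0.1:2379/metrics | grep -E 'wal_fsync_duration_seconds|backend_commit_duration_seconds'

Sustained high tail latencies in etcd_disk_wal_fsync_duration_seconds and etcd_disk_backend_commit_duration_seconds point at the storage rather than etcd itself, which matches the conclusion above.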


MohamedShaj commented Jan 15, 2024

So even after switching to disks with high I/O, you still sometimes see these logs, right?

I was facing the same log issue in a k3s cluster. I have a 3-node cluster where the master and worker roles run on the same nodes.

In my case, I learned that the etcd data dir should be on a separate disk with good I/O, and that the nodes need better-performing CPU and memory.

After all that tuning the logs were drastically reduced; over a 4-hour window I now see only 2 or 3 "waiting for ReadIndex response took too long" logs.

So I just want to understand: these logs will still appear now and then for one reason or another, but they won't make the node unstable, right?
