spike in ETCD IO: raft apply, peer traffic #7981

Closed
armstrongli opened this issue May 24, 2017 · 43 comments

@armstrongli

armstrongli commented May 24, 2017

We got a peer traffic spike in our ETCD cluster:
[screenshot: 2017-05-25 1:24:31 AM]

[screenshot: 2017-05-25 1:14:02 AM]

After reviewing the logs of all the ETCD members, I didn't find anything abnormal.

Our guess about this situation is that:

  • All (or most) of the ETCD followers' raft logs differ from the leader's
  • The ETCD leader is sending its whole data set to all ETCD followers

The data transferred is about 500MB.

  • All the members are using SSD
  • All the members are running in the same network
  • All the servers' time is synced
  • No leader election.
  • No client traffic drops
  • No DB size spike
  • No memory spike

Every ETCD server has 10 CPUs and 20 GB of memory.

@gyuho
Contributor

gyuho commented May 24, 2017

Do you have server logs from during those spikes? That would help us debug. Thanks!

@armstrongli
Author

armstrongli commented May 24, 2017

@gyuho I suspect there must be something wrong on the storage side (in the source code).
We have encountered issues such as:

  • raft in sync, but data differs
  • sudden memory increases until OOM-killed
  • raft spikes without any warning signs
  • extremely unbalanced data

@gyuho
Contributor

gyuho commented May 24, 2017

@armstrongli What's the etcd version? There was a memory leak issue if you use etcd <= 3.1.4.

https://github.com/coreos/etcd/blob/master/NEWS#L7
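
A quick way to confirm the running version is to query each member's /version endpoint. A minimal Go sketch (the endpoint URL is a placeholder for one of your client URLs):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// versionInfo mirrors the JSON returned by etcd's /version endpoint.
type versionInfo struct {
	Server  string `json:"etcdserver"`
	Cluster string `json:"etcdcluster"`
}

func main() {
	// Placeholder endpoint; replace with one of your member client URLs.
	resp, err := http.Get("http://127.0.0.1:2379/version")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var v versionInfo
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		panic(err)
	}
	fmt.Printf("etcdserver=%s etcdcluster=%s\n", v.Server, v.Cluster)
}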

@armstrongli
Author

armstrongli commented May 24, 2017

Yes, it's < 3.1.4.
Thanks. We will upgrade our cluster to >= 3.1.5.

@xiang90
Contributor

xiang90 commented May 24, 2017

It seems a lot of raft proposals were received around 06:46, so the leader sent more data out to the followers; that is expected. The RPC rate did not increase, but that metric only tracks v3 RPCs. Do you perhaps have a v2 application? Please always provide server logs from ALL members.

@armstrongli
Author

armstrongli commented May 24, 2017

Do you perhaps have a v2 application?

All the ETCD members are v3, and all the kube-apiservers use etcd3 as the backend storage.

@xiang90
Contributor

xiang90 commented May 25, 2017

@armstrongli Then I do not understand where these proposals came from. Your guess does not seem correct. Clearly it was the proposals causing the peer traffic spikes, not internal syncing. I still need the logs to confirm.

@xinxiaogang

@xiang90 What exactly is a raft proposal?

@xinxiaogang

@xiang90 Also, in the dashboard we presented, there is a disk sync duration spike along with the peer traffic spike. Are all of these caused by the raft proposal spike?
Another question: will this disk sync duration spike cause a leader election and leader change?

@xinxiaogang

xinxiaogang commented May 25, 2017

I am in the same organization as @armstrongli. For now, we actually see two major issues with our ETCD cluster:

  1. The peer traffic spike happens randomly, as presented by @armstrongli
  2. Frequent leader changes, as shown below (last 12 hours' data)
    [screenshot: leader changes over the last 12 hours]

We are not sure whether these two issues are related.

@armstrongli
Author

@xiang90
Contributor

xiang90 commented May 26, 2017

@armstrongli I need the logs that correspond to the graph. The logs you provided are from 22:00 to 23:00, not 08:00 to 20:00. Please provide useful information. It is really hard for me to understand what you want and to interpret the data you provided. It is a waste of time for both sides if the information is not accurate.

@armstrongli
Author

The graph is in Beijing time, and the logs are in UTC.

@armstrongli
Author

@xiang90 The logs correspond to the graph I posted in the issue description.

@xiang90
Contributor

xiang90 commented May 26, 2017

@armstrongli Can you please convert the timestamps so that they match?

@mitake
Contributor

mitake commented May 26, 2017

Hi @armstrongli @xinxiaogang

I'm still not sure about your case, but in some extreme cases etcd's batched append strategy could cause high spikes. This is because etcd batches proposals (roughly equivalent to requests from clients) into a single AppendEntries(). This strategy improves both throughput and latency significantly for ordinary workloads. However, I think the strategy can introduce very large delays if the number of proposals in an interval is extremely large, because current etcd doesn't provide a parameter for tuning the batching behavior. But please note that I'm not sure this is your problem; I've never seen a case like yours.

In this branch, https://github.com/mitake/etcd/tree/batch-append-limit, you can find a new option --batch-append that takes a single integer as its argument. The number is the limit on the number of batched entries. I would be glad if you could try it in your environment and report the result.
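
To illustrate the idea (a hypothetical sketch, not etcd's actual raft code): the leader drains whatever proposals are already queued into one batch, but stops at a configurable limit, so a flood of proposals is split across several AppendEntries rounds instead of one huge one:

package main

import "fmt"

// proposal stands in for a client request waiting to be replicated.
type proposal struct{ data string }

// drainBatch collects proposals already queued on ch, but never more than
// limit, so a single AppendEntries round stays bounded.
// Hypothetical sketch of the batching idea, not etcd's actual code.
func drainBatch(ch <-chan proposal, limit int) []proposal {
	batch := make([]proposal, 0, limit)
	for len(batch) < limit {
		select {
		case p := <-ch:
			batch = append(batch, p)
		default:
			return batch // nothing else queued right now
		}
	}
	return batch
}

func main() {
	ch := make(chan proposal, 10)
	for i := 0; i < 10; i++ {
		ch <- proposal{data: fmt.Sprintf("req-%d", i)}
	}
	// With limit=4, the 10 queued proposals go out as batches of 4, 4 and 2.
	for {
		b := drainBatch(ch, 4)
		if len(b) == 0 {
			break
		}
		fmt.Println("AppendEntries with", len(b), "proposals")
	}
}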

@xiang90 what do you think about monitoring the number of appended entries? I can work on it.

@xiang90
Contributor

xiang90 commented May 26, 2017

@armstrongli

I can confirm from the logs that there was no snapshot transfer and no election. There is a proposal spike which caused the IO spike, as I mentioned.

Do you have auth enabled for the cluster?

@xiang90
Contributor

xiang90 commented May 26, 2017

@mitake

If you look at the RPC rate, it is actually very low, about 80 RPCs per second, but the proposal rate is high. I am wondering if auth retries cause that.

@mitake
Contributor

mitake commented May 26, 2017

@xiang90 Ah yes, the RPC rate isn't high at all; I missed that. I'll also check the auth retry.

@xiang90
Contributor

xiang90 commented May 26, 2017

@mitake Yeah... 80k pending proposals is just crazy. I do not know what triggered it.

@xiang90
Contributor

xiang90 commented May 26, 2017

@mitake I believe they run etcd 3.0.15. I remember we fixed something auth-retry related? /cc @heyitsanthony

@mitake
Contributor

mitake commented May 26, 2017

@xiang90 Would 39e9b1f be the change? I looked at it, but I'm still not sure it could produce that many pending proposals.

@armstrongli
Author

@xiang90

Do you have auth enabled for the cluster?

No, auth is not enabled.
cc @mitake @heyitsanthony

@xiang90
Contributor

xiang90 commented May 26, 2017

@armstrongli Not sure what was going on then. My best guess is that something was hitting the etcd v2 API pretty hard.

@armstrongli
Author

There is no application reaching etcd except kube-apiserver, and all the apiservers are using --storage-backend=etcd3.
@xiang90 Is there any tool or metric to catch/show the requests coming from v2?

@armstrongli
Author

@xiang90 @mitake
Our ETCD cluster has been going crazy for almost an hour without any sign of stopping:

[screenshot: 2017-05-26 7:20:05 PM]

Now it's not one spike anymore.

@mitake
Contributor

mitake commented May 26, 2017

@armstrongli If you have time, could you try this commit: mitake@f83d5a2? If you add the new option like this, --batch-append 1000, the leader node will forcibly submit the AppendEntries() RPC once it has 1000 entries, without queuing subsequent proposals.

BTW how many nodes does your k8s cluster have?

@xinxiaogang

@mitake we have 200 nodes, 3000 pods, etc.

@xinxiaogang

@mitake @xiang90 From our monitoring, the client traffic reaches around 10+ Mbps for each ETCD node.

@xinxiaogang

So from the dashboard, the etcd nodes keep processing proposals during that time, with large entry commits. How can we find out where these proposals come from?

@xiang90
Contributor

xiang90 commented May 26, 2017

So from the dashboard, the etcd nodes keep processing proposals during that time, with large entry commits. How can we find out where these proposals come from?

Monitor your applications. etcd does not support auditing at the moment.

@xiang90
Contributor

xiang90 commented May 26, 2017

@xiang90 Is there any tool or metric to catch/show the requests coming from v2?

cURL etcd at {endpoint}/metrics; there will be some v2-related metrics:

https://github.com/coreos/etcd/blob/master/Documentation/v2/metrics.md#http-requests
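
A minimal sketch of doing that scrape programmatically and keeping only the v2 HTTP counters (the endpoint URL is a placeholder; this is equivalent to curl-ing {endpoint}/metrics and filtering for lines starting with etcd_http_):

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Placeholder client URL; point this at each member in turn.
	resp, err := http.Get("http://127.0.0.1:2379/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Print only the v2 HTTP request metrics, e.g. etcd_http_received_total.
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "etcd_http_") {
			fmt.Println(line)
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
}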

One thing to note is that the dashboard setup might be incorrect too, so the actual RPC rate might not be what you showed us. We are still not sure what really happened.

@xinxiaogang

xinxiaogang commented May 27, 2017

Monitor your applications. etcd does not support auditing at the moment.

Monitor which metrics of my application? The application talking to etcd is kube-apiserver. We do audit our apiserver requests, and they are normal as usual.

@xiang90
Contributor

xiang90 commented May 27, 2017

@xinxiaogang You need to monitor what requests your application sends to etcd, the database, not what requests your application receives; that part has nothing to do with etcd directly. I have no idea where these proposals are coming from. etcd will not generate proposals out of nothing...

Again, I am also not sure whether the graphs you provided are accurate. That is something you have to figure out.

We do not really have enough data to help you.

@xinxiaogang

@xiang90 Is there a way or a metric I can get from etcd showing the incoming requests from the application? I know etcd peers also use HTTP to communicate; how can I exclude the peer traffic and look only at the application calls?

@xiang90
Contributor

xiang90 commented May 27, 2017

@xinxiaogang

As I mentioned above, you should read https://github.com/coreos/etcd/blob/master/Documentation/v2/metrics.md#http-requests:

These metrics describe the serving of requests (non-watch events) served by etcd members in non-proxy mode: total incoming requests, request failures and processing latency (inc. raft rounds for storage). They are useful for tracking user-generated traffic hitting the etcd cluster.

It answers your question.

@xiang90
Contributor

xiang90 commented May 27, 2017

For v3 metrics, read the doc here: https://github.com/coreos/etcd/blob/master/Documentation/metrics.md

@xinxiaogang

xinxiaogang commented May 27, 2017

We are running our etcd on v3, so we should look at the v3 metrics, correct?
Are RPC-related metrics only available in v2?

@xinxiaogang

There is only the GET method in the metrics?

# HELP etcd_http_received_total Counter of requests received into the system (successfully parsed and authd).
# TYPE etcd_http_received_total counter
etcd_http_received_total{method="GET"} 1

@xiang90
Contributor

xiang90 commented May 27, 2017

@xinxiaogang Can you paste all the metrics here?

@xiang90
Contributor

xiang90 commented May 27, 2017

of all your servers

@xinxiaogang

xinxiaogang commented May 27, 2017

@xiang90
Contributor

xiang90 commented Jun 14, 2017

We reproduced this on our own testing k8s cluster; check #8096. Closing this, since this thread is unhelpful.

xiang90 closed this as completed on Jun 14, 2017