Very frequent Leader Election with High compaction time? #14071

Closed
iamejboy opened this issue May 25, 2022 · 4 comments
Comments

@iamejboy

What happened?

Frequent leader changes, with neither a "MsgTimeoutNow" from the leader nor a "rafthttp: lost the TCP streaming connection with peer" message logged on the followers. A follower would simply start an election and replace the leader. We also see some random grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing" warnings, and scheduled compactions taking more than 5s.

What did you expect to happen?

  • A leader timeout visible from the followers' point of view (in the logs)
  • Fewer or no leader changes
  • No gRPC errors

How can we reproduce it (as minimally and precisely as possible)?

  • Cannot reproduce at the moment.

Anything else we need to know?

  • etcd is running with around 1k worker nodes

Etcd version (please run commands below)

$ etcd --version
3.4.3

$ etcdctl version
3.4.3

Etcd configuration (command line flags or environment variables)

/usr/local/bin/etcd --name host-000005 --initial-advertise-peer-urls https://host-000005:2380 --listen-peer-urls https://0.0.0.0:2380 --listen-client-urls https://0.0.0.0:2379 --advertise-client-urls https://host-000005:2379 --initial-cluster host-000000=https://host-000000:2380,host-000001=https://host-000001:2380,host-000002=https://host-000002:2380,host-000003=https://host-000003:2380,host-000004=https://host-000004:2380,host-000005=https://host-000005:2380,host-000006=https://host-000006:2380 --initial-cluster-token host --initial-cluster-state existing --data-dir /var/etcd/data --quota-backend-bytes=8388608000 --client-cert-auth --trusted-ca-file=/etc/ssl/ca.pem --cert-file=/etc/ssl/etcd.pem --key-file=/etc/ssl/etcd-key.pem --peer-client-cert-auth --peer-trusted-ca-file=/etc/ssl/ca.pem --peer-cert-file=/etc/ssl/etcd.pem --peer-key-file=/etc/ssl/etcd-key.pem --cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
6e1f41eb0fc907d3, started, host-000004, https://host-000004:2380, https://host-000004.:2379
74af892fcb613a88, started, host-000006, https://host-000006:2380, https://host-000006.:2379
9d80f22925f0ae8c, started, host-000001, https://host-000001:2380, https://host-000001.:2379
b33dba86a0fb4825, started, host-000002, https://host-000002:2380, https://host-000002.:2379
d078cc1c696b11c9, started, host-000003, https://host-000003:2380, https://host-000003.:2379
e9e6c9183f59bf86, started, host-000000, https://host-000000:2380, https://host-000000.:2379
ef747a0c48e5d248, started, host-000005, https://host-000005:2380, https://host-000005.:2379/

$ etcdctl --endpoints=<member list> endpoint status -w table
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://host-000004:2379 | 6e1f41eb0fc907d3 |   3.4.3 |  2.0 GB |     false |      9002 | 5905669295 |
| https://host-000006:2379 | 74af892fcb613a88 |   3.4.3 |  2.0 GB |     false |      9002 | 5905669295 |
| https://host-000001:2379 | 9d80f22925f0ae8c |   3.4.3 |  2.0 GB |     false |      9002 | 5905669295 |
| https://host-000002:2379 | b33dba86a0fb4825 |   3.4.3 |  2.0 GB |     false |      9002 | 5905669295 |
| https://host-000003:2379 | d078cc1c696b11c9 |   3.4.3 |  2.0 GB |     false |      9002 | 5905669296 |
| https://host-000000:2379 | e9e6c9183f59bf86 |   3.4.3 |  2.0 GB |     false |      9002 | 5905669297 |
| https://host-000005:2379 | ef747a0c48e5d248 |   3.4.3 |  2.0 GB |      true |      9002 | 5905669297 |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+

Relevant log output

-----------------------------------
Instance host-000004 lost leadership
-----------------------------------

host-000004 (Lost leadership)

2022-05-25 04:54:14.866592 I | mvcc: finished scheduled compaction at 5818195527 (took 5.377953822s)
2022-05-25 04:54:31.938448 I | etcdserver: start to snapshot (applied: 5902424075, lastsnap: 5902324073)
2022-05-25 04:54:31.963711 I | etcdserver: saved snapshot at index 5902424075
2022-05-25 04:54:31.963836 I | etcdserver: compacted raft log at 5902419075
2022-05-25 04:54:36.858578 I | pkg/fileutil: purged file /var/etcd/data/member/snap/0000000000002329-000000015fc836e4.snap successfully
2022-05-25 04:54:45.843793 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e2d-000000015fcfe31f.wal is created
2022-05-25 04:54:46.786311 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e25-000000015fce2d81.wal successfully
2022-05-25 04:54:46.790929 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e26-000000015fce6554.wal successfully
2022-05-25 04:54:46.795588 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e27-000000015fce9b95.wal successfully
2022-05-25 04:54:46.800170 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e28-000000015fced1fb.wal successfully
2022-05-25 04:55:52.710821 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e2e-000000015fd0197c.wal is created
2022-05-25 04:56:16.805639 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e29-000000015fcf06f7.wal successfully
2022-05-25 04:57:03.624697 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e2f-000000015fd05237.wal is created
2022-05-25 04:57:16.810828 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e2a-000000015fcf3e39.wal successfully
raft2022/05/25 04:57:26 INFO: 6e1f41eb0fc907d3 [logterm: 9001, index: 5902460075, vote: 6e1f41eb0fc907d3] ignored MsgVote from ef747a0c48e5d248 [logterm: 9001, index: 5902459946] at term 9001: lease is not expired (remaining ticks: 3)
raft2022/05/25 04:57:26 INFO: 6e1f41eb0fc907d3 [term: 9001] received a MsgApp message with higher term from ef747a0c48e5d248 [term: 9002]
raft2022/05/25 04:57:26 INFO: 6e1f41eb0fc907d3 became follower at term 9002
raft2022/05/25 04:57:26 INFO: found conflict at index 5902459947 [existing term: 9001, conflicting term: 9002]
raft2022/05/25 04:57:26 INFO: truncate the unstable entries before index 5902459947
raft2022/05/25 04:57:26 INFO: raft.node: 6e1f41eb0fc907d3 changed leader from 6e1f41eb0fc907d3 to ef747a0c48e5d248 at term 9002

-------------
host-000000

2022-05-25 04:54:10.002279 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e20-000000015fcea9a9.wal successfully
2022-05-25 04:54:14.885924 I | mvcc: finished scheduled compaction at 5818195527 (took 5.3921232s)
2022-05-25 04:55:04.080798 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e26-000000015fcff18d.wal is created
2022-05-25 04:55:10.007123 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e21-000000015fcede88.wal successfully
2022-05-25 04:56:11.714987 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e27-000000015fd02947.wal is created
2022-05-25 04:57:21.155467 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e28-000000015fd0600a.wal is created
raft2022/05/25 04:57:26 INFO: e9e6c9183f59bf86 [term: 9001] received a MsgVote message with higher term from ef747a0c48e5d248 [term: 9002]
raft2022/05/25 04:57:26 INFO: e9e6c9183f59bf86 became follower at term 9002
raft2022/05/25 04:57:26 INFO: e9e6c9183f59bf86 [logterm: 9001, index: 5902459946, vote: 0] cast MsgVote for ef747a0c48e5d248 [logterm: 9001, index: 5902459946] at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: e9e6c9183f59bf86 lost leader 6e1f41eb0fc907d3 at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: e9e6c9183f59bf86 elected leader ef747a0c48e5d248 at term 9002

----------
host-000001

raft2022/05/25 04:57:26 INFO: 9d80f22925f0ae8c [term: 9001] received a MsgVote message with higher term from ef747a0c48e5d248 [term: 9002]
raft2022/05/25 04:57:26 INFO: 9d80f22925f0ae8c became follower at term 9002
raft2022/05/25 04:57:26 INFO: 9d80f22925f0ae8c [logterm: 9001, index: 5902459946, vote: 0] cast MsgVote for ef747a0c48e5d248 [logterm: 9001, index: 5902459946] at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: 9d80f22925f0ae8c lost leader 6e1f41eb0fc907d3 at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: 9d80f22925f0ae8c elected leader ef747a0c48e5d248 at term 9002

----------
host-000002

2022-05-25 04:54:54.606781 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e41-000000015fcfea20.wal is created
WARNING: 2022/05/25 04:55:54 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2022-05-25 04:56:03.791002 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e42-000000015fd0236f.wal is created
2022-05-25 04:57:13.297482 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e43-000000015fd05a6b.wal is created
raft2022/05/25 04:57:26 INFO: b33dba86a0fb4825 [term: 9001] received a MsgVote message with higher term from ef747a0c48e5d248 [term: 9002]
raft2022/05/25 04:57:26 INFO: b33dba86a0fb4825 became follower at term 9002
raft2022/05/25 04:57:26 INFO: b33dba86a0fb4825 [logterm: 9001, index: 5902459946, vote: 0] cast MsgVote for ef747a0c48e5d248 [logterm: 9001, index: 5902459946] at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: b33dba86a0fb4825 lost leader 6e1f41eb0fc907d3 at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: b33dba86a0fb4825 elected leader ef747a0c48e5d248 at term 9002

----------
host-000003

2022-05-25 04:53:22.373726 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058db4-000000015fcfa1a6.wal is created
2022-05-25 04:54:09.414384 I | mvcc: store.index: compact 5818195527
2022-05-25 04:54:14.942893 I | mvcc: finished scheduled compaction at 5818195527 (took 5.45195303s)
2022-05-25 04:54:33.342456 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058db5-000000015fcfd9da.wal is created
2022-05-25 04:55:37.268804 I | etcdserver: start to snapshot (applied: 5902437693, lastsnap: 5902337692)
2022-05-25 04:55:37.315167 I | etcdserver: saved snapshot at index 5902437693
2022-05-25 04:55:37.315273 I | etcdserver: compacted raft log at 5902432693
2022-05-25 04:55:39.590539 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058dae-000000015fce5c2d.wal successfully
2022-05-25 04:55:39.595323 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058daf-000000015fce932b.wal successfully
2022-05-25 04:55:39.600071 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058db0-000000015fcec828.wal successfully
2022-05-25 04:55:41.288946 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058db6-000000015fd01029.wal is created
2022-05-25 04:55:59.977981 I | pkg/fileutil: purged file /var/etcd/data/member/snap/0000000000002329-000000015fc86c12.snap successfully
2022-05-25 04:56:09.604896 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058db1-000000015fcefd97.wal successfully
2022-05-25 04:56:50.886239 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058db7-000000015fd0470a.wal is created
2022-05-25 04:57:09.610434 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058db2-000000015fcf34a3.wal successfully
raft2022/05/25 04:57:26 INFO: d078cc1c696b11c9 [term: 9001] received a MsgVote message with higher term from ef747a0c48e5d248 [term: 9002]
raft2022/05/25 04:57:26 INFO: d078cc1c696b11c9 became follower at term 9002
raft2022/05/25 04:57:26 INFO: d078cc1c696b11c9 [logterm: 9001, index: 5902459946, vote: 0] cast MsgVote for ef747a0c48e5d248 [logterm: 9001, index: 5902459946] at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: d078cc1c696b11c9 lost leader 6e1f41eb0fc907d3 at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: d078cc1c696b11c9 elected leader ef747a0c48e5d248 at term 9002

----------
host-000005

2022-05-25 04:54:14.757204 I | mvcc: finished scheduled compaction at 5818195527 (took 5.260169803s)
2022-05-25 04:54:16.993054 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e0f-000000015fcfcb8e.wal is created
2022-05-25 04:54:38.569112 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058e0a-000000015fceba6d.wal successfully
2022-05-25 04:55:22.959544 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e10-000000015fd00250.wal is created
WARNING: 2022/05/25 04:55:34 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2022-05-25 04:56:33.032027 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058e11-000000015fd039c0.wal is created
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 is starting a new election at term 9001
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 became candidate at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 received MsgVoteResp from ef747a0c48e5d248 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 [logterm: 9001, index: 5902459946] sent MsgVote request to 6e1f41eb0fc907d3 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 [logterm: 9001, index: 5902459946] sent MsgVote request to 74af892fcb613a88 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 [logterm: 9001, index: 5902459946] sent MsgVote request to 9d80f22925f0ae8c at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 [logterm: 9001, index: 5902459946] sent MsgVote request to b33dba86a0fb4825 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 [logterm: 9001, index: 5902459946] sent MsgVote request to d078cc1c696b11c9 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 [logterm: 9001, index: 5902459946] sent MsgVote request to e9e6c9183f59bf86 at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: ef747a0c48e5d248 lost leader 6e1f41eb0fc907d3 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 received MsgVoteResp from 9d80f22925f0ae8c at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 has received 2 MsgVoteResp votes and 0 vote rejections
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 received MsgVoteResp from b33dba86a0fb4825 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 has received 3 MsgVoteResp votes and 0 vote rejections
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 received MsgVoteResp from d078cc1c696b11c9 at term 9002
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 has received 4 MsgVoteResp votes and 0 vote rejections
raft2022/05/25 04:57:26 INFO: ef747a0c48e5d248 became leader at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: ef747a0c48e5d248 elected leader ef747a0c48e5d248 at term 9002

----------
host-000006

2022-05-25 04:54:56.056931 I | pkg/fileutil: purged file /var/etcd/data/member/wal/000000000005810d-000000015fce35c9.wal successfully
2022-05-25 04:54:56.061662 I | pkg/fileutil: purged file /var/etcd/data/member/wal/000000000005810e-000000015fce6d53.wal successfully
2022-05-25 04:54:56.066391 I | pkg/fileutil: purged file /var/etcd/data/member/wal/000000000005810f-000000015fcea321.wal successfully
2022-05-25 04:54:56.071129 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058110-000000015fced95b.wal successfully
2022-05-25 04:55:13.883076 I | pkg/fileutil: purged file /var/etcd/data/member/snap/0000000000002329-000000015fc84831.snap successfully
2022-05-25 04:56:04.060044 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058116-000000015fd023af.wal is created
2022-05-25 04:56:26.076119 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058111-000000015fcf0e52.wal successfully
2022-05-25 04:57:13.635577 I | wal: segmented wal file /var/etcd/data/member/wal/0000000000058117-000000015fd05ad6.wal is created
2022-05-25 04:57:26.081384 I | pkg/fileutil: purged file /var/etcd/data/member/wal/0000000000058112-000000015fcf4653.wal successfully
raft2022/05/25 04:57:26 INFO: 74af892fcb613a88 [term: 9001] received a MsgVote message with higher term from ef747a0c48e5d248 [term: 9002]
raft2022/05/25 04:57:26 INFO: 74af892fcb613a88 became follower at term 9002
raft2022/05/25 04:57:26 INFO: 74af892fcb613a88 [logterm: 9001, index: 5902459946, vote: 0] cast MsgVote for ef747a0c48e5d248 [logterm: 9001, index: 5902459946] at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: 74af892fcb613a88 lost leader 6e1f41eb0fc907d3 at term 9002
raft2022/05/25 04:57:26 INFO: raft.node: 74af892fcb613a88 elected leader ef747a0c48e5d248 at term 9002
@spzala (Member) commented Jun 1, 2022

@iamejboy thanks for reporting the issue. You may want to try tuning for your environment - https://etcd.io/docs/v3.5/tuning/. As you said, it's the kind of issue that's difficult to reproduce. If you can dig in more and would like to provide a PR, that would be great.
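
For reference, a minimal sketch of what that tuning could look like on the command line (the millisecond values below are placeholders, not a recommendation; they should be derived from measured peer RTT and disk latency as the tuning doc describes, with defaults of 100 ms heartbeat interval and 1000 ms election timeout):

$ etcd --heartbeat-interval=200 --election-timeout=2000 <existing flags unchanged>

The doc suggests setting the heartbeat interval close to the peer round-trip time and the election timeout to roughly 10x that.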

@chaochn47 (Member)

If I interpret the impact correctly, the frequent leader elections are disrupting cluster availability.

Given that the etcd version is v3.4.3, the pre-vote feature is not enabled; see etcd/etcdmain/help.go, lines 113 to 114 at 3cf2f69:

--pre-vote 'false'
  Enable to run an additional Raft election phase.

This can be double-confirmed from the host-000005 log: the log term is increased immediately once the leader election times out.

pre-vote is a two-phase election process. A pre-election is carried out first (using the same rules as a regular election), and no node increases its term number unless the pre-election indicates that the campaigning node would win. This minimizes disruption when a partitioned node rejoins the cluster.

That's one way to mitigate the impact of network packet loss, delay, and partitions.
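
To illustrate (a sketch only; the rest of the flags would stay exactly as in the configuration above), pre-vote has to be enabled explicitly on v3.4 since it defaults to false there:

$ etcd --pre-vote=true <existing flags unchanged>

or, assuming etcd's usual flag-to-environment-variable mapping, by setting ETCD_PRE_VOTE=true.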

I did not see the old leader's warning log about sending out heartbeats too slowly (i.e., taking longer than 2x the heartbeat interval), so I guess it's not a disk problem on the leader. What's the peer-to-peer round trip time on the old leader and the new leader?

Also, would you mind considering using the latest 3.4 patch version of etcd? https://github.com/etcd-io/etcd/tree/v3.4.18

Feel free to report back with the network round-trip-time metric between peers if the issue still exists after pre-vote is turned on and etcd is upgraded to the latest 3.4 version.
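
One way to pull that metric, as a sketch (assuming metrics are read from the client port using the client certificates from the configuration above; a separate plain-HTTP --listen-metrics-urls endpoint would avoid the TLS flags):

$ curl -s --cacert /etc/ssl/ca.pem --cert /etc/ssl/etcd.pem --key /etc/ssl/etcd-key.pem \
    https://host-000005:2379/metrics | grep etcd_network_peer_round_trip_time_seconds

etcd_network_peer_round_trip_time_seconds is a histogram, so the _bucket/_sum/_count series give the peer-to-peer RTT distribution as observed by etcd itself.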

@b10s commented Jun 28, 2022

What's the peer-to-peer round trip time on the old leader and the new leader?

131 µs avg, 145 µs max, 121 µs min over the last 10 days, from SmokePing stats.

@ahrtr (Member) commented Sep 8, 2022

If a follower doesn't receive any message from the leader in randomizedElectionTimeout, then it may kick off a new election process.

It looks like just a performance issue to me. Suggested actions:

  1. Investigate the metrics provided by etcd to figure out the performance bottleneck (see the sketch after this list);
  2. Tune the heartbeat interval and election timeout per https://etcd.io/docs/v3.5/tuning/;
  3. Once 3.4.21 is released, upgrade to that version or to 3.5.5 (to be released soon). Note that 3.4.3 is too old, and it has a known data inconsistency issue that was resolved in 3.4.8.
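
As a starting point for item 1, a sketch of checking the disk metrics that most commonly explain missed heartbeats and slow compactions (same TLS assumptions as the metrics example above; the ~10 ms / ~25 ms figures are commonly cited rules of thumb, not hard limits):

$ curl -s --cacert /etc/ssl/ca.pem --cert /etc/ssl/etcd.pem --key /etc/ssl/etcd-key.pem \
    https://host-000004:2379/metrics \
    | grep -E 'etcd_disk_wal_fsync_duration_seconds|etcd_disk_backend_commit_duration_seconds'

If the p99 of etcd_disk_wal_fsync_duration_seconds stays well above ~10 ms, or etcd_disk_backend_commit_duration_seconds above ~25 ms, disk latency is a likely contributor.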

Closing this ticket for now. Please feel free to reopen it or raise a new issue/question if you have any other queries or issues.

ahrtr closed this as completed Sep 8, 2022
aanm added a commit to aanm/cilium that referenced this issue Jan 25, 2023
The etcd error 'Failed to update lock: etcdserver: request timed out'
does not seem to be related with Cilium. Thus, we will add it to the
list of exceptions and not fail the CI because of this error. This seems
to be a consequence of etcd's errors such as:
 - 'Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"'
 - 'Failed to update lock: etcdserver: request timed out'

Those issues are referred in the etcd repository:
 - etcd-io/etcd#14071
 - etcd-io/etcd#14027 (comment)

Signed-off-by: André Martins <andre@cilium.io>
aanm added the same commit to cilium/cilium on Jan 26, 2023, and qmonnet subsequently pushed backports of it ([ upstream commit e3803b4 ], identical commit message) to cilium/cilium between Jan 30 and Feb 1, 2023.