
EtcdLeaderElection doesn't lose leadership when network unavailable #49

Open
belovaf opened this issue Jan 24, 2021 · 7 comments

Comments


belovaf commented Jan 24, 2021

Hello!

I think that when the SessionLease becomes EXPIRED, the owning EtcdLeaderElection should call ElectionListener.leadershipChange(false).

With the current behavior we can end up with several leaders at the same time.
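
To make the expectation concrete, here is a minimal sketch of the application side. The only thing taken from the library is the ElectionListener.leadershipChange(boolean) callback referenced above; the start/stop hooks are hypothetical application methods.

// Minimal sketch, not library code: the only assumption from etcd-java is the
// ElectionListener.leadershipChange(boolean) callback referenced in this issue;
// startLeaderOnlyWork/stopLeaderOnlyWork are hypothetical application methods.
public void leadershipChange(boolean isLeader) {
    if (isLeader) {
        startLeaderOnlyWork();  // hypothetical: begin work only the leader may do
    } else {
        // This branch should also be reached when the SessionLease becomes
        // EXPIRED; otherwise a partitioned node keeps acting as leader and
        // several leaders can exist at once.
        stopLeaderOnlyWork();   // hypothetical: stop leader-only work immediately
    }
}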


njhill commented Jan 24, 2021

Thanks @belovaf, I will take a closer look at this. It actually used to be handled via the PersistentLeaseKey's connection to the RangeCache, but this behaviour was modified in version 0.0.11 and later, I think due to some unintended side effects.


belovaf commented Jan 24, 2021

Also I noticed that etcd supports election natively:
https://etcd.io/docs/v3.3.13/dev-guide/api_concurrency_reference_v3/

Looks like it can be implemented with SessionLease only.
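
Purely as an illustration of that flow (this uses the separate jetcd client rather than etcd-java, and the exact method signatures are from memory, so treat them as an assumption; the endpoint, election name, TTL, and candidate value are made-up example values):

// Rough sketch of etcd's native election API (the Campaign/Resign RPCs from the
// linked concurrency reference), shown with the separate jetcd client purely for
// illustration; signatures are from memory and should be double-checked.
import java.nio.charset.StandardCharsets;
import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import io.etcd.jetcd.election.CampaignResponse;
import io.etcd.jetcd.election.LeaderKey;

public class NativeElectionSketch {
    public static void main(String[] args) throws Exception {
        Client client = Client.builder().endpoints("http://127.0.0.1:2379").build();

        // Leadership is tied to a lease: if the lease expires (e.g. the node is
        // partitioned and keep-alives stop), the server ends the leadership
        // automatically; no client-side bookkeeping is required.
        long leaseId = client.getLeaseClient().grant(10).get().getID();

        ByteSequence election = ByteSequence.from("my-election", StandardCharsets.UTF_8);
        ByteSequence candidate = ByteSequence.from("node-1", StandardCharsets.UTF_8);

        // The campaign future completes only once this candidate actually holds
        // leadership.
        CampaignResponse resp = client.getElectionClient()
                .campaign(election, leaseId, candidate).get();
        LeaderKey leaderKey = resp.getLeader();

        // ... keep the lease alive and do leader-only work here ...

        // Step down explicitly when finished.
        client.getElectionClient().resign(leaderKey).get();
        client.close();
    }
}

The relevant point for this issue is that leadership is bound to the lease on the server side, so an expired lease genuinely ends leadership rather than leaving the old leader running.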


sherman commented Feb 21, 2021

Hi, all!

I think it would be nice to have a configurable option for what to do when the network or the etcd server is unavailable. Here's a simple log: the server became the leader, then I killed the etcd server and finally restarted it. There are no events about lost/restored leadership.


sherman commented Feb 21, 2021

AFAICS, when a lease's state changes to EXPIRED, this line of code is executed. Maybe that is the right place to notify listeners about this situation?


sherman commented Feb 21, 2021

The patch illustrates the idea.

diff --git a/src/main/java/com/ibm/etcd/client/utils/PersistentLeaseKey.java b/src/main/java/com/ibm/etcd/client/utils/PersistentLeaseKey.java
index ab4ce68..076e37f 100644
--- a/src/main/java/com/ibm/etcd/client/utils/PersistentLeaseKey.java
+++ b/src/main/java/com/ibm/etcd/client/utils/PersistentLeaseKey.java
@@ -94,6 +94,9 @@ public class PersistentLeaseKey extends AbstractFuture<ByteString> implements Au
                 putKey(lease.getLeaseId());
             } else {
                 leaseActive = false;
+                if (rangeCache != null && newState == LeaseState.EXPIRED) {
+                    rangeCache.deleteLocaly(key);
+                }
             }
         });
     }
diff --git a/src/main/java/com/ibm/etcd/client/utils/RangeCache.java b/src/main/java/com/ibm/etcd/client/utils/RangeCache.java
index bc82dd5..4eaa96c 100644
--- a/src/main/java/com/ibm/etcd/client/utils/RangeCache.java
+++ b/src/main/java/com/ibm/etcd/client/utils/RangeCache.java
@@ -425,6 +425,14 @@ public class RangeCache implements AutoCloseable, Iterable<KeyValue> {
         return listeners.remove(listener);
     }
 
+    public void deleteLocaly(ByteString key) {
+        KeyValue existKv = entries.get(key);
+        if (existKv != null) {
+            entries.remove(key);
+            notifyListeners(EventType.DELETED, existKv, true);
+        }
+    }
+
     /**
      * Interface for listening to update events from
      * the cache


njhill commented Feb 25, 2021

Thanks @sherman, sorry I'm a bit swamped at the moment but I will have a closer look at this soon. What you suggest is similar to the prior logic that was there; I need to look again at what the problematic side effects were. I think it could result in an inconsistent cache state in some circumstances - i.e. the key ending up absent from the cache when it should not have been.


sherman commented Mar 15, 2021

Hi, @njhill.

Additionally, I've tested the patch against a 3-node etcd cluster with a leader/follower configuration. I can't see any problems.

Cases:

  1. Run a leader and a follower. Block the network between the leader and etcd. When the leader changes, restore the network.
  2. Restart the etcd cluster.

What kind of circumstances do you mean? Could you please provide any information on how they could be reproduced?
