
jetcd locking problem #291

Closed
dinoopt opened this issue Feb 17, 2018 · 14 comments
dinoopt commented Feb 17, 2018

I'm using jetcd in my project to lock/unlock an etcd key before updating its value. I've verified from the code and logs that every time a lock is taken, the key is also unlocked. Whenever we fail to acquire a lock, we keep retrying until we get it.

However, we still end up in scenarios where, after some time, none of the threads can acquire any lock. Eventually most of the threads get stuck waiting for locks that no other thread appears to be holding, as confirmed from the logs. Even when we fail to acquire a lock, we see the lock key being created in etcd.

I'm using the recent 0.0.2 version with all the latest changes. Is this a known issue? Is there a fix available? The etcd server version I'm using is 3.2.13.

@fanminshi
Member

@dinoopt The lock API was added recently and may contain bugs. If you can reproduce this issue with a simple script, that would help me debug it for you.

@dinoopt
Author

dinoopt commented Mar 5, 2018

@fanminshi Sorry for coming back late on this. The initial problem I faced was that multiple threads were waiting for a lock, each with a wait timeout and a lease timeout. After the lock was released by the holding thread, it was granted to a thread that had already timed out waiting, but whose lease had not yet expired. All other threads then waited on this thread, which would never release the lock. I added a step to revoke the lease on timeout, which fixed the issue to a certain extent. However, during concurrent runs we still see this issue. I'm trying to reproduce it with a standalone program, but have not been successful so far.
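
The race I described can be sketched with plain JDK futures, with no etcd involved (a hypothetical illustration, not jetcd code): the caller's bounded get() gives up, but the underlying lock-grant future still completes later, producing an "owner" that will never unlock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OrphanedLockDemo {
    // What the caller observes: null if its bounded wait timed out,
    // even though the grant itself may complete afterwards.
    static String waitBriefly(CompletableFuture<String> lockGrant, long millis) {
        try {
            return lockGrant.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null; // caller gives up, but its "server-side" waiter lives on
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> lockGrant = new CompletableFuture<>();
        String seen = waitBriefly(lockGrant, 100); // times out: seen == null
        lockGrant.complete("lock-key");            // previous holder releases only now
        // The caller thinks it failed, yet the grant completed: unless the
        // caller also revokes its lease, this ownership is orphaned and every
        // other contender queues behind it.
        System.out.println("caller saw: " + seen + ", grant done: " + lockGrant.isDone());
    }
}
```

Revoking the lease on the timeout path, as described above, cancels the queued waiter so the grant cannot land on a caller that has already given up.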

Below is the sequence we use for lock/unlock. Could you help me understand if something is missing here?

  1. The function lockForTimeAndTimeout() below is called with 'lease_timeInSeconds' and 'timeout_timeInSeconds' both set to 1 second.

def lockForTimeAndTimeout(client: Client, key: ByteSequence,
                          lease_timeInSeconds: Int, timeout_timeInSeconds: Int): (Option[String], Long) = {
  val leaseID = client.getLeaseClient.grant(lease_timeInSeconds).get().getID
  val unlockKey: Option[String] = Try {
    client.getLockClient.lock(key, leaseID).get(timeout_timeInSeconds, SECONDS)
  } match {
    case Success(x) => Some(x.getKey.toStringUtf8)
    case Failure(ex) => None
  }
  (unlockKey, leaseID)
}

  2. If the lock is acquired, call keepAlive() with the returned leaseID.

keepAlive(client, leaseID)

  3. If lockForTimeAndTimeout() returned None, call revoke() with the returned leaseID so that the lease terminates.

client.getLeaseClient.revoke(leaseID)

  4. If the lock was acquired, perform the operation on etcd and release the lock.

client.getLockClient.unlock(lockKey).get()

Also, I ensure that when the lock is acquired, it is held only for the minimal time needed to perform an update to etcd and is released immediately.

What I notice during the runs is that once the locking problem surfaces, many threads keep creating lock keys and trying to acquire the lock, but fail. I see many instances of the thread below in the stack dump, and the count keeps increasing once the locking problem starts.

"pool-1-thread-32" #66 prio=5 os_prio=31 tid=0x00007ff7aae58800 nid=0xa703 waiting on condition [0x0000700004053000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007971b7450> (a io.grpc.stub.ClientCalls$GrpcFuture)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:445)
        at com.coreos.jetcd.internal.impl.Util.lambda$toCompletableFutureWithRetry$1(Util.java:125)
        at com.coreos.jetcd.internal.impl.Util$$Lambda$32/150248051.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"pool-1-thread-31" #65 prio=5 os_prio=31 tid=0x00007ff7ad506000 nid=0xa503 waiting on condition [0x0000700003f50000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007971b6a60> (a io.grpc.stub.ClientCalls$GrpcFuture)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:445)
        at com.coreos.jetcd.internal.impl.Util.lambda$toCompletableFutureWithRetry$1(Util.java:125)
        at com.coreos.jetcd.internal.impl.Util$$Lambda$32/150248051.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

If there are any other best practices for using jetcd locking, please let me know.

Thank you in advance for your help.

@dinoopt
Author

dinoopt commented Mar 16, 2018

@fanminshi I face this issue only on an etcd cluster running version 3.2.13 and don't see it in my other instances, which use etcd clusters running 3.2.9 and 3.2.11. I'm not able to reproduce the issue on my local standalone etcd server running 3.2.13.

After adding some debug messages to the jetcd code, I tried the following scenario:

  • One thread holds the lock, calls keepAlive and does a sleep
  • Three other threads contend for the lock and timeout and revoke the lease after timeout
  • The thread holding the lock comes out of the sleep and does an unlock

When running against 3.2.9, I see this behavior:

  • After the unlock, the futures created for the three other threads that timed out waiting for the lock complete in toCompletableFutureWithRetry().

When running against 3.2.13, I see this behavior:

  • After the unlock, the futures created for the three other threads that timed out waiting for the lock never complete.
  • As a result, these threads continue to wait in toCompletableFutureWithRetry(); please refer to the stack trace in my previous update.

Could you let me know whether this difference in etcd server versions can cause this kind of behavior?

@fanminshi
Member

@dinoopt Hey, thanks for the detailed explanation. I am currently busy with many things; I'll take a look at this once I'm free.

@fanminshi
Member

cc/ @Grisu118

@dinoopt
Author

dinoopt commented Mar 20, 2018

@fanminshi Thank you for your response. I'm including the standalone Scala program I was using to try to reproduce the issue against an etcd cluster running 3.2.13, in case that helps.

package com.test.etcdtest

import java.io._
import java.nio.file.{Files, Path, Paths}
import java.security.cert.{CertificateFactory, X509Certificate}

import scala.util.{Failure, Success, Try}
import com.coreos.jetcd.Client
import com.coreos.jetcd.data.ByteSequence

import scala.concurrent.Future
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import java.util.concurrent.ThreadLocalRandom

import io.netty.handler.codec.http2.Http2SecurityUtil
import io.netty.handler.ssl.{SupportedCipherSuiteFilter, _}
import io.netty.handler.ssl.ApplicationProtocolConfig.{Protocol, SelectedListenerFailureBehavior, SelectorFailureBehavior}
import io.netty.handler.ssl.util.InsecureTrustManagerFactory

object ETCDTest {

  val ETCD_LOCAL_OR_GLOBAL: Boolean = false

  var etcd_endpoint_primary: String = null
  var etcd_endpoint_secondary: String = null
  var etcd_user: String = null
  var etcd_password: String = null
  var etcd_cert: String = null
  var etcd_ssl: Int = 1

  private val etcd_client: Client = ETCD_LOCAL_OR_GLOBAL match {
    case true =>
      Client.builder().endpoints("http://localhost:2379").build()
    case false =>
      etcd_endpoint_primary = sys.env.getOrElse("ETCD_ENDPOINT_PRIMARY", "")
      etcd_endpoint_secondary = sys.env.getOrElse("ETCD_ENDPOINT_SECONDARY", "")
      etcd_user = sys.env.getOrElse("ETCD_USERNAME", "")
      etcd_password = sys.env.getOrElse("ETCD_PASSWORD", "")
      etcd_cert = sys.env.getOrElse("ETCD_CERT_FILE", "")

      val certFile: Path = Paths.get(etcd_cert)
      val sslBytes = Files.readAllBytes(certFile)
      val is: InputStream = new ByteArrayInputStream(sslBytes);
      val cf: CertificateFactory = CertificateFactory.getInstance("X.509")
      val caCert: X509Certificate = cf.generateCertificate(is).asInstanceOf[X509Certificate]
      val apc: ApplicationProtocolConfig = new ApplicationProtocolConfig(
        Protocol.ALPN, SelectorFailureBehavior.NO_ADVERTISE,
        SelectedListenerFailureBehavior.ACCEPT, ApplicationProtocolNames.HTTP_2)
      val sslC: SslContext = SslContextBuilder.forClient()
        .sslProvider(SslProvider.OPENSSL)
        .ciphers(Http2SecurityUtil.CIPHERS, SupportedCipherSuiteFilter.INSTANCE)
        .trustManager(InsecureTrustManagerFactory.INSTANCE)
        .applicationProtocolConfig(apc)
        .build()

      Client.builder()
        .endpoints(etcd_endpoint_primary, etcd_endpoint_secondary)
        .user(ByteSequence.fromString(etcd_user))
        .password(ByteSequence.fromString(etcd_password))
        .sslContext(sslC)
        .lazyInitialization(false)
        .build()
  }


  def getRandomSleepTime(): Int =
    ThreadLocalRandom.current.nextInt(2000, 5000)

  def updateValueAndUnlock(keytoLock: String, leaseID: Long) = Future {
    val unlockKey: Option[String] = getLock(keytoLock, leaseID, 60L)

    unlockKey match {
      case Some(key) =>
        Try {
          etcd_client.getLeaseClient.keepAlive(leaseID)
        } match {
          case Success(_) => println("Successfully called keepAlive")
          case Failure(_) => println("Failed to call keepAlive")
        }

        etcd_client.getKVClient().put(ByteSequence.fromString(keytoLock), ByteSequence.fromString("bar1"))
        val valueOfKey = etcd_client.getKVClient().get(ByteSequence.fromString(keytoLock))
        println("Value of key foo2 is " + valueOfKey)
        Thread.sleep(90000)
        Try {
          etcd_client.getLockClient.unlock(ByteSequence.fromString(key)).get()
        } match {
          case Success(_) => println("Unlocked successfully")
          case Failure(_) => println("Failed to unlock")
        }
      case None =>
        println ("Could not acquire lock, timed out")
    }
  }


  def getLock(key: String, leaseID: Long, timeOut: Long): Option[String] = {
    Try {
      etcd_client.getLockClient.lock(ByteSequence.fromString(key), leaseID).get(timeOut, SECONDS)
    } match {
      case Success(x) =>
        println("Successfully acquired lock")
        Some(x.getKey.toStringUtf8)
      case Failure(ex) =>
        println(s"Failed to get the lock with exception: ${ex}, revoking the lease")
        Try {
          etcd_client.getLeaseClient.revoke(leaseID)
        } match {
          case Success(_) => println("Successfully revoked the lease")
          case _ => println("Failed to revoke the lease")
        }
        None
    }
  }

  def getLeaseID(timeinSec: Long): Long = {
    Try {
      etcd_client.getLeaseClient.grant(timeinSec).get().getID()
    } match {
      case Success(x) =>
        println("Got lease id " + x)
        x
      case Failure(_) =>
        println("Failed to get a lease")
        0L
    }
  }

  def main(args: Array[String]) {

    val keytoLock: String = "foo2"
    val leaseID1 = getLeaseID(300L)
    val leaseID2 = getLeaseID(300L)
    val leaseID3 = getLeaseID(300L)
    val leaseID4 = getLeaseID(300L)

    updateValueAndUnlock(keytoLock, leaseID1)
    Thread.sleep(10000)

    updateValueAndUnlock(keytoLock, leaseID2)
    updateValueAndUnlock(keytoLock, leaseID3)
    updateValueAndUnlock(keytoLock, leaseID4)

    println("Waiting for futures to complete..")
    Thread.sleep(300000)
    println("End of the run...")
    System.exit(0)
  }
}

@Grisu118
Contributor

I did a quick try, but I'm not able to reproduce the bug. I always get the following output from your test. As far as I understand, this is the correct output, right?

Tested with a single node and a cluster, on versions 3.2.9/3.2.10/3.2.13.

Got lease id 827362993188163844
Got lease id 827362993188163846
Got lease id 827362993188163848
Got lease id 827362993188163850
Successfully acquired lock
Successfully called keepAlive
Value of key foo2 is com.coreos.jetcd.internal.impl.Util$2@5cd18aff[Not completed]
Waiting for futures to complete..
Failed to get the lock with exception: java.util.concurrent.TimeoutException, revoking the lease
Failed to get the lock with exception: java.util.concurrent.TimeoutException, revoking the lease
Failed to get the lock with exception: java.util.concurrent.TimeoutException, revoking the lease
Successfully revoked the lease
Could not acquire lock, timed out
Successfully revoked the lease
Could not acquire lock, timed out
Successfully revoked the lease
Could not acquire lock, timed out
Unlocked successfully
End of the run...

Process finished with exit code 0

@dinoopt
Author

dinoopt commented Mar 20, 2018

@Grisu118 Sorry, I forgot to mention that I was running with some debug points added to the jetcd code. The program output is the same against all the etcd server versions. The difference I saw in the behavior was:

  1. In 3.2.9, after the thread holding the lock on a key does the unlock (after the message 'Unlocked successfully' in the above output), the futures created for the other 3 threads which timed out waiting for the lock are also able to complete.

This is the piece of code in jetcd's Util.java where the other 3 threads trying to acquire the lock wait for the future to complete.

executor.execute(() -> {
  // only retry 3 times.
  int retryLimit = 3;
  while (retryLimit-- > 0) {
    try {
      ListenableFuture<S> f = sourceFutureRef.get();
      targetFuture.complete(resultConvert.apply(f.get()));  // ---> waits here for the future to complete
      return;
    } catch (Exception e) {
      ...

  2. In 3.2.13, after the thread holding the lock does the unlock, the futures created for the 3 other threads which timed out waiting for the lock wait indefinitely in

targetFuture.complete(resultConvert.apply(f.get()));

If I take a jstack dump immediately after the unlock and before the run ends, I still see three threads stuck in toCompletableFutureWithRetry() with the stack below.

"pool-1-thread-32" #66 prio=5 os_prio=31 tid=0x00007ff7aae58800 nid=0xa703 waiting on condition [0x0000700004053000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007971b7450> (a io.grpc.stub.ClientCalls$GrpcFuture)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:445)
        at com.coreos.jetcd.internal.impl.Util.lambda$toCompletableFutureWithRetry$1(Util.java:125)
        at com.coreos.jetcd.internal.impl.Util$$Lambda$32/150248051.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I also tried adding a timeout value to f.get(), in which case the threads are able to terminate after the timeout.
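
As a stdlib-only sketch of that workaround (not the actual jetcd patch): a bounded f.get() turns an indefinitely parked worker thread into one that unwinds cleanly when the upstream future is never completed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedGetDemo {
    // Returns true when the bounded wait ended by timeout, i.e. the
    // worker gave up instead of parking forever on an unfinished future.
    static boolean gaveUpInTime(CompletableFuture<?> f, long millis) {
        try {
            f.get(millis, TimeUnit.MILLISECONDS);
            return false; // completed normally within the bound
        } catch (TimeoutException e) {
            return true;  // the thread can now exit the retry loop
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<String> never = new CompletableFuture<>(); // never completed
        boolean freed = pool.submit(() -> gaveUpInTime(never, 200)).get();
        pool.shutdown();
        System.out.println("worker freed by timeout: " + freed);
    }
}
```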

Also, I was not able to reproduce the problem on a standalone etcd server running on my machine. It seems to happen only when I run against the provisioned etcd cluster we use in our project.

Please let me know if you need any additional details.

Thank you for your help.

@Grisu118
Contributor

I cannot reproduce this against an etcd cluster on my local machine (Windows 10). The cluster is created on Docker, similar to the unit tests of jetcd.

Every future accessed in the code snippet you mentioned is completed:

Tue Mar 20 14:41:54 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@3b4245c9
Tue Mar 20 14:41:55 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@3b4245c9
Got lease id 1060142804964194564
Tue Mar 20 14:41:55 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@5fd59959
Got lease id 1060142804964194566
Tue Mar 20 14:41:55 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@4cac74ad
Tue Mar 20 14:41:55 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@5fd59959
Got lease id 1060142804964194568
Tue Mar 20 14:41:55 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@8550115
Tue Mar 20 14:41:55 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@4cac74ad
Got lease id 1060142804964194570
Tue Mar 20 14:41:55 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@8550115
Tue Mar 20 14:41:56 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@692fe6f
Successfully acquired lock
Tue Mar 20 14:41:56 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@692fe6f
Successfully called keepAlive
Tue Mar 20 14:41:56 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@2ad6a65
Value of key foo2 is com.coreos.jetcd.internal.impl.Util$2@3148425f[Not completed]
Tue Mar 20 14:41:56 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@426b9348
Tue Mar 20 14:41:56 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@426b9348
Tue Mar 20 14:41:56 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@2ad6a65
Waiting for futures to complete..
Tue Mar 20 14:42:06 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@47aff454
Tue Mar 20 14:42:06 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@46541fdd
Tue Mar 20 14:42:06 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@5f0c3cb0
Failed to get the lock with exception: java.util.concurrent.TimeoutException, revoking the lease
Failed to get the lock with exception: java.util.concurrent.TimeoutException, revoking the lease
Failed to get the lock with exception: java.util.concurrent.TimeoutException, revoking the lease
Successfully revoked the lease
Could not acquire lock, timed out
Successfully revoked the lease
Could not acquire lock, timed out
Successfully revoked the lease
Could not acquire lock, timed out
Tue Mar 20 14:43:06 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@3dd5e14c
Tue Mar 20 14:43:06 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@6b596208
Tue Mar 20 14:43:06 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@696eb698
Tue Mar 20 14:43:06 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@3dd5e14c
Tue Mar 20 14:43:06 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@6b596208
Tue Mar 20 14:43:06 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@696eb698
Tue Mar 20 14:43:26 CET 2018: io.grpc.stub.ClientCalls$GrpcFuture@67a955c5
Unlocked successfully
Tue Mar 20 14:43:26 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@67a955c5
Tue Mar 20 14:43:26 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@46541fdd
Tue Mar 20 14:43:26 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@47aff454
Tue Mar 20 14:43:26 CET 2018: isDone: true, io.grpc.stub.ClientCalls$GrpcFuture@5f0c3cb0
Disconnected from the target VM, address: '127.0.0.1:59444', transport: 'socket'
End of the run...

@dinoopt
Author

dinoopt commented Mar 22, 2018

@Grisu118 Thanks for trying it out. Yesterday I got a new etcd cluster provisioned with 3.2.13, and the problem does not happen there. On the clusters that were upgraded from a lower version to 3.2.13, I can still see the problem. I tried upgrading a local etcd server installed on my Mac via brew from 3.2.9 to 3.2.13, but was not able to reproduce the issue.

@liudechi7

liudechi7 commented Jun 6, 2018

@fanminshi I also encountered the same problem with a distributed lock. Two programs compete to acquire the lock, and each keeps the lock's associated lease alive (keepAliveOnce). When the program holding the lock is terminated, the lease should expire once its TTL is reached and the lock should be released, so the other program can acquire it; in practice, however, the other program fails to get the lock. This problem is not easy to reproduce, and I have not been able to pin down whether it is a server-side issue (etcd version 3.2.18) or a jetcd (version 0.0.2) issue.

long leaseId = leaseClient.grant(60).get().getID();
Thread keepAlive = new Thread(() -> {
    try { while (true) { leaseClient.keepAliveOnce(leaseId); Thread.sleep(1000); } }
    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
});
keepAlive.start();
lockClient.lock(ByteSequence.fromString("lock"), leaseId).get().getKey();

@unqjohnny

@fanminshi @Grisu118, I also have the same problem in a multi-threaded environment.
etcd version: 3.3.12; Java client version: 0.0.3. This is my test code:

        int i=0;
        while (i < 3){
            i++;
            taskExecutor.execute(() -> {
                System.out.println("start:"+Thread.currentThread().getName());
                try {
                    while (true){
                        Long leaseId = null;
                        String lockName = "test_lock_0002";
                        try {

                            leaseId = EtcdClient.lockWithTime(lockName,10L,1000);
                            Thread.sleep(500);
                            System.out.println(Thread.currentThread().getName() + ":get lock:"+leaseId);
                        }catch (Exception e){
                            e.printStackTrace();
                        }finally {
                            if(leaseId != null){
                                System.out.println(Thread.currentThread().getName() + ":unlock:"+leaseId);
                                EtcdClient.unlock(lockName,leaseId);
                            }
                        }
                        Thread.sleep(500);
                    }

                }catch (Exception e){
                    e.printStackTrace();
                }
            });
        }

And below is the EtcdClient code:

public class EtcdClient {

    private static final Logger logger = LogManager.getLogger(EtcdClient.class);
    private static final Client client = Client.builder().endpoints("http://url:2379").build();

    public static Long  lockWithTime(String lockName, Long ttl, Integer timeout){
        try {
            long leaseId = client.getLeaseClient().grant(ttl).get().getID();
            client.getLockClient().lock(ByteSequence.from(lockName.getBytes()),leaseId).get(timeout,TimeUnit.MILLISECONDS);
            return leaseId;
        }catch (Exception e){
            logger.error("lock error:"+lockName +"," + e.getMessage(),e);
        }
        return null;
    }
    public static boolean unlock(String lockName, Long leaseId){
        try {
            client.getLockClient().unlock(ByteSequence.from(lockName.getBytes())).get();
            client.getLeaseClient().revoke(leaseId);
            return true;
        }catch (Exception e){
            logger.error(e.getMessage(),e);
        }
        return false;
    }
}

And then is the result:

start:taskExecutor-1
start:taskExecutor-2
start:taskExecutor-3
taskExecutor-2:get lock:7587836883519476368
taskExecutor-2:unlock:7587836883519476368
2019-04-16 14:32:07 ERROR EtcdClient [57]-lock error:test_lock_0002,null
java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedAwaitDone(CompletableFuture.java:365) ~[?:1.8.0_20]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2260) ~[?:1.8.0_20]
	at com.coral.client.etcd.EtcdClient.lockWithTime(EtcdClient.java:54) [classes/:?]
	at com.coral.client.test.ExTest.lambda$testMutiThread$1(ExTest.java:772) [classes/:?]
	at com.coral.client.test.ExTest$$Lambda$123/1223360538.run(Unknown Source) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_20]
taskExecutor-3:get lock:7587836883519476372
taskExecutor-3:unlock:7587836883519476372   // After this, no thread could acquire the lock
taskExecutor-1:get lock:null
2019-04-16 14:32:08 ERROR EtcdClient [57]-lock error:test_lock_0002,null
java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedAwaitDone(CompletableFuture.java:365) ~[?:1.8.0_20]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2260) ~[?:1.8.0_20]
	at com.coral.client.etcd.EtcdClient.lockWithTime(EtcdClient.java:54) [classes/:?]
	at com.coral.client.test.ExTest.lambda$testMutiThread$1(ExTest.java:772) [classes/:?]
	at com.coral.client.test.ExTest$$Lambda$123/1223360538.run(Unknown Source) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_20]
taskExecutor-2:get lock:null
2019-04-16 14:32:08 ERROR EtcdClient [57]-lock error:test_lock_0002,null
java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedAwaitDone(CompletableFuture.java:365) ~[?:1.8.0_20]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2260) ~[?:1.8.0_20]
	at com.coral.client.etcd.EtcdClient.lockWithTime(EtcdClient.java:54) [classes/:?]
	at com.coral.client.test.ExTest.lambda$testMutiThread$1(ExTest.java:772) [classes/:?]
	at com.coral.client.test.ExTest$$Lambda$123/1223360538.run(Unknown Source) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_20]
2019-04-16 14:32:09 ERROR EtcdClient [57]-lock error:test_lock_0002,null
java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedAwaitDone(CompletableFuture.java:365) ~[?:1.8.0_20]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2260) ~[?:1.8.0_20]
	at com.coral.client.etcd.EtcdClient.lockWithTime(EtcdClient.java:54) [classes/:?]
	at com.coral.client.test.ExTest.lambda$testMutiThread$1(ExTest.java:772) [classes/:?]
	at com.coral.client.test.ExTest$$Lambda$123/1223360538.run(Unknown Source) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_20]
taskExecutor-3:get lock:null
taskExecutor-1:get lock:null
2019-04-16 14:32:10 ERROR EtcdClient [57]-lock error:test_lock_0002,null
java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedAwaitDone(CompletableFuture.java:365) ~[?:1.8.0_20]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2260) ~[?:1.8.0_20]
	at com.coral.client.etcd.EtcdClient.lockWithTime(EtcdClient.java:54) [classes/:?]
	at com.coral.client.test.ExTest.lambda$testMutiThread$1(ExTest.java:772) [classes/:?]
	at com.coral.client.test.ExTest$$Lambda$123/1223360538.run(Unknown Source) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_20]
taskExecutor-2:get lock:null
2019-04-16 14:32:11 ERROR EtcdClient [57]-lock error:test_lock_0002,null
java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedAwaitDone(CompletableFuture.java:365) ~[?:1.8.0_20]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2260) ~[?:1.8.0_20]
	at com.coral.client.etcd.EtcdClient.lockWithTime(EtcdClient.java:54) [classes/:?]
	at com.coral.client.test.ExTest.lambda$testMutiThread$1(ExTest.java:772) [classes/:?]
	at com.coral.client.test.ExTest$$Lambda$123/1223360538.run(Unknown Source) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_20]
2019-04-16 14:32:11 ERROR EtcdClient [57]-lock error:test_lock_0002,null
java.util.concurrent.TimeoutException
	at java.util.concurrent.CompletableFuture.timedAwaitDone(CompletableFuture.java:365) ~[?:1.8.0_20]
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2260) ~[?:1.8.0_20]
	at com.coral.client.etcd.EtcdClient.lockWithTime(EtcdClient.java:54) [classes/:?]
	at com.coral.client.test.ExTest.lambda$testMutiThread$1(ExTest.java:772) [classes/:?]
	at com.coral.client.test.ExTest$$Lambda$123/1223360538.run(Unknown Source) [classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_20]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_20]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_20]
taskExecutor-3:get lock:null
taskExecutor-1:get lock:null

Is there any problem with my test code? Thanks.

@unqjohnny

Maybe the reason is the timeout parameter:

client.getLockClient().lock(ByteSequence.from(lockName.getBytes()),leaseId).get(1000,TimeUnit.MILLISECONDS);

Changing 1000 to 2000 makes everything work.
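
That fits the earlier observations in this thread: when get(timeout) gives up client-side, the waiter registered on the server by lock() stays queued under a still-live lease, and the EtcdClient above only revokes the lease after a successful unlock. Below is a self-contained sketch of a revoke-on-timeout variant of lockWithTime(); LeaseClient and LockClient here are hypothetical stand-ins for the real jetcd clients, so the snippet runs without an etcd server.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class RevokeOnTimeoutDemo {
    // Hypothetical minimal interfaces standing in for jetcd's clients.
    interface LeaseClient { long grant(long ttlSeconds); void revoke(long leaseId); }
    interface LockClient { CompletableFuture<String> lock(String name, long leaseId); }

    // Like lockWithTime() above, but revokes the lease whenever the bounded
    // wait fails, so the queued waiter does not linger until the TTL expires.
    static Long lockWithTime(LeaseClient leases, LockClient locks,
                             String name, long ttl, long timeoutMs) {
        long leaseId = leases.grant(ttl);
        try {
            locks.lock(name, leaseId).get(timeoutMs, TimeUnit.MILLISECONDS);
            return leaseId;
        } catch (Exception e) {
            leases.revoke(leaseId); // cancel our place in the lock queue
            return null;
        }
    }

    public static void main(String[] args) {
        AtomicBoolean revoked = new AtomicBoolean(false);
        LeaseClient leases = new LeaseClient() {
            public long grant(long ttl) { return 42L; }
            public void revoke(long id) { revoked.set(true); }
        };
        LockClient locks = (name, id) -> new CompletableFuture<>(); // never granted
        Long got = lockWithTime(leases, locks, "test_lock_0002", 10L, 100L);
        System.out.println("acquired: " + got + ", lease revoked: " + revoked.get());
    }
}
```

A larger timeout avoids the failure by keeping callers out of the timeout path; revoking on timeout addresses the leaked waiter itself.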

@github-actions

This issue is stale because it has been open 60 days with no activity.
Remove stale label or comment or this will be closed in 7 days.
