
watch without waitForReady sometimes not reschedule when all servers are down #1261

Closed
tunefun opened this issue Oct 25, 2023 · 8 comments · Fixed by #1287

Comments

@tunefun
Contributor

tunefun commented Oct 25, 2023

Versions

  • etcd: 3.5.8
  • jetcd: 0.7.6
  • java: 1.8

Describe the bug
A watch without waitForReady is sometimes not rescheduled when all servers are down. When we reboot some of the servers, the watch streams on those servers are rescheduled as expected; when we shut down all servers and reboot them after a while, the watch is not rescheduled.

Reviewing the code below, we lose the event when it arrives before the handler is set.

rstream = Util.applyRequireLeader(option.withRequireLeader(), stub).watch(stream -> {
    wstream.set(stream);
    stream.write(WatchRequest.newBuilder().setCreateRequest(builder).build());
});
rstream.handler(this::onNext);
rstream.exceptionHandler(this::onError);
rstream.endHandler(event -> onCompleted());

Example of the failing call chain:
connect failed -> StreamObserverReadStream.onError() -> WatcherImpl.onError() -> WatcherImpl.reschedule() -> WatcherImpl.resume() -> WatchVertxStub.watch() -> connect failed -> StreamObserverReadStream.onError() -> StreamObserverReadStream.exceptionHandler is null -> rstream.exceptionHandler(this::onError)
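The race in that chain can be simulated with a minimal, self-contained sketch (hypothetical class names, not the real vert.x implementation): if the connection fails synchronously before the caller attaches its exception handler, the error is silently dropped and no reschedule happens.

```java
import java.util.function.Consumer;

// Minimal stand-in for a read stream whose exceptionHandler starts out null.
class MiniReadStream {
    private Consumer<Throwable> exceptionHandler; // null until registered

    // Buggy behavior: an error delivered before registration is dropped.
    void emitError(Throwable t) {
        if (exceptionHandler != null) {
            exceptionHandler.accept(t);
        }
        // else: silently lost -> onError is never called, no reschedule
    }

    void exceptionHandler(Consumer<Throwable> h) {
        this.exceptionHandler = h;
    }
}

public class RaceDemo {
    public static void main(String[] args) {
        MiniReadStream stream = new MiniReadStream();
        boolean[] rescheduled = {false};

        // Connection fails synchronously, before the caller had a chance
        // to register its exception handler...
        stream.emitError(new RuntimeException("connect failed"));

        // ...and only now is the handler attached, too late.
        stream.exceptionHandler(t -> rescheduled[0] = true);

        System.out.println("rescheduled=" + rescheduled[0]); // prints rescheduled=false
    }
}
```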

To Reproduce
See the description above.

Expected behavior
The watch is rescheduled.

@lburgazzoli
Collaborator

@tenghuanhe can you provide a reproducer, and/or are you willing to work on a PR?

@giri-vsr
Contributor

giri-vsr commented Nov 9, 2023

We are also facing the same issue. I am not able to reproduce it using test containers.

This is what is happening:
To create the watch, io.vertx.grpc.stub.ClientCalls.manyToMany is called, which returns rstream; only after that do we register the exceptionHandler. In this case the exception occurs before rstream is returned, so StreamObserverReadStream.exceptionHandler is still null when the error arrives. onError is never called, which is why the watch is not rescheduled.

The exception occurs in StreamObserver<I> request = delegate.apply(response);

  public static <I, O> ReadStream<O> manyToMany(ContextInternal ctx, Handler<WriteStream<I>> requestHandler, Function<StreamObserver<O>, StreamObserver<I>> delegate) {
    StreamObserverReadStream<O> response = new StreamObserverReadStream<>();
    StreamObserver<I> request = delegate.apply(response);
    requestHandler.handle(new GrpcWriteStream<>(request));
    return response;
  }

@lburgazzoli I am not sure what the fix should be.
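One possible direction, sketched under the assumption that the stream may buffer a terminal event until a handler is attached (hypothetical class names; this is not the actual vert.x patch): remember an error that arrives before registration and replay it when the handler is finally set.

```java
import java.util.function.Consumer;

// Sketch: a read stream that buffers an early terminal error instead of
// dropping it, and replays it once an exception handler is registered.
class BufferingReadStream {
    private Consumer<Throwable> exceptionHandler;
    private Throwable pendingError;

    void emitError(Throwable t) {
        if (exceptionHandler != null) {
            exceptionHandler.accept(t);
        } else {
            pendingError = t; // remember it instead of dropping it
        }
    }

    void exceptionHandler(Consumer<Throwable> h) {
        this.exceptionHandler = h;
        if (pendingError != null) {
            Throwable t = pendingError;
            pendingError = null;
            h.accept(t); // replay the early error
        }
    }
}

public class BufferDemo {
    public static void main(String[] args) {
        BufferingReadStream stream = new BufferingReadStream();
        boolean[] rescheduled = {false};

        stream.emitError(new RuntimeException("connect failed")); // arrives early
        stream.exceptionHandler(t -> rescheduled[0] = true);      // replayed here

        System.out.println("rescheduled=" + rescheduled[0]); // prints rescheduled=true
    }
}
```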

@lburgazzoli
Collaborator

@giri-vsr I have very limited time at this stage, and I need to dig into the issue more to understand what to do.
If you have any time, it may be worth chatting with the vert.x folks to see what a solution could be.

@giri-vsr
Contributor

Maybe we need to move to https://github.com/eclipse-vertx/vertx-grpc since https://github.com/vert-x3/vertx-grpc is deprecated

@lburgazzoli
Collaborator

> Maybe we need to move to https://github.com/eclipse-vertx/vertx-grpc since https://github.com/vert-x3/vertx-grpc is deprecated

Can you try to work on a PR?

@giri-vsr
Contributor

The issue is fixed in https://github.com/vert-x3/vertx-grpc 4.5.1. Moving to https://github.com/eclipse-vertx/vertx-grpc should be handled separately.

@tunefun
Contributor Author

tunefun commented Jan 11, 2024

@lburgazzoli @giri-vsr hello, it seems that election observe has the same problem:

stub.observe(request)
    .handler(value -> listener.onNext(new LeaderResponse(value, namespace)))
    .endHandler(ignored -> listener.onCompleted())
    .exceptionHandler(error -> listener.onError(toEtcdException(error)));

And maybe we can also register the end handler before writing the watch request?
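The idea of ordering registration before the request can be shown with a hedged, self-contained sketch (hypothetical class names; the real vert.x API shapes differ): if all handlers are attached before the request is written, even a synchronous connect failure triggered by the write finds a registered handler.

```java
import java.util.function.Consumer;

// Sketch: a stream where writing the request may fail synchronously.
class ObserveStream {
    private Consumer<Throwable> exceptionHandler;

    void exceptionHandler(Consumer<Throwable> h) {
        this.exceptionHandler = h;
    }

    // Writing the request may fail immediately (e.g. all servers down).
    void writeRequest() {
        if (exceptionHandler != null) {
            exceptionHandler.accept(new RuntimeException("connect failed"));
        }
        // else the error would be dropped, as in the watch case
    }
}

public class OrderDemo {
    public static void main(String[] args) {
        boolean[] sawError = {false};

        ObserveStream stream = new ObserveStream();
        stream.exceptionHandler(t -> sawError[0] = true); // handlers first
        stream.writeRequest();                            // request second

        System.out.println("sawError=" + sawError[0]); // prints sawError=true
    }
}
```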

@tunefun
Contributor Author

tunefun commented Jan 11, 2024

LeaseImpl and MaintenanceImpl are also affected:

leaseStub
    .leaseKeepAlive(s -> {
        ref.set(s);
        s.write(req);
    })
    .handler(r -> {
        if (r.getTTL() != 0) {
            future.complete(new LeaseKeepAliveResponse(r));
        } else {
            future.completeExceptionally(
                newEtcdException(ErrorCode.NOT_FOUND, "etcdserver: requested lease not found"));
        }
    })
    .exceptionHandler(future::completeExceptionally);

leaseStub.leaseKeepAlive(this::writeHandler)
    .handler(this::handleResponse)
    .exceptionHandler(this::handleException);

this.stub.snapshot(SnapshotRequest.getDefaultInstance())
    .handler(r -> {
        try {
            r.getBlob().writeTo(outputStream);
            bytes.addAndGet(r.getBlob().size());
        } catch (IOException e) {
            answer.completeExceptionally(toEtcdException(e));
        }
    })
    .endHandler(event -> {
        answer.complete(bytes.get());
    })
    .exceptionHandler(e -> {
        answer.completeExceptionally(toEtcdException(e));
    });

this.stub.snapshot(SnapshotRequest.getDefaultInstance())
    .handler(r -> observer.onNext(new io.etcd.jetcd.maintenance.SnapshotResponse(r)))
    .endHandler(event -> observer.onCompleted())
    .exceptionHandler(e -> observer.onError(toEtcdException(e)));
