kubernetes watch API is behaving oddly #1370

sameer2800 · 2020-11-09T17:18:23Z

I am running a Kubernetes watch on a list of ConfigMaps. The watch runs in a background thread, continuously looking for watch events and updating my cache whenever an object is added or modified. What I am observing is that sometimes the watch events stop coming at all.

Watch<V1ConfigMap> watch = Watch.createWatch(
               apiClient,
               api.listNamespacedConfigMapCall(serviceDiscoveryConfig.getNamespace(), null, null, null,
                       null, null, null, null, null, true, null),
               new TypeToken<Watch.Response<V1ConfigMap>>() {
               }.getType()); 


while(true) {
               TimeUnit.SECONDS.sleep(2);
               for (Watch.Response<V1ConfigMap> configMap : watch) {

                   if (configMap.object != null) {
                       log.info("Received configmap watch event. updating the configmap: {} , metadata: {}", configMap.object.getMetadata().getName(), configMap.object.getMetadata().toString());

                           namespaceConfigsCache.insertOrUpdateValue(configMap.object.getMetadata().getName(), configMap.object.getData());

                   }
               }
           }

This entire piece of code runs in a background thread. The moment the code misses an event, I don't see any more watch events at all after that point. Is there any chance that the watch is being stopped? If so, how do I check its status?


brendandburns · 2020-11-10T05:22:29Z

You should not expect a watch to run forever; you need to list/watch in a loop.

The code is actually semi-thorny to get right, so you are probably better off using the Informer class in this library, which handles much of this logic for you.
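
As a rough illustration of what "list/watch in a loop" means, here is a minimal sketch that uses only the JDK. The `Source` interface, the `Event` class, and `demoSource()` are hypothetical stand-ins for the real list and watch calls, not part of the Kubernetes client. The point it tries to show is that a watch stream ending is treated as normal: the loop simply lists again and opens a new watch. It leaves out the harder parts (error handling, backoff, resuming from the last seen resource version), which the Informer class is meant to cover.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ListWatchLoop {

    /** Hypothetical change notification: object name, data, and the version it was seen at. */
    static final class Event {
        final String name;
        final String data;
        final long resourceVersion;

        Event(String name, String data, long resourceVersion) {
            this.name = name;
            this.data = data;
            this.resourceVersion = resourceVersion;
        }
    }

    /** Hypothetical source: list() returns a snapshot, watch() returns a finite event stream. */
    interface Source {
        List<Event> list();
        Iterator<Event> watch(long fromResourceVersion);
    }

    /** Runs list-then-watch for a fixed number of rounds and returns the resulting cache. */
    static Map<String, String> run(Source source, int rounds) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        for (int round = 0; round < rounds; round++) {
            // List first: refresh the cache and learn the newest version.
            long version = 0;
            for (Event e : source.list()) {
                cache.put(e.name, e.data);
                version = Math.max(version, e.resourceVersion);
            }
            // Then watch until the stream ends. An ended stream is not an error;
            // the outer loop just starts the next list/watch round.
            Iterator<Event> events = source.watch(version);
            while (events.hasNext()) {
                Event e = events.next();
                cache.put(e.name, e.data);
            }
        }
        return cache;
    }

    /** In-memory source whose every watch delivers one update to "cm-a" and then ends. */
    static Source demoSource() {
        return new Source() {
            private final Map<String, String> state = new HashMap<>();
            private long version = 0;

            @Override
            public List<Event> list() {
                List<Event> snapshot = new ArrayList<>();
                for (Map.Entry<String, String> entry : state.entrySet()) {
                    snapshot.add(new Event(entry.getKey(), entry.getValue(), version));
                }
                return snapshot;
            }

            @Override
            public Iterator<Event> watch(long fromResourceVersion) {
                version++;
                state.put("cm-a", "rev" + version);
                return Collections.singletonList(new Event("cm-a", "rev" + version, version)).iterator();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(run(demoSource(), 3));
    }
}
```

A real loop would run until shutdown rather than for a fixed number of rounds; the bound here only keeps the sketch finite.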

sameer2800 · 2020-11-10T06:24:42Z

Thanks @brendandburns. I will try out Informer and let you know.

sameer2800 · 2020-11-10T13:15:22Z

@brendandburns I am seeing similar behavior even with the Informer class. Let me know if I have configured something wrong here. It received events for the first few minutes and then suddenly stopped receiving changelog events.

 // configmaps informer
        SharedIndexInformer<V1ConfigMap> configInformer =
                factory.sharedIndexInformerFor(
                        (CallGeneratorParams params) -> {
                            return api.listNamespacedConfigMapCall(
                                    serviceDiscoveryConfig.getNamespace(),
                                    null,
                                    null,
                                    null,
                                    null,
                                    null,
                                    null,
                                    params.resourceVersion,
                                    params.timeoutSeconds,
                                    params.watch,
                                    null);
                        },
                        V1ConfigMap.class,
                        V1ConfigMapList.class);


configInformer.addEventHandler(
                new ResourceEventHandler<V1ConfigMap>() {
                    @Override
                    public void onAdd(V1ConfigMap configMap) {

                        log.info("Received configmap watch add event. updating the configmap: {} , metadata: {}", configMap.getMetadata().getName(), configMap.getMetadata().toString());

                       
                    }

                    @Override
                    public void onUpdate(V1ConfigMap oldConfigMap, V1ConfigMap newConfigMap) {
                        log.info("Received configmap watch update event. updating the configmap: {} , metadata: {}", newConfigMap.getMetadata().getName(), newConfigMap.getMetadata().toString());
                    
                    }

                    @Override
                    public void onDelete(V1ConfigMap configMap, boolean deletedFinalStateUnknown) {
                      
                    }
                });

        factory.startAllRegisteredInformers();

brendandburns · 2020-11-10T16:31:47Z

Is it possible that your thread is throwing an exception? If you throw an uncaught exception inside the thread, the thread will terminate.

I would try:

public void run() {
  try {
    // your code here
  } catch (Throwable e) {
     e.printStackTrace();
  }
}

And see if any exceptions occur. Your code for using the informer looks correct.
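
If wrapping `run()` is awkward because the thread is created elsewhere, an uncaught-exception handler is another way to find out whether a thread died silently. The sketch below uses only the JDK; the class name, thread name, and exception message are made up for illustration. It installs a per-thread handler and records what killed the thread.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtDemo {

    /** Starts a worker that throws, and returns what the handler recorded. */
    static String runAndCapture() {
        AtomicReference<String> seen = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            // Stands in for an event handler that fails unexpectedly.
            throw new IllegalStateException("handler failed");
        }, "watch-worker");

        // Called on the dying thread just before it terminates.
        worker.setUncaughtExceptionHandler(
                (thread, error) -> seen.set(thread.getName() + ": " + error.getMessage()));

        worker.start();
        try {
            // join() returns only after the thread, including its handler, has finished.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture());
    }
}
```

For threads you do not create yourself, `Thread.setDefaultUncaughtExceptionHandler` applies the same idea process-wide, although it only fires for exceptions the owning code has not already caught.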

yue9944882 · 2020-11-10T16:47:42Z

@brendandburns wrote: "Is it possible that your thread is throwing an exception? If you throw an uncaught exception inside the thread, the thread will terminate."

I think so.

sameer2800 · 2020-11-11T05:02:17Z

@brendandburns @yue9944882 I am not running this in a separate thread, because factory.startAllRegisteredInformers(); starts the informers in a background thread. I am running this in the main method itself. Which part do you want me to put in a try/catch block? Initializing the informer and adding an event handler is a one-time task, and I don't see errors there. And startAllRegisteredInformers runs in the background.

sameer2800 · 2020-11-12T07:34:20Z

@brendandburns I have not changed any timeout variables. I see the default for listNamespacedConfigMapCall is set to 5 minutes.

listerWatcher.watch(
                  new CallGeneratorParams(
                      Boolean.TRUE,
                      lastSyncResourceVersion,
                      Long.valueOf(Duration.ofMinutes(5).toMillis()).intValue()));

Actually, in my case, the kube API server becomes unavailable once in a while. Do you think increasing the timeout will work?

yue9944882 · 2020-11-12T10:46:19Z

Your code looks good. The informer will retry connecting to the kube-apiserver every 1 second if the server becomes unavailable, and the watch connection will be re-established once the server is up.

java/examples/src/main/java/io/kubernetes/client/examples/InformerExample.java

Lines 38 to 39 in a43fa93

    
    OkHttpClient httpClient =
        apiClient.getHttpClient().newBuilder().readTimeout(0, TimeUnit.SECONDS).build();

Did you set the read timeout to infinite, as the example above shows?

sameer2800 · 2020-11-12T16:53:18Z

@yue9944882 yes.

apiClient = ClientBuilder.cluster().build();
       // infinite timeout
       OkHttpClient httpClient =
               apiClient.getHttpClient().newBuilder().readTimeout(0, TimeUnit.SECONDS).build();
       apiClient.setHttpClient(httpClient);
       this.api = new CoreV1Api();

I tried changing the timeouts too; it didn't help. In my last run, I could see it working for hours, and then it stopped receiving events. I started with debug mode on. I see neither exceptions nor errors.

brendandburns · 2020-11-12T18:56:26Z

I'm actually not sure you want an infinite timeout. On a flaky network, is it possible that something is not sending a TCP reset when a network connection is severed? I've seen situations where a TCP reset isn't sent and the system holds a TCP connection open, but there's no traffic flowing.

I would actually set a non-infinite timeout (5 minutes?) and see if that fixes things.
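
To make the failure mode concrete, here is a small JDK-only sketch on plain sockets (not the client library's code; the class and method names are made up). A local server accepts a connection and then stays silent, which roughly mimics a connection that is still open but carries no traffic. With a finite read timeout the blocked read fails with `SocketTimeoutException`, so the caller gets a chance to reconnect. With a timeout of zero the same read would block indefinitely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SilentPeerDemo {

    /** Returns true if a read from a peer that never sends data times out. */
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // The accepted side is kept open but never writes anything.
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived, which is not expected here
            } catch (SocketTimeoutException expected) {
                return true; // the read gave up, so the caller could now reconnect
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```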

sameer2800 · 2020-11-13T07:42:49Z

@brendandburns Thanks for the suggestion. I will try it and let you know.

tony-clarke-amdocs · 2020-11-24T19:39:47Z

@sameer2800 were you able to resolve this?

We are starting to see the same symptoms on AKS (Azure Kubernetes Service, K8S 1.18.8) for a CR instance. After about 5 minutes the informer stops seeing any updates (new/update/delete). We are running with the 9.0.1 release. We updated to 10.0.1, but it made no difference.

@brendandburns you suggested running with a read timeout that is not zero, but the 10.0.0 release was updated to disallow any read timeout other than zero. See this commit. Any other suggestions to try?

brendandburns · 2020-11-25T16:32:28Z

cc @yue9944882

See some related discussion here:
kubernetes/kubernetes#65012

@tony-clarke-amdocs for AKS specifically see the discussion here:
Azure/AKS#1755

I think we should:
a) re-enable non-zero timeouts
b) make sure we're sending TCP Keep-Alive

eventually:

c) switch from Web Sockets to HTTP/2 and add health checks.

tony-clarke-amdocs · 2020-11-25T18:38:04Z

@brendandburns @yue9944882 I noticed that the watch call sets the timeout to 5 minutes. See here. Given that we no longer see watch events after 5 minutes...I tend to think this is not a coincidence?
Any idea how we make sure to send TCP keep-alive? Looking at the code, I don't think we are doing web sockets today.
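
One possible way to turn on TCP keep-alive, sketched with JDK classes only, is a `SocketFactory` that sets `SO_KEEPALIVE` on every socket it creates. The class name below is made up, and whether this is what was meant by "sending TCP Keep-Alive" is an assumption. OkHttp's builder accepts a custom factory through `socketFactory(...)`, but wiring it into the client was not tried in this thread. Also note that the probe timing comes from operating system settings, and the default idle time before the first probe is often long, so this alone may not detect a dead connection quickly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.SocketFactory;

/** Illustrative factory (hypothetical name) that enables SO_KEEPALIVE on each new socket. */
class KeepAliveSocketFactory extends SocketFactory {
    private final SocketFactory delegate = SocketFactory.getDefault();

    private static Socket withKeepAlive(Socket socket) throws IOException {
        socket.setKeepAlive(true);
        return socket;
    }

    @Override
    public Socket createSocket() throws IOException {
        // Unconnected socket; the caller connects it later.
        return withKeepAlive(delegate.createSocket());
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return withKeepAlive(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        return withKeepAlive(delegate.createSocket(host, port, localHost, localPort));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return withKeepAlive(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort)
            throws IOException {
        return withKeepAlive(delegate.createSocket(address, port, localAddress, localPort));
    }

    /** Creates an unconnected socket through the factory and reports its keep-alive flag. */
    static boolean keepAliveEnabledOnNewSocket() {
        try (Socket socket = new KeepAliveSocketFactory().createSocket()) {
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("keep-alive: " + keepAliveEnabledOnNewSocket());
    }
}
```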

tony-clarke-amdocs · 2020-12-04T14:25:16Z

@brendandburns @yue9944882 I think I have figured this out. The standard client doesn't include the HTTP/2 protocol.

ApiClient apiClient = ClientBuilder.standard().build();

We need to add the following to enable HTTP/2 and a ping interval.

apiClient.setHttpClient(apiClient
        .getHttpClient()
        .newBuilder()
        .protocols(Arrays.asList(Protocol.HTTP_2, Protocol.HTTP_1_1))
        .readTimeout(Duration.ZERO)
        .pingInterval(1, TimeUnit.MINUTES)
        .build());

With the above change the watch doesn't hang and all is good.

Would it make sense for the standard build to include something like this by default? I think it should at least include the HTTP_2 protocol.

brendandburns · 2020-12-06T16:25:53Z

That change seems fine to me.
@yue9944882 wdyt?

eskaufel mentioned this issue Dec 15, 2020

Watch Method stops watching kubernetes-client/csharp#533

Closed

tony-clarke-amdocs mentioned this issue Jan 17, 2021

Update client builder to be more robust by default #1498

Merged

k8s-ci-robot closed this as completed in #1498 Jan 20, 2021

brendandburns mentioned this issue Mar 6, 2021

ReflectorRunnable.watchHandler seems to be stuck after about 5min of inactivity #1578

Closed

brendandburns mentioned this issue Feb 28, 2022

SharedIndexInformer does not work while watching the kubernetes event #2151

Closed

utk-12 mentioned this issue Apr 21, 2022

K8s Java client upgrade azkaban/azkaban#3075

Merged

funky-eyes mentioned this issue Jul 12, 2022

Broken pipe on API call with Minikube #1003

Closed
