Automatically reconnect Kubernetes watcher when closed exceptionally #6023

Merged
merged 6 commits into line:main on Dec 11, 2024

Conversation


@ikhoon ikhoon commented Dec 7, 2024

Motivation:

A watcher in `KubernetesEndpointGroup` automatically reconnects when it fails to connect to the remote peer. However, it does not reconnect when a `WatcherException` is raised.

```
io.fabric8.kubernetes.client.WatcherException: too old resource version: 573375490 (573377297)
	at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onStatus(AbstractWatchManager.java:401)
	at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:369)
	at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:52)
	at com.linecorp.armeria.client.kubernetes.ArmeriaWebSocket.onNext(ArmeriaWebSocket.java:106)
	at com.linecorp.armeria.client.kubernetes.ArmeriaWebSocket.onNext(ArmeriaWebSocket.java:37)
	at com.linecorp.armeria.common.stream.DefaultStreamMessage.notifySubscriberWithElements(DefaultStreamMessage.java:412)
        ...
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 573375490 (573377297)
	... 62 common frames omitted
```

I don't know why `too old resource version` was raised, but the important point is that watchers should not be stopped until `KubernetesEndpointGroup` itself is closed.

Modifications:

- Refactor `KubernetesEndpointGroup` to start watchers asynchronously.
- Automatically restart `Watcher`s when `onClose(WatcherException)` is invoked.
- Add more logs that may be useful for debugging.
  - Also make the log formats consistent.
- Debounce endpoint updates to prevent `EndpointGroup.whenReady()` from completing with only a small number of endpoints.
  - This prevents a few endpoints from receiving too much traffic when a watcher is newly created.

Result:

`KubernetesEndpointGroup` automatically reconnects a `Watcher` when a `WatcherException` is raised.
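The restart-on-close behavior can be sketched as follows. This is a minimal illustration using hypothetical stand-in types, not the actual `KubernetesEndpointGroup` code; in the real PR the reconnect is triggered from fabric8's `Watcher.onClose(WatcherException)` callback.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for fabric8's WatcherException, for illustration only.
class WatcherException extends Exception {
    WatcherException(String message) { super(message); }
}

// Minimal sketch: restart the watch when it closes exceptionally, unless the
// enclosing endpoint group has been closed by the user.
class ReconnectingWatcher {
    private final AtomicBoolean closed = new AtomicBoolean();
    private int restartCount;

    // Invoked by the watch machinery when the watch terminates abnormally.
    void onClose(WatcherException cause) {
        if (closed.get()) {
            return; // The endpoint group was closed; do not reconnect.
        }
        restartCount++;
        startWatch(); // Restart instead of giving up.
    }

    // In the real code this would (re)issue the Kubernetes watch request.
    void startWatch() {}

    void close() { closed.set(true); }

    int restartCount() { return restartCount; }
}
```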

@ikhoon ikhoon added the defect label Dec 7, 2024
@ikhoon ikhoon added this to the 1.31.3 milestone Dec 7, 2024
```java
} else {
    podWatch = podWatch0;
}
watchPodAsync(service0.getSpec().getSelector());
```
Question) What do you think of keeping the selector as a volatile field in KubernetesEndpointGroup and having the pod watcher thread continuously watch this field?

I'm imagining the case where

  1. watchPodAsync(newSelector) is called
  2. slightly later, an exception is thrown and watchPodAsync(oldSelector) is called
  3. the new selector is not used

If the above case is not possible, the current implementation seems fine.
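If the race is possible, one way to realize the suggestion above is to keep the latest selector in a shared mutable reference that every (re)connect reads, so a reconnect triggered by an old failure can never resurrect a stale selector. A sketch with hypothetical names (`PodWatchState` is not a class in the PR):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Sketch: the latest selector lives in one shared field; a reconnect always
// reads the current value instead of reusing the selector it was started with.
class PodWatchState {
    private final AtomicReference<Map<String, String>> selector = new AtomicReference<>();

    // Called when the service's selector changes.
    void updateSelector(Map<String, String> newSelector) {
        selector.set(newSelector);
    }

    // Called whenever the pod watch (re)connects; always the latest value.
    Map<String, String> selectorForReconnect() {
        return selector.get();
    }
}
```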

@minwoox minwoox left a comment
Left some minor suggestions. 👍

```java
/**
 * The debounce millis for the update of the endpoints.
 * A short delay would be enough because the initial events are delivered sequentially.
 */
private static final int DEBOUNCE_MILLIS = 10;
```
Isn't it too short? Maybe 100 millis? (I'm worried about making another flaky test. 😉)

Changed to 100 ms.
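The debounce discussed here can be sketched with a `ScheduledExecutorService`: each incoming endpoint event cancels the pending flush and reschedules it, so a burst of initial events yields a single endpoints update once the burst goes quiet. This is an illustrative shape, not the PR's actual implementation; `EndpointUpdateDebouncer` is a hypothetical name.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of a debouncer: coalesces rapid endpoint events into one flush.
class EndpointUpdateDebouncer {
    private static final long DEBOUNCE_MILLIS = 100;

    private final ScheduledExecutorService executor;
    private final Runnable flush;
    private ScheduledFuture<?> pending;

    EndpointUpdateDebouncer(ScheduledExecutorService executor, Runnable flush) {
        this.executor = executor;
        this.flush = flush;
    }

    // Each event pushes the flush back by DEBOUNCE_MILLIS, so a burst of
    // sequential initial events produces a single endpoints update.
    synchronized void onEndpointEvent() {
        if (pending != null) {
            pending.cancel(false); // Coalesce with the previous event.
        }
        pending = executor.schedule(flush, DEBOUNCE_MILLIS, TimeUnit.MILLISECONDS);
    }
}
```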

```java
logger.warn("Pod watcher for {}/{} is closed.", namespace, serviceName, cause);
logger.warn("[{}/{}] Pod watcher is closed.", namespace, serviceName, cause);
logger.info("[{}/{}] Reconnecting the pod watcher...", namespace, serviceName);
// TODO(ikhoon): Add a backoff strategy to prevent rapid reconnections when the pod watcher
```
Can we at least add a delay (e.g. 3 seconds) to prevent the situation?

Refactored to count errors that occur consecutively:

  • Retry immediately on the first failure.
  • Apply a backoff delay from the second failure onward.
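That retry policy (immediate retry on the first failure, exponential backoff afterwards) can be sketched as below; the base and maximum delays are assumed values, not taken from the PR.

```java
// Sketch of the reconnect backoff: the first failure retries immediately,
// subsequent consecutive failures back off exponentially up to a cap.
final class WatcherBackoff {
    private static final long BASE_DELAY_MILLIS = 1000;  // assumed value
    private static final long MAX_DELAY_MILLIS = 30_000; // assumed value

    private int consecutiveFailures;

    // Records a failure and returns how long to wait before reconnecting.
    long nextDelayMillis() {
        consecutiveFailures++;
        if (consecutiveFailures == 1) {
            return 0; // Immediate retry for the first failure.
        }
        // 1s, 2s, 4s, ... capped at MAX_DELAY_MILLIS (shift is bounded to
        // avoid overflow for long failure streaks).
        final long delay = BASE_DELAY_MILLIS << Math.min(consecutiveFailures - 2, 30);
        return Math.min(delay, MAX_DELAY_MILLIS);
    }

    // Resets the counter once a watch connection succeeds.
    void reset() {
        consecutiveFailures = 0;
    }
}
```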

@trustin trustin left a comment
👍

@minwoox minwoox left a comment
👍 👍 👍 Thanks!

@ikhoon ikhoon merged commit 1c9a22d into line:main Dec 11, 2024
13 of 14 checks passed
@ikhoon ikhoon deleted the kubernetes-endpoints-rewatch branch December 11, 2024 02:36
ikhoon added a commit to ikhoon/armeria that referenced this pull request Dec 11, 2024