Watcher handle graceful onClose #56
Conversation
Handle graceful watcher onClose the same way as we do when the watch closes exceptionally. The graceful onClose is default-implemented with a debug log message, so we had mistakenly not overridden it.
We had no sign of graceful watcher onClose since it only logged a debug message in its default implementation.
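A minimal sketch of the shape of the fix, assuming the fabric8 Watcher interface (the class name is illustrative; exit code 11 comes from the PR summary below). The point is that both onClose variants must be overridden, since the no-arg one is default-implemented:

```java
import io.fabric8.kubernetes.api.model.Endpoints;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

// Illustrative sketch, not the repo's actual EndpointsWatcher.
class EndpointsWatcherSketch implements Watcher<Endpoints> {

  @Override
  public void eventReceived(Action action, Endpoints endpoints) {
    // handle ADDED/MODIFIED/DELETED events here
  }

  @Override
  public void onClose(WatcherException cause) {
    // exceptional close: log and exit so the orchestrator restarts us
    System.err.println("Watch closed exceptionally: " + cause.getMessage());
    System.exit(11);
  }

  @Override
  public void onClose() {
    // graceful close: previously unhandled because the interface's
    // default implementation only logs at debug; treat it the same way
    System.err.println("Watch closed gracefully");
    System.exit(11);
  }
}
```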
Do we want to test any of these, from client reference?
@@ -50,6 +50,8 @@ quarkus:
         level: DEBUG
       "org.apache.kafka.clients.Metadata":
         level: DEBUG
+      "io.fabric8.kubernetes.client":
+        level: DEBUG
We should use an env here. This is great for troubleshooting stuff like ECONNRESET, but we should probably default to INFO.
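For example, a sketch of the env-driven variant (KUBERNETES_CLIENT_LOG_LEVEL is a made-up variable name; Quarkus expands ${VAR:default} expressions from the environment):

```yaml
quarkus:
  log:
    category:
      "io.fabric8.kubernetes.client":
        # defaults to INFO, can be raised to DEBUG per deployment
        level: ${KUBERNETES_CLIENT_LOG_LEVEL:INFO}
```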
Let's do a single-arch build with DEBUG, so we don't have to update lots of YAML.
while "onupdate" refers to the mechanism of handling key-value updates This reverts commit 440a8cd.
I think no, since the default reconnect behavior seems quite aggressive. The issue we had was (most likely) a graceful close of the watch according to the client, so in that case I would assume no reconnect is done. Otherwise, an unlimited number of retries with a 1-second interval (the default reconnect behavior) would surely have succeeded eventually.
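For reference, a sketch that makes the defaults under discussion explicit via the fabric8 client Config (the values shown are my understanding of the documented defaults):

```java
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

class ReconnectDefaultsSketch {
  static KubernetesClient buildClient() {
    Config config = new ConfigBuilder()
        .withWatchReconnectInterval(1000) // milliseconds between attempts; 1s is the default
        .withWatchReconnectLimit(-1)      // -1 means retry indefinitely (the default)
        .build();
    return new KubernetesClientBuilder().withConfig(config).build();
  }
}
```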
I haven't added test coverage because I think we might want to switch to an informer (or are they always watching all namespaces?), in which case none of these hacks will be needed, at least not in their current form.
I still don't know how API-server restarts behave now. We do, however, have debug logging for the websocket connections since 252f46d.
fabric8io/kubernetes-client#5372 (comment)
using testcontainers with k3s
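Roughly, the test setup looks like this (a sketch assuming the org.testcontainers:k3s module; the image tag and method names around it are illustrative):

```java
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import org.testcontainers.k3s.K3sContainer;
import org.testcontainers.utility.DockerImageName;

class K3sTestSketch {
  static KubernetesClient startK3sAndConnect() {
    // Spin up a throwaway single-node cluster in Docker
    K3sContainer k3s = new K3sContainer(DockerImageName.parse("rancher/k3s:v1.27.4-k3s1"));
    k3s.start();
    // Point the fabric8 client at the container's kubeconfig
    Config config = Config.fromKubeconfig(k3s.getKubeConfigYaml());
    return new KubernetesClientBuilder().withConfig(config).build();
  }
}
```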
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My interpretation is that we restart watches all the time. Correct?
src/main/java/se/yolean/kafka/keyvalue/kubernetes/EndpointsWatcher.java
After the latest changes, the integration test validates that we reconnect our watch after being disconnected. Running the test confirms that we actually get onClose events after reconnect fails. This might be because we're running a newer version now, in which issues like fabric8io/kubernetes-client#5189 are fixed. If the integration test had failed to reproduce the issue, we'd be back to square one...
and select a random value in between to use
the informer handles reconnects internally, which makes it unnecessary for us to write our own logic for that. Also, the watch reconnect behaviour we wrote ourselves (now replaced) was not perfect and regularly re-watched without need
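A sketch of the informer-based approach (class and method names are illustrative; fabric8's inform() API re-establishes the underlying watch internally):

```java
import io.fabric8.kubernetes.api.model.Endpoints;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;

class EndpointsInformerSketch {
  static SharedIndexInformer<Endpoints> watchEndpoints(KubernetesClient client, String namespace) {
    long resyncPeriodMillis = 60_000L; // illustrative resync period
    return client.endpoints().inNamespace(namespace).inform(new ResourceEventHandler<Endpoints>() {
      @Override
      public void onAdd(Endpoints endpoints) { /* initial list and additions */ }

      @Override
      public void onUpdate(Endpoints oldEndpoints, Endpoints newEndpoints) { /* target changes */ }

      @Override
      public void onDelete(Endpoints endpoints, boolean deletedFinalStateUnknown) { /* cleanup */ }
    }, resyncPeriodMillis);
  }
}
```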
we modify two collections in sequence and the method is invoked from arbitrary thread-pool threads
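The kind of race being guarded against, as a sketch (field and method names are hypothetical, not the repo's actual code):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WatcherStateSketch {
  private final Map<String, String> targets = new HashMap<>(); // hypothetical
  private final Set<String> unreadyTargets = new HashSet<>();  // hypothetical

  // Without synchronized, two threads from the client's pool could interleave
  // between the two mutations and leave the collections inconsistent.
  synchronized void handleEndpointsUpdate(String pod, String ip, boolean ready) {
    targets.put(pod, ip);
    if (ready) {
      unreadyTargets.remove(pod);
    } else {
      unreadyTargets.add(pod);
    }
  }
}
```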
src/main/resources/application.yaml
@@ -1,7 +1,7 @@
 kkv:
   namespace: ${NAMESPACE}
By convention we use POD_NAMESPACE when depending on the downward API for envs. Alternatively I think we should use WATCH_NAMESPACE, or something like TARGET_NAMESPACE, if we think that watching a different namespace is a reasonable use case.
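For reference, the downward-API convention mentioned here looks like this in a pod spec (standard Kubernetes, not a snippet from this repo):

```yaml
env:
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
```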
I'll move namespace and resync-period to kkv.target.service (next to service name and port), since they are directly related to this. I think that TARGET_NAMESPACE, or the even more descriptive TARGET_SERVICE_NAMESPACE, is a good idea. Even though we always run kkv in the same namespace as the service, I really don't think there are any technical restrictions to allowing other namespaces as well.
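The proposed layout might look something like this (a sketch from the comment above; the key names for service name and port are hypothetical):

```yaml
kkv:
  target:
    service:
      # hypothetical layout: namespace and resync-period moved next to
      # the service name and port they relate to
      name: ${TARGET_SERVICE_NAME}
      port: ${TARGET_SERVICE_PORT}
      namespace: ${TARGET_SERVICE_NAMESPACE}
      resync-period: 60s
```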
We mistakenly did not implement any behavior for when our watch for Kubernetes endpoints closed gracefully, since it has a default implementation that only logs a message at the debug level.
Log at DEBUG level for the Kubernetes client.
Handle graceful watcher close the same way we handle exceptional close (log and exit the application with code 11).