This repository has been archived by the owner on Mar 13, 2022. It is now read-only.

Keep the watch action working forever #36

Merged
merged 2 commits into from
Jan 10, 2018

Conversation

lichen2013
Contributor

@codecov-io

codecov-io commented Oct 17, 2017

Codecov Report

Merging #36 into master will increase coverage by 0.23%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #36      +/-   ##
==========================================
+ Coverage   93.36%   93.59%   +0.23%     
==========================================
  Files          11       11              
  Lines         829      859      +30     
==========================================
+ Hits          774      804      +30     
  Misses         55       55
Impacted Files Coverage Δ
watch/watch_test.py 97.95% <100%> (+0.59%) ⬆️
watch/watch.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9213876...aec1c52. Read the comment docs.

@lichen2013 lichen2013 force-pushed the keep_watch branch 2 times, most recently from 48223bf to 4f815d4 Compare October 17, 2017 08:49
watch/watch.py Outdated
resp.close()
resp.release_conn()

if not keep or self._stop:
Contributor

This will result in all cached events being delivered again. If we want to reconnect, we should capture the last resourceVersion and pass it to the watch call to make sure we continue receiving new events rather than old cached events. It would also be nice if we could fix this at the root (find the reason for the timeout in the HTTP client and fix that). I think the server keeps the connection alive and it is the client that has a pre-defined timeout somewhere.

Contributor Author

Thanks!
Let me run a few more checks.

@karmab karmab Oct 18, 2017

For what it's worth, I indeed solved this same kind of issue using @mbohlool's recommendation and this kind of snippet:

DOMAIN = "kool.karmalabs.local"
....
resource_version = None
while True:
    if resource_version is None:
        stream = watch.Watch().stream(crds.list_cluster_custom_object, DOMAIN, "v1", "guitars")
    else:
        stream = watch.Watch().stream(crds.list_cluster_custom_object, DOMAIN, "v1", "guitars", resource_version=resource_version)
    for event in stream:
        obj = event["object"]
        metadata = obj.get("metadata")
        resource_version = metadata['resourceVersion']
        ....

Contributor Author

Some updates for the timeout setting:

There are two inputs through which the client can set timeouts:

a) timeout_seconds: This value is not used by the client itself but is included in the URL:

https://localhost:6443/api/v1/namespaces/default/pods?labelSelector=app%3Ddemo&timeoutSeconds=100&watch=True.

So, this setting is actually honored by the server.
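As a sanity check, the query string above can be reproduced with the standard library (a sketch only; the real URL is assembled by the generated client):

```python
from urllib.parse import urlencode

# Sketch: rebuild the watch URL's query string to show where
# timeoutSeconds lands. The real URL is built by the generated client.
params = {"labelSelector": "app=demo", "timeoutSeconds": 100, "watch": True}
url = ("https://localhost:6443/api/v1/namespaces/default/pods?"
       + urlencode(params))
print(url)
```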

b) _request_timeout

This accepts two kinds of input, an int or a tuple of length 2, checked in

kubernetes.client.rest.RESTClientObject.request()

The value is eventually set on the socket used for the connection.
b.1) if the input is a tuple, the first value is actually ignored.
b.2) when a timeout happens, an exception is raised, for example:

urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='localhost', port=6443): Read timed out.

b.3) when the value is not set, the default is None and the socket has no timeout.

https://docs.python.org/2/library/socket.html#socket.getdefaulttimeout
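The normalization described in b.1)–b.3) can be sketched with urllib3's Timeout type (this is an illustrative rewrite, not the actual code in kubernetes.client.rest):

```python
from urllib3 import Timeout

def normalize_request_timeout(_request_timeout):
    """Rough sketch of the behavior described above, NOT the actual
    kubernetes.client.rest code: an int/float becomes a total timeout,
    a 2-tuple becomes (connect, read), and None means no timeout."""
    if _request_timeout is None:
        return Timeout(total=None)           # b.3: socket has no timeout
    if isinstance(_request_timeout, (int, float)):
        return Timeout(total=_request_timeout)
    if isinstance(_request_timeout, tuple) and len(_request_timeout) == 2:
        # b.1: for a long-lived watch only the read timeout matters;
        # the first (connect) value is effectively ignored.
        return Timeout(connect=_request_timeout[0], read=_request_timeout[1])
    raise ValueError("_request_timeout must be an int/float or a 2-tuple")
```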

I haven't figured out where/how the server sets its timeout, but I believe that when we leave both timeout settings empty, the client keeps the connection alive.

Contributor

Nice update on the timeout. timeout_seconds is what the server expects for watch calls, and if you don't set it, the server won't time out the connection. There is a hidden timeout on the request that fortunately the generated client doesn't know about and cannot set.

Even without setting the timeout, the connection or the network can still fail, so it is a good idea to have this flag if we make it smart enough to continue the watch using resource_version.
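The resume-on-failure idea can be simulated without a cluster. Everything below (fake_stream, the event shapes, the ConnectionError) is made up purely for illustration; in the real client the stream comes from kubernetes.watch.Watch().stream(...):

```python
# Toy model of "continue the watch using resource_version":
# a fake stream drops the connection every two events, and the outer
# loop reconnects from the last resourceVersion it saw.
EVENTS = [{"type": "ADDED",
           "object": {"metadata": {"resourceVersion": str(v)}}}
          for v in range(1, 7)]

def fake_stream(resource_version=None):
    """Yield events newer than resource_version, simulating a dropped
    connection after two events (illustrative only)."""
    start = 0 if resource_version is None else int(resource_version)
    sent = 0
    for ev in EVENTS:
        if int(ev["object"]["metadata"]["resourceVersion"]) <= start:
            continue
        if sent == 2:
            raise ConnectionError("watch connection dropped")
        sent += 1
        yield ev

seen = []
resource_version = None
while len(seen) < len(EVENTS):
    try:
        for event in fake_stream(resource_version):
            resource_version = event["object"]["metadata"]["resourceVersion"]
            seen.append(resource_version)
    except ConnectionError:
        continue  # reconnect, resuming after the last resourceVersion

print(seen)  # each event delivered exactly once despite the drops
```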

Contributor Author

Could you explain a little more about "There is a hidden timeout on the request that fortunately the generated client don't know about and cannot set"? Really curious about this....

Contributor Author
@lichen2013 lichen2013 Oct 26, 2017

Updated: the watch now continues using resource_version.
Also, I made a small change: since we already have "timeout_seconds" to let the user set a timeout, an empty value now means watch forever, so no extra flag is needed.

@lichen2013 lichen2013 force-pushed the keep_watch branch 3 times, most recently from 14847ad to ff02e98 Compare October 26, 2017 08:20
@lichen2013 lichen2013 changed the title Add flag to enable keep the watch action working all the time Keep the watch action working forever Oct 26, 2017
resp.close()
resp.release_conn()

if timeouts or self._stop:
Contributor

What if the user set a timeout but it has not passed yet?

Contributor Author
@lichen2013 lichen2013 Oct 31, 2017

I'm still trying to find proof that the exit we saw at the beginning is caused by the client side.

I did find the flag 'min-request-timeout' in the kube-apiserver docs, and it says the apiserver actually has a timeout setting of its own:

--min-request-timeout=1800: An optional field indicating the minimum number of seconds a handler must keep a request open before timing it out. Currently only honored by the watch request handler, which picks a randomized value above this number as the connection timeout, to spread out load.

And if it is not set, the default value is 1800:

https://github.com/kubernetes/apiserver/blob/master/pkg/server/config.go#L248

MinRequestTimeout: 1800,

So, based on what I know now: I don't think a timeout will happen before the configured timeout setting.
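For illustration, the randomized connection timeout that the flag describes can be sketched as follows; the exact 1x–2x range is an assumption based on the flag's description ("a randomized value above this number"), not verified against the handler code:

```python
import random

def watch_connection_timeout(min_request_timeout=1800):
    # The watch handler picks a randomized value above --min-request-timeout
    # to spread out load; the 2x upper bound here is an assumption.
    return min_request_timeout * (1.0 + random.random())
```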

@mbohlool
Contributor

@lichen2013 This looks very good. Only one comment, otherwise good to go.

@mbohlool