
Python: Handling Watch Stream Exception #843

Closed
neliel123 opened this issue May 30, 2019 · 10 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@neliel123

I have this code, implemented as a thread, that acts as a watch component.

import logging
import threading

from kubernetes import client, watch
from kubernetes.client.rest import ApiException

logger = logging.getLogger(__name__)


class MyWatcher(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        self.init_kube_config()  # loads the kube config (defined elsewhere in this class)
        python_api = client.CoreV1Api()
        try:
            w = watch.Watch()
            for event in w.stream(python_api.list_namespaced_config_map, namespace="my-namespace"):
                self.process_event(event)  # defined elsewhere in this class
        except ApiException as e:
            logger.error("Exception encountered while watching for event stream :: list_namespaced_config_map :: {}".format(e))

But sometimes, for reasons I can't pin down, I get the exception below:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/site-packages/kube/my_watcher.py", line 21, in run
    for event in w.stream(python_api.list_namespaced_config_map, namespace="my-namespace"):
  File "/usr/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 128, in stream
    resp = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 11854, in list_namespaced_config_map
    (data) = self.list_namespaced_config_map_with_http_info(namespace, **kwargs)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 11957, in list_namespaced_config_map_with_http_info
    collection_formats=collection_formats)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
    _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 342, in request
    headers=headers)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
    query_params=query_params)
  File "/usr/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Mon, 27 May 2019 07:48:35 GMT', 'Content-Length': '186'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"resourceVersion: Invalid value: \\"None\\": strconv.ParseUint: parsing \\"None\\": invalid syntax","code":500}\n'

Currently I am catching the ApiException and letting it pass. I am wondering what the proper way to handle this is. Should I stop the watch object and then create a new one?

I don't know exactly how to reproduce the error, as it seems to be random in nature, so I am also wondering what I should do in my except block.

@tomplus
Member

tomplus commented Jun 6, 2019

Hi @neliel123,
As you said, the reason is unknown, so I suggest logging the exception with a stack trace and then starting the watch again. If you don't recreate the Watch object, the stream will resume from the last processed event.
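A minimal sketch of that restart loop, assuming the same list_namespaced_config_map watch as in the original code and that the kube config has already been loaded; the function name, the process_event callback, and the one-second back-off are illustrative, not from the client library:

import logging
import time

from kubernetes import client, watch
from kubernetes.client.rest import ApiException

logger = logging.getLogger(__name__)


def watch_config_maps(process_event, namespace="my-namespace"):
    api = client.CoreV1Api()
    # Reuse the same Watch object so that, after a failure, the stream
    # resumes from the last resource_version it has already processed.
    w = watch.Watch()
    while True:
        try:
            for event in w.stream(api.list_namespaced_config_map, namespace=namespace):
                process_event(event)
        except ApiException:
            # Log the exception with its stack trace, then loop and start watching again.
            logger.exception("Watch on config maps failed; restarting")
            time.sleep(1)  # small back-off before retrying (an assumption, not part of the suggestion)

The key point, per the comment above, is that w is created once and reused, so each retry continues from the resource_version the Watch has already recorded.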

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2019
@salilgupta1

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 20, 2019
@salilgupta1

salilgupta1 commented Sep 20, 2019

I'm currently trying to figure out what we should be doing with the resource_version when we have to restart the watch stream. The Python client will handle certain exceptions, such as the connection timing out, and will use the last resource_version it has to continue the stream. Should we be emulating that paradigm when the client doesn't handle an exception? For example, if we receive a 410 error code from K8s, should we restart the stream using the last resource_version we had? I would think that K8s would just continue to throw a 410 error because the resource_version is old.

Using K8s: 1.10.3
Client: v10.0.1

Here is a rough outline of our code, right now:

while True:
    try:
        stream = watch.Watch().stream(client.list_pod_for_all_namespaces)  # client: presumably a CoreV1Api instance
        for pod in stream:
            # do some parsing
            pass
    except K8sApiException as e:  # presumably an alias for kubernetes.client.rest.ApiException
        # swallow error
        pass
    except ValueError as e:
        # We occasionally run into this problem: https://github.com/kubernetes-client/python/issues/895
        # swallow error if the above issue comes up
        pass

When we swallow errors, the stream is started fresh and the resource_version is None. Should we really be passing the last resource_version we saw back into the stream() method?
We aren't worried about out-of-order events as much as about missing events altogether.
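For reference, passing the last seen resource_version back in would look roughly like the sketch below: stream() passes extra keyword arguments through to the list call, so the resource_version keyword reaches the API server. The last_seen_version bookkeeping is illustrative, and whether this is the right approach is exactly the open question (see cben's reply below).

from kubernetes import client, watch
from kubernetes.client.rest import ApiException

v1 = client.CoreV1Api()   # stands in for the `client` object in the outline above
last_seen_version = None  # hypothetical bookkeeping, not part of the client library

while True:
    w = watch.Watch()
    kwargs = {}
    if last_seen_version is not None:
        # stream() forwards extra keyword arguments to the list call,
        # so this asks the API server to start the watch at that version.
        kwargs["resource_version"] = last_seen_version
    try:
        for event in w.stream(v1.list_pod_for_all_namespaces, **kwargs):
            last_seen_version = event["object"].metadata.resource_version
            # ... do some parsing ...
    except (ApiException, ValueError):
        continue  # swallow and restart, as in the outline above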

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 19, 2019

@cben

cben commented Jan 6, 2020

@salilgupta1 Watch already tracks last seen resource_version:
https://github.com/kubernetes-client/python-base/blob/a2d1024524/watch/watch.py#L99-L102
and restarts from that version after "regular" disconnection.

  • This might not work if you didn't provide resource_version in the first call, due to some nasty random ordering (see #819, "Old events from the past yielded due to remembered resource_version"). My impression (not personally tested) is that the "List+Watch" pattern avoids that.
  • You're right that on 410 Gone you should not retry with the same resource_version, but rather repeat List+Watch, starting at the version returned by List.

So I think you don't need to pass back resource_version from stream() to stream(), but you do need an outer loop doing List+Watch.
(See #1016 and #868 for proposed abstractions to help with that, but for now you need to code that.)
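A hand-rolled List+Watch loop along those lines might look roughly like this; the function name, the handle_event callback, and the pod example are illustrative, only the CoreV1Api and Watch calls come from the client library, and the kube config is assumed to be loaded already:

from kubernetes import client, watch
from kubernetes.client.rest import ApiException


def list_and_watch_pods(handle_event):
    v1 = client.CoreV1Api()
    while True:
        # List first: the current objects could be processed here as well;
        # the important part is remembering the list's resourceVersion.
        pod_list = v1.list_pod_for_all_namespaces()
        resource_version = pod_list.metadata.resource_version

        w = watch.Watch()
        try:
            # Watch starting at the version returned by the List call.
            for event in w.stream(v1.list_pod_for_all_namespaces,
                                  resource_version=resource_version):
                handle_event(event)
        except ApiException as e:
            if e.status == 410:
                # 410 Gone: this resource_version is too old; fall through and
                # repeat the List+Watch cycle with a fresh List.
                continue
            raise

The 410 Gone branch re-enters the loop so the next iteration re-lists and picks up a fresh resourceVersion, which is what the comment above recommends instead of retrying the stale one.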

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 5, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
