This repository has been archived by the owner on Mar 13, 2022. It is now read-only.

Handle error events in watch #102

Closed
wants to merge 1 commit

Conversation

fabxc

@fabxc fabxc commented Nov 15, 2018

Raise an ApiException for error events that indicate a watch failure
despite the HTTP response indicating success.

Fixes #57

The ApiException constructor indicates that it is intended to be raised manually as well. It therefore seemed convenient to use, and it is the option least likely to break callers, who most likely already have handling for that exception in place.
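The approach can be sketched roughly as follows. This is a minimal, self-contained sketch, not the actual diff: `ApiException` here is a stand-in for `kubernetes.client.rest.ApiException`, and `check_error_event` is a hypothetical name.

```python
import json

class ApiException(Exception):
    # Stand-in for kubernetes.client.rest.ApiException, which can also be
    # constructed manually with a status and a reason.
    def __init__(self, status=None, reason=None):
        self.status = status
        self.reason = reason
        super().__init__(reason)

def check_error_event(line):
    """Raise ApiException for ERROR watch events.

    The API server reports watch failures (e.g. an expired resourceVersion)
    as an ERROR event carrying a Status payload, while the HTTP response
    itself still reports 200 OK.
    """
    js = json.loads(line)
    if js.get("type") == "ERROR":
        obj = js.get("object", {})
        raise ApiException(
            status=obj.get("code"),
            reason="%s: %s" % (obj.get("reason"), obj.get("message")),
        )
    return js

# A typical error event sent when the requested resourceVersion has expired:
line = ('{"type": "ERROR", "object": {"kind": "Status", "code": 410,'
        ' "reason": "Expired", "message": "too old resource version"}}')
```

Callers that already catch ApiException around the watch loop would then see this failure the same way they see any other API error.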


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 15, 2018
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 15, 2018
@fabxc
Author

fabxc commented Nov 15, 2018

/assign roycaihw

Review thread on watch/watch_test.py (outdated, resolved)
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fabxc
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: roycaihw

If they are not already assigned, you can assign the PR to them by writing /assign @roycaihw in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Raise an ApiException for error events that indicate a watch failure
despite the HTTP response indicating success.

Fixes kubernetes-client#57

Signed-off-by: Fabian Reinartz <freinartz@google.com>
@codecov-io

Codecov Report

Merging #102 into master will increase coverage by 0.07%.
The diff coverage is 96%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #102      +/-   ##
==========================================
+ Coverage   92.04%   92.12%   +0.07%     
==========================================
  Files          13       13              
  Lines        1182     1206      +24     
==========================================
+ Hits         1088     1111      +23     
- Misses         94       95       +1
Impacted Files Coverage Δ
watch/watch.py 100% <100%> (ø) ⬆️
watch/watch_test.py 97.61% <94.73%> (-0.52%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 879ab01...bd98fd4. Read the comment docs.

@mitar
Contributor

mitar commented Feb 11, 2019

Should we really throw an exception here and not just retry the watch?

@mitar
Contributor

mitar commented Feb 11, 2019

On closer look, I do not think this fully addresses #57 in any case. For me, unmarshal_event already fails because it cannot convert the response data to an API response of the relevant type. I get:

...
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/watch/watch.py", line 141, in stream
    yield self.unmarshal_event(line, return_type)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/watch/watch.py", line 94, in unmarshal_event
    js['object'] = self._api_client.deserialize(obj, return_type)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 236, in deserialize
    return self.__deserialize(data, response_type)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 276, in __deserialize
    return self.__deserialize_model(data, klass)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 622, in __deserialize_model
    instance = klass(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/models/v1_event.py", line 107, in __init__
    self.involved_object = involved_object
  File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/models/v1_event.py", line 266, in involved_object
    raise ValueError("Invalid value for `involved_object`, must not be `None`")
ValueError: Invalid value for `involved_object`, must not be `None`

So, I think this should be handled sooner, and the watch should just resume automatically if possible.
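One way to resume automatically, sketched with hypothetical names (`stream_with_resume` is not part of the client, and `ApiException` again stands in for `kubernetes.client.rest.ApiException`):

```python
class ApiException(Exception):
    # Stand-in for kubernetes.client.rest.ApiException.
    def __init__(self, status=None):
        self.status = status
        super().__init__(status)

def stream_with_resume(watch_fn, **kwargs):
    """Restart the watch on a 410 Gone instead of raising to the caller,
    resuming from the last resourceVersion we saw."""
    resource_version = kwargs.pop("resource_version", None)
    while True:
        try:
            for event in watch_fn(resource_version=resource_version, **kwargs):
                # Remember where we got to, so a restart does not replay
                # or drop events unnecessarily.
                resource_version = event["object"]["metadata"]["resourceVersion"]
                yield event
            return  # the watch ended cleanly
        except ApiException as e:
            if e.status == 410:
                # Our resourceVersion expired; relist from scratch.
                resource_version = None
                continue
            raise

# A fake watch that fails once with 410, then succeeds, to exercise the loop:
calls = []
def fake_watch(resource_version=None):
    calls.append(resource_version)
    if len(calls) == 1:
        yield {"object": {"metadata": {"resourceVersion": "5"}}}
        raise ApiException(status=410)
    yield {"object": {"metadata": {"resourceVersion": "6"}}}

events = list(stream_with_resume(fake_watch, resource_version="1"))
```

Whether the library should retry on the caller's behalf, or surface the error and let the caller decide, is exactly the design question raised above.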

@mitar
Contributor

mitar commented Feb 11, 2019

I made a slightly improved (in my view) version here: master...mitar:retry-watch

Would you be interested in incorporating those changes here and updating the tests to trigger those other conditions as well? For example, I am using list_event_for_all_namespaces and still get the error above with just your patch; I had to add a check to not attempt to unmarshal the error event. We should have a test confirming that the code fails without that check. I do not understand why this does not happen in your case, but perhaps you used a different resource type that happens to unmarshal successfully even when it is in fact an error.
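The check being described can be sketched like this. `deserialize` stands in for `ApiClient.deserialize`, and the real `unmarshal_event` lives in watch/watch.py; this is only an illustration of the guard, not the actual code.

```python
import json

def unmarshal_event(data, deserialize):
    """Parse a watch line, skipping typed deserialization for ERROR events.

    The payload of an ERROR event is a Status object, not an instance of
    the watched type, so feeding it to the typed deserializer trips
    required-field validation (e.g. the "involved_object must not be None"
    ValueError above for v1.Event).
    """
    js = json.loads(data)
    if js["type"] != "ERROR":
        js["object"] = deserialize(js["object"])
    return js

# Toy deserializer standing in for ApiClient.deserialize: it would blow up
# on a Status payload, just as the real one does for v1.Event.
def to_pod(obj):
    assert obj["kind"] == "Pod", "would raise on a Status payload"
    return obj

added = unmarshal_event('{"type": "ADDED", "object": {"kind": "Pod"}}', to_pod)
error = unmarshal_event('{"type": "ERROR", "object": {"kind": "Status", "code": 410}}', to_pod)
```

With the guard, the ERROR event's raw dict reaches the error-handling path intact instead of dying inside the deserializer.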

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 12, 2019
@mitar mitar mentioned this pull request May 12, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 11, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Development

Successfully merging this pull request may close these issues.

Watch stream should handle HTTP error before unmarshaling event
7 participants