Skip to content
This repository has been archived by the owner on Mar 13, 2022. It is now read-only.

Retry expired watches #133

Merged
merged 1 commit into from
Jul 15, 2020
Merged

Conversation

mitar
Copy link
Contributor

@mitar mitar commented May 12, 2019

This is a followup to #102 and fixes #57.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 12, 2019
@mitar
Copy link
Contributor Author

mitar commented May 12, 2019

/assign @mbohlool @micw523 @roycaihw

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 12, 2019
@codecov-io
Copy link

codecov-io commented May 12, 2019

Codecov Report

Merging #133 into master will decrease coverage by 0.23%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
- Coverage   92.86%   92.62%   -0.24%     
==========================================
  Files          13       13              
  Lines        1345     1356      +11     
==========================================
+ Hits         1249     1256       +7     
- Misses         96      100       +4
Impacted Files Coverage Δ
watch/watch_test.py 97.09% <83.33%> (-1.61%) ⬇️
watch/watch.py 97.84% <83.33%> (-2.16%) ⬇️
config/kube_config_test.py 94.82% <0%> (-0.12%) ⬇️
config/kube_config.py 86.37% <0%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 474e9fb...5cc28ea. Read the comment docs.

@ccamacho
Copy link

It would be nice to have those 3 commits squashed into 1

@mitar
Copy link
Contributor Author

mitar commented Jun 29, 2019

Doesn't GitHub support that when merging the PR?

@micw523
Copy link
Contributor

micw523 commented Jun 29, 2019 via email

@mitar
Copy link
Contributor Author

mitar commented Jun 29, 2019

Done.

@roycaihw
Copy link
Member

/cc

watch/watch.py Outdated
yield self.unmarshal_event(line, return_type)
event = self.unmarshal_event(line, return_type)
if isinstance(event, dict) and event['type'] == 'ERROR':
obj = event['raw_object']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume the APIs won't change for accessing event['raw_object']['code'], @roycaihw ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract is that apiserver will always send a watch event of ERROR type with object of metav1.Status type.

We can check if the dict has the key code and handle non-existence like what client-go does

Copy link
Member

@roycaihw roycaihw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good in general. One major question about the retry behavior

watch/watch.py Outdated
yield self.unmarshal_event(line, return_type)
event = self.unmarshal_event(line, return_type)
if isinstance(event, dict) and event['type'] == 'ERROR':
obj = event['raw_object']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contract is that apiserver will always send a watch event of ERROR type with object of metav1.Status type.

We can check if the dict has the key code and handle non-existence like what client-go does

watch/watch.py Outdated
# but only if we have not already retried.
if not retry_after_410 and obj['code'] == 410:
retry_after_410 = True
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not retry the watch request with the same RV when it's already gone in apiserver

we can retry on 500 and 504 according to the client-go implementation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yliaog pointed out that we need to be careful about the code structure here. We want to be consistent to kubernetes client-go in terms of what a watch is capable of doing

  • In client-go, a watch function call doesn't retry on err. A client can implement their own handling for watch errors
  • client-go provides tooling like informer and retrywatcher for clients to leverage, which have retry logics for watch errors

We should be consistent and provide the same choices to our user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that would be backwards incompatible with current offering by Python package? It already supports retrying. So I think for Python it seems we already made decision that watch is retrying.

@@ -91,7 +91,7 @@ def unmarshal_event(self, data, return_type):
except ValueError:
return data
js['raw_object'] = js['object']
if return_type:
if return_type and js['type'] != 'ERROR':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where we fail to deserialize the response. Good catch

@dmateusp
Copy link

is this good to go ?

@jinchihe
Copy link

Can someone review and approve the PR? we meet the problem now... Thanks.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 18, 2020
@mitar
Copy link
Contributor Author

mitar commented Mar 18, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2020
@mel1nn
Copy link

mel1nn commented May 12, 2020

Could you fix the merge conflict please ?

@mel1nn
Copy link

mel1nn commented May 27, 2020

@mitar ?

@mitar
Copy link
Contributor Author

mitar commented May 27, 2020

Sorry. I will look into this now.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 27, 2020
@codecov-commenter
Copy link

Codecov Report

Merging #133 into master will decrease coverage by 0.19%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
- Coverage   92.69%   92.50%   -0.20%     
==========================================
  Files          13       13              
  Lines        1519     1547      +28     
==========================================
+ Hits         1408     1431      +23     
- Misses        111      116       +5     
Impacted Files Coverage Δ
watch/watch.py 97.84% <83.33%> (-2.16%) ⬇️
watch/watch_test.py 97.09% <83.33%> (-1.61%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 49ec060...1eca201. Read the comment docs.

@mitar
Copy link
Contributor Author

mitar commented May 27, 2020

Done.

@mitar
Copy link
Contributor Author

mitar commented Jun 23, 2020

I have updated this. Please review it before it gets rotten again.

@mel1nn
Copy link

mel1nn commented Jun 29, 2020

@mbohlool ?

@mel1nn
Copy link

mel1nn commented Jul 15, 2020

@roycaihw ?

watch/watch.py Outdated Show resolved Hide resolved
watch/watch.py Outdated Show resolved Hide resolved
@yliaog
Copy link
Contributor

yliaog commented Jul 15, 2020

thanks for the PR.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 15, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mitar, yliaog

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 15, 2020
@k8s-ci-robot k8s-ci-robot merged commit ded3d12 into kubernetes-client:master Jul 15, 2020
palnabarun added a commit to palnabarun/python-base that referenced this pull request Jul 16, 2020
PR kubernetes-client#133 introduces the usage of `http` module for checking the status
code for `GONE` HTTP status. However, this doesn't work in Python 2.7.

This commit checks if the interpreter is Python 2 and imports the
status code from `httplib` module instead and unifies the approach
to the checks.

Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Watch stream should handle HTTP error before unmarshaling event