Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a resync option to manager options #88

Merged
merged 1 commit into from
Jul 26, 2018

Conversation

grantr
Copy link
Contributor

@grantr grantr commented Jul 25, 2018

Allows setting the resync interval to something besides the default of 10 hours for all informers managed by the cache.

I'm not sure how to test this, because the resync interval seems to be irretrievable from the cache or its informers.

Fixes #87.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 25, 2018
@DirectXMan12
Copy link
Contributor

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 26, 2018
@DirectXMan12
Copy link
Contributor

I think this looks reasonable, but we should probably have some comment on when it's appropriate to override this setting.

@grantr
Copy link
Contributor Author

grantr commented Jul 26, 2018

Sure, though I'm not sure what that comment should be. It should also mention why 10 hours is an appropriate default. It's hard to come by this knowledge since informers are not very well documented. 😄

Examples of resync periods in Kubernetes:

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 26, 2018
@grantr
Copy link
Contributor Author

grantr commented Jul 26, 2018

I changed the name of the field to SyncPeriod since that's the name chosen by most Kubernetes controllers, and added some thoughts on how to choose a value.

@grantr
Copy link
Contributor Author

grantr commented Jul 26, 2018

Unrelated to this PR, but this line references log without importing it: https://github.com/grantr/controller-runtime/blob/38e11f82cb145bd2dc9292de809833041524bf4d/pkg/manager/manager.go#L121

That caused issues for me locally.

@droot
Copy link
Contributor

droot commented Jul 26, 2018

@grantr Can you pl. squash the commits.

Unrelated to this PR, but this line references log without importing it: https://github.com/grantr/controller-runtime/blob/38e11f82cb145bd2dc9292de809833041524bf4d/pkg/manager/manager.go#L121
That caused issues for me locally.

Thats strange, how is that compiling.

@droot
Copy link
Contributor

droot commented Jul 26, 2018

Unrelated to this PR, but this line references log without importing it: https://github.com/grantr/controller-runtime/blob/38e11f82cb145bd2dc9292de809833041524bf4d/pkg/manager/manager.go#L121
That caused issues for me locally.

Whats the issue you are seeing ? log is declared in internal.go at the pkg level in manager pkg.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 26, 2018
@grantr
Copy link
Contributor Author

grantr commented Jul 26, 2018

Interesting. My editor (Atom) runs goimports on files when they save (via the go-plus package), and it's adding a reference to a log package when that happens (it just so happens to be istio.io/fortio/log, but that's probably just the first one I have locally). It doesn't happen on the command line when I run goimports, so maybe the go-plus package is confused. I haven't had this problem in any other go projects.

Allows setting the resync interval to something besides the default of
10 hours for all informers managed by the cache.
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 26, 2018
@grantr
Copy link
Contributor Author

grantr commented Jul 26, 2018

Rebased @droot!

@droot droot added approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 26, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: grantr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 3aaf8cd into kubernetes-sigs:master Jul 26, 2018
@grantr grantr deleted the resync-option branch July 26, 2018 22:26
@DirectXMan12
Copy link
Contributor

I think we might want to reword that description -- from my understanding, I think it's a bit misleading. You really only need a resync interval at all if one of the stimuli you react to isn't watchable -- it needs to be polled (e.g. the HPA, which needs to poll metrics for changes). Historically, from what I understand, it was also used to prevent accidental "misses" of events or reconciles, but that's not true any more, from what I've been told.

@grantr
Copy link
Contributor Author

grantr commented Jul 27, 2018

Historically, from what I understand, it was also used to prevent accidental "misses" of events or reconciles, but that's not true any more, from what I've been told.

How was this fixed?

@DirectXMan12
Copy link
Contributor

How was this fixed?

For the API server events, I think the code was made sufficiently robust that watch event misses were no longer an issue.

I wanted to make sure my recollection about resyncing from cache was correct, so I traced things through.

If you look at what resync actually does these days, we see that it determines defaultEventHandlerResyncPeriod and resyncCheckPeriod in the shared informer implementation (https://github.com/kubernetes-sigs/controller-runtime/blob/master/vendor/k8s.io/client-go/tools/cache/shared_informer.go#L141). If we trace that through, it becomes FullResyncPeriod in Config, which gets passed to NewReflector, which it in turn uses to create a timer which triggers r.store.Resync every interval, which calls DeltaFIFO#Resync, which enumerates through the known keys in the cache and adds an event.

To summarize, basically, it runs through the cache every resync interval, adding an event to trigger a reconcile, so it's completely independent of apiserver communication.

Thus it, it seems like it only becomes relevant if either

a) you've miswritten your reconcile loop so that it can fail to reconcile the correct state and not cause a requeue
b) you've got something that you need to polling (like the HPA, but in the case of controller-runtime, you can use generic events for that as well)
c) (maybe -- need to make sure I'm not missing anything here) if you've got something in super-long backoff from errors, and you want to make sure at some point it gets tried again (but in this case, you could just cap the backoff instead)

@grantr
Copy link
Contributor Author

grantr commented Jul 27, 2018

Ok, that clarifies things, thanks!

Just to make sure I understand, here's what I interpret as the reasons to want a resync interval:

  • Risk of broken code (mine or kubernetes')
  • Polling unwatched state changes (but with controller-runtime, use generic events instead: how would I do this? Is there a looping poller construct in controller-runtime?)
  • unusual error backoff (but change workqueue backoff instead: how would I do this with controller-runtime?)

This seems complex enough that it seems like it should be an article in the kubebuilder book, and the comment in code should link to that article. WDYT @DirectXMan12?

@DirectXMan12
Copy link
Contributor

Risk of broken code (mine or kubernetes')

It should never be kube code (famous last words)

Polling unwatched state changes (but with controller-runtime, use generic events instead: how would I do this? Is there a looping poller construct in controller-runtime?)

Use the channel event source + client(cache).List. It's not necessarily more efficient or anything, but is more flexible (since you can filter what you relist, if you want, or do more complicated logic)

but change workqueue backoff instead: how would I do this with controller-runtime

I don't know if we actual configure the max backoff anywhere. I think right now it's abnormally large. @pwittrock and I were talking about this the other day. I also haven't fully proven this case to myself -- I'd need to test or follow the code through a bit more

This seems complex enough that it seems like it should be an article in the kubebuilder book, and the comment in code should link to that article. WDYT @DirectXMan12?

We should probably talk about it, yes. In general, I think users probably shouldn't bother changing the default unless they really know what they're doing.

@grantr
Copy link
Contributor Author

grantr commented Jul 27, 2018

I think users probably shouldn't bother changing the default unless they really know what they're doing.

Maybe that should be the comment 😁

@DirectXMan12
Copy link
Contributor

heh, that's fair. I'd want to leave a little bit of description, but sticking that at the end is probably fine :-)

justinsb pushed a commit to justinsb/controller-runtime that referenced this pull request Dec 7, 2018
Add a resync option to manager options
dlipovetsky added a commit to dlipovetsky/controller-runtime that referenced this pull request Oct 29, 2020
The comment tries to capture more of the thoughts exchanged in
kubernetes-sigs#88.
dlipovetsky added a commit to dlipovetsky/controller-runtime that referenced this pull request Oct 29, 2020
The comment tries to capture more of the thoughts exchanged in
kubernetes-sigs#88.
dlipovetsky added a commit to dlipovetsky/controller-runtime that referenced this pull request Oct 29, 2020
The comment tries to capture more of the thoughts exchanged in
kubernetes-sigs#88.
k8s-ci-robot pushed a commit that referenced this pull request Mar 3, 2021
* Improve docs for manager.Options.SyncPeriod

The comment tries to capture more of the thoughts exchanged in
#88.

* Note that  SyncPeriod applies to all controllers.

Co-authored-by: Joe Julian <me@joejulian.name>

* Make explanation of SyncPeriod more precise..

Co-authored-by: Joe Julian <me@joejulian.name>

Co-authored-by: Joe Julian <me@joejulian.name>
RainbowMango pushed a commit to RainbowMango/controller-runtime that referenced this pull request Sep 1, 2021
* Improve docs for manager.Options.SyncPeriod

The comment tries to capture more of the thoughts exchanged in
kubernetes-sigs#88.

* Note that  SyncPeriod applies to all controllers.

Co-authored-by: Joe Julian <me@joejulian.name>

* Make explanation of SyncPeriod more precise..

Co-authored-by: Joe Julian <me@joejulian.name>

Co-authored-by: Joe Julian <me@joejulian.name>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants