[stage] UPSTREAM: 49016: PV controller: resync informers manually #16927

jsafrane · 2017-10-18T10:51:56Z

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1501152

cc @openshift/sig-storage

jsafrane · 2017-10-18T12:43:02Z

/retest
flake: #16929

deads2k · 2017-10-18T12:48:00Z

@jsafrane backport to stage?

deads2k · 2017-10-18T12:50:18Z

The resync for other shared informer consumers would be skipped until their requested resync came about, right? Did that feature break somehow?

jsafrane · 2017-10-18T13:21:39Z

rebased to stage

@deads2k, it's a fix for this bug: kubernetes/kubernetes#49905 (comment). The PV controller may start when the informer resync period is already fixed and can't be changed.

openshift-ci-robot · 2017-10-18T13:30:58Z

@jsafrane: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/openshift-jenkins/extended_builds	`5a6b0b0`	link	`/test extended_builds`
ci/openshift-jenkins/extended_conformance_install_update	`5a6b0b0`	link	`/test extended_conformance_install_update`

deads2k · 2017-10-18T13:34:50Z

@deads2k, it's a fix for this bug: kubernetes/kubernetes#49905 (comment). The PV controller may start when the informer resync period is already fixed and can't be changed.

That sounds like a bug. Who is starting it? We fixed the GC start problem.

jsafrane · 2017-10-18T13:52:58Z

It's started as a usual controller in controller-manager; there's nothing special about it. And someone complained (@liggitt?) that we should not force all controllers to have a 15-second resync period, since that's too often. So I added a manual resync.

jsafrane · 2017-10-18T13:54:21Z

        "Auto-merging vendor/k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go", 
        "CONFLICT (content): Merge conflict in vendor/k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go",

This PR does not touch kuberuntime_manager.go...

/test all

jsafrane · 2017-10-18T13:58:41Z

/test cross
/test extended_builds

deads2k · 2017-10-18T14:13:21Z

It's started as a usual controller in controller-manager; there's nothing special about it. And someone complained (@liggitt?) that we should not force all controllers to have a 15-second resync period, since that's too often. So I added a manual resync.

@liggitt is the skip logic that avoids resyncing everyone broken?

jsafrane · 2017-10-18T14:19:05Z

And someone complained (@liggitt?) that we should not force all controllers to have a 15-second resync period

Found the complaint: kubernetes/kubernetes#48941

deads2k · 2017-10-18T14:30:41Z

And someone complained (@liggitt?) that we should not force all controllers to have a 15-second resync period
Found the complaint: kubernetes/kubernetes#48941

@liggitt was that before we ensured startup order and had the opt-in resync?

liggitt · 2017-10-18T15:08:54Z

@liggitt was that before we ensured startup order

probably

and had the opt-in resync?

Not sure what that is referring to. I still think a 15-second resync, no matter how it is done, is way too short. We have better patterns for retrying failed objects at a shorter interval without resyncing the whole list.

deads2k · 2017-10-18T18:36:26Z

Not sure what that is referring to. I still think a 15-second resync, no matter how it is done, is way too short. We have better patterns for retrying failed objects at a shorter interval without resyncing the whole list.

No other consumers would see a resync; only the controller that asked for it would.

I agree it is too short, but this doesn't look better than actually specifying the resync period, since specifying it doesn't hurt other consumers.

wongma7 · 2017-10-18T21:25:42Z

I would like to point out that it's not just for retrying failed cases; it's a fundamental assumption in the PV controller, which is apparently "space shuttle" code and hard to change? The reason for this bug is basically that there is a "syncUnboundClaim" but no "syncUnboundVolume": if a volume has just been created, there is no reason to assume that a claim exists that is looking for it, but if a claim has just been created, the controller should obviously find a volume for it.

Another case from today: a Bound volume's PVC is deleted. "syncClaim" will get triggered, but it should not be syncClaim's responsibility to update the Bound volume to Released. That is the job of "syncVolume", which can only be triggered by the periodic resync...
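
A toy model of the asymmetry described above. It assumes heavily simplified types: `volume`, `world`, and their fields are hypothetical, and only the `syncClaim`/`syncVolume` names come from the comment. The claim-deletion event reaches only `syncClaim`, so the volume stays `Bound` until something, such as a periodic or manual resync, calls `syncVolume`.

```go
package main

import "fmt"

// volume is a hypothetical, heavily simplified persistent volume.
type volume struct {
	phase    string // "Available", "Bound", or "Released"
	claimRef string // name of the bound claim, "" if none
}

// world is a hypothetical snapshot of volumes and existing claims.
type world struct {
	volumes map[string]*volume
	claims  map[string]bool // claim name -> exists
}

// syncClaim runs on claim events. A deleted claim has nothing left to sync,
// and updating volumes is not its job, so it returns without touching them.
func (w *world) syncClaim(name string) {
	if !w.claims[name] {
		return
	}
	// Binding logic for existing claims would go here.
}

// syncVolume is the only place where a Bound volume whose claim is gone
// becomes Released.
func (w *world) syncVolume(name string) {
	v, ok := w.volumes[name]
	if !ok {
		return
	}
	if v.phase == "Bound" && !w.claims[v.claimRef] {
		v.phase = "Released"
	}
}

func main() {
	w := &world{
		volumes: map[string]*volume{"pv1": {phase: "Bound", claimRef: "claim1"}},
		claims:  map[string]bool{"claim1": true},
	}
	delete(w.claims, "claim1")          // the claim is deleted...
	w.syncClaim("claim1")               // ...and only the claim event fires
	fmt.Println(w.volumes["pv1"].phase) // Bound
	w.syncVolume("pv1")                 // e.g. triggered by a resync
	fmt.Println(w.volumes["pv1"].phase) // Released
}
```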

liggitt · 2017-10-18T22:08:37Z

it's a fundamental assumption in the PV controller, which is apparently "space shuttle" code and hard to change?

requiring resync of all objects to stay responsive on subsets of objects that need reprocessing is an assumption that does not scale and should be redesigned.

jsafrane · 2017-10-19T09:33:01Z

requiring resync of all objects to stay responsive on subsets of objects that need reprocessing is an assumption that does not scale and should be redesigned.

While I agree with this, nobody has complained so far that the PV controller is too slow. Compare that with the A/D (attach/detach) controller, which syncs all pods with attachable volumes every 100 ms.

A redesign is on our long-term TODO list; I added a card to our board: https://trello.com/c/ARmicYxn/577-speed-up-pv-controller

deads2k · 2017-10-19T14:14:53Z

At any rate, there are now better ways to solve this and you should use them instead of this. However, this is the current state upstream and is better than the bug at the moment.

/lgtm

openshift-merge-robot · 2017-10-19T14:14:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jsafrane

Needs approval from an approver in each of these OWNERS Files:

~~vendor/k8s.io/kubernetes/pkg/controller/volume/persistentvolume/OWNERS~~ [deads2k,jsafrane]

deads2k · 2017-10-19T14:15:36Z

@jsafrane, could you also open the pull against master so it doesn't regress?

deads2k · 2017-10-19T18:42:34Z

@jsafrane holding this until the openshift/origin:master pull is open and labeled.

deads2k · 2017-10-19T19:05:39Z

@jsafrane I'm sorry about this, but @eparis found a special-case in online that lets us avoid this problem. repoint to master? still lgtm

jsafrane · 2017-10-20T07:24:28Z

opened PR against master in #16965

@deads2k

Automatic merge from submit-queue (batch tested with PRs 16667, 16796, 16960, 16965, 16894).

[master] UPSTREAM: 49016: PV controller: resync informers manually

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1501152

This is the same as #16927, just for master instead of stage.

/assign @deads2k

jsafrane · 2017-10-23T11:12:41Z

master counterpart is merged, can we merge this one?

eparis · 2017-10-23T14:33:33Z

going to close this one, we'll get it in stage tomorrow night on the next stage rebase.

openshift-ci-robot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Oct 18, 2017

openshift-merge-robot assigned deads2k Oct 18, 2017

openshift-merge-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) and vendor-update (Touching vendor dir or related files) labels Oct 18, 2017

UPSTREAM: 49016: PV controller: resync informers manually

5a6b0b0

jsafrane force-pushed the 3.7-fix-pvcontroller-sync branch from 31e6220 to 5a6b0b0 October 18, 2017 13:20

openshift-merge-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Oct 18, 2017

jsafrane changed the base branch from master to stage October 18, 2017 13:20

openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Oct 19, 2017

deads2k added the do-not-merge label Oct 19, 2017

jsafrane changed the title from "UPSTREAM: 49016: PV controller: resync informers manually" to "[stage] UPSTREAM: 49016: PV controller: resync informers manually" Oct 20, 2017

jsafrane mentioned this pull request Oct 20, 2017

[master] UPSTREAM: 49016: PV controller: resync informers manually #16965

Merged

eparis closed this Oct 23, 2017
