Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix nil stop value for source.Channel #154

Closed
wants to merge 2 commits into from

Conversation

mhrivnak
Copy link
Contributor

fixes #103

Creates a stop channel for the manager in New(), which will get passed to any
source.Channel instances that are added. When the manager's start method is
called and a new stop channel is passed in, that channel will be joined in a
goroutine with the manager's existing channel so that if the newer channel gets
closed, so will the manager's.

fixes kubernetes-sigs#103

Creates a stop channel for the manager in New(), which will get passed to any
source.Channel instances that are added. When the manager's start method is
called and a new stop channel is passed in, that channel will be joined in a
goroutine with the manager's existing channel so that if the newer channel gets
closed, so will the manager's.
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mhrivnak
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: pwittrock

If they are not already assigned, you can assign the PR to them by writing /assign @pwittrock in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 19, 2018
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Sep 19, 2018
@mhrivnak
Copy link
Contributor Author

I signed it.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Sep 19, 2018
@JoelSpeed
Copy link
Contributor

I've had a look at this this morning and I think it rather overcomplicates the changes required.

I don't think you need to create the second stopper nor make that many changes to the start procedure at all.

I put together a quick attempt that I think suffices and changes less of the existing code, I don't know if that would be preferred or not but I think the test cases I've added cover the behaviour that we are looking for here?

pusher@99acd5c

@mhrivnak
Copy link
Contributor Author

Thanks for the feedback.

I started out with roughly what you posted. I added some additional changes, which may not strictly be necessary, for these reasons:

  • In the leader election case, the non-exported start() method gets passed a different stop channel (one created by leaderelection) than the one that was passed into Start(). We basically have two sources that might tell us to stop: the leader election, and whatever was passed into Start() (which likely is tied to a signal handler for SIGTERM and similar). Having some code listening to one, and other code listening to the other, seems like an opportunity for unexpected behavior.
  • On that theme, there are several stop channels to keep track of, so my main goal became: make sure one channel is the authoritative source, and have everything listen to that, for the sake of simplicity. A different kind of simplicity perhaps than your example. But if we stick to a design where everything is listening to the same stop channel, and we have one place that controls writes to it, that keeps the flow of information simple and easy to reason about. And it avoids opportunity for confusion about stop channels, which may lead to the behavior described in the previous point.
  • Also for the sake of simplicity, I don't see why start() should receive a stop channel as a parameter. Why shouldn't it always use cm.stop, and let Start() do any setup necessary to enable cm.stop?
  • Preserving cm.stop as a <-chan instead of a plain chan seems valuable, since it gets passed into external code, and we want to be clear that no other code should close it (panic ensues if it happens to get closed twice). Maybe it's sufficient to rely on the fact that we'll only pass it to functions where the signature takes a <-chan parameter, but that becomes a human-enforced convention rather than compiler-enforced policy. But maybe it's not worth having both stopper and stop to solve that problem. It's a judgement call, and I'm certainly happy either way.
  • Then just one more nit, which is optional, is that I didn't see a reason for the start() method to block at the end. So I just removed its receive from stop and was able to factor out the in-lined function. I could put it back the way it was, but it seemed like a natural improvement to remove it vs. s/<-stop/<-cm.stop/

@JoelSpeed let me know what you think of all that. I'm certainly flexible, and my main goal is just to get the bug fixed.

@droot
Copy link
Contributor

droot commented Sep 20, 2018

sorry, coming late to the party. Thanks for the detailed comments above.

I want to explore the possibility of passing in the stop channel as one of the constructor param while creating manager (and if not passed, creating the stop channel in the constructor itself).

This raises the question about the stop channel being passed to Start, don't have a good answer yet...still thinking.. but wanted to throw it out there.

@mhrivnak
Copy link
Contributor Author

I want to explore the possibility of passing in the stop channel as one of the constructor param while creating manager (and if not passed, creating the stop channel in the constructor itself).

I completely agree. That would be the best design, to just pass one stop channel to the manager's constructor and be done. As I understand it, the main reason not to do that would be backward compatibility with Start(), but maybe that can be worked around somehow.

@droot
Copy link
Contributor

droot commented Sep 21, 2018

I completely agree. That would be the best design, to just pass one stop channel to the manager's constructor and be done. As I understand it, the main reason not to do that would be backward compatibility with Start(), but maybe that can be worked around somehow.

Yes, backward compatibility is the main issue. Couple of options come to mind:

  1. Allow specifying stop channel in the manager's constructor. Do not create stop channel if not supplied one. Report an error if Start is called with non-nil stop channel. This is sort of a workaround not the complete fix.
  2. Allow specifying stop channel in the manager's constructor and create one in the constructor if not specified. Deprecate Start(stop chan) method on the manager and introduce a new one without the Stop channel StartX (for lack of better name). Start using StartX in kubebuilder and docs and other places and remove Start(stop chan) in the major revision of the pkg down the line.

Open to other ideas as well.

/cc @DirectXMan12 @pwittrock

@shawn-hurley
Copy link

Allow specifying stop channel in the manager's constructor and create one in the constructor if not specified. Deprecate Start(stop chan) method on the manager and introduce a new one without the Stop channel StartX (for lack of better name). Start using StartX in kubebuilder and docs and other places and remove Start(stop chan) in the major revision of the pkg down the line.

If you did this you would still need this code or something like it to handle the case where the user passes a stop channel into start even if it is deprecated right?

@DirectXMan12
Copy link
Contributor

We'll already need a "major" (0.Y) release this release due to non-backwards-compat changes, so it's less backwards compat, and more consistency, but if we can get a better design, we should do it now before we hit 1.0, while we're already shaking things up a bit.

@DirectXMan12
Copy link
Contributor

That said, passing and lazily initializing the stop channel ahead of time could be a decent design. If we work it right, we could even get rid of the necessity of manually calling the signal handler setup code, which is a bit ugly, IMO.

@droot
Copy link
Contributor

droot commented Sep 26, 2018

We'll already need a "major" (0.Y) release this release due to non-backwards-compat changes, so it's less backwards compat, and more consistency, but if we can get a better design, we should do it now before we hit 1.0, while we're already shaking things up a bit.
That said, passing and lazily initializing the stop channel ahead of time could be a decent design. If we work it right, we could even get rid of the necessity of manually calling the signal handler setup code, which is a bit ugly, IMO.

Agreed with both points. We can do the prep-work for 0.1.x branch next week, but in the meantime, @mhrivnak @shawn-hurley are you guys interested in vetting the proposed changes in PR (with assumption of making a breaking change) to see if it works ?

@mhrivnak
Copy link
Contributor Author

@mhrivnak @shawn-hurley are you guys interested in vetting the proposed changes in PR (with assumption of making a breaking change) to see if it works ?

Happy to help. Just to make sure I'm clear, are you asking if we want to make a new PR that tries out the breaking change approach? Happy to do so.

@droot
Copy link
Contributor

droot commented Sep 26, 2018

Happy to help. Just to make sure I'm clear, are you asking if we want to make a new PR that tries out the breaking change approach? Happy to do so.

Thanks. Yes, thats the intention here.

@mhrivnak
Copy link
Contributor Author

I'll have another PR to look at shortly that makes backward-incompatible changes, but in the mean time I revised this one. Having looked more closely at how leaderelection behaves, it appears that it is safe for us to ignore the stop channel that it passes into OnStartedLeading, which lets us get rid of the connective goroutine.

Copy link
Contributor

@DirectXMan12 DirectXMan12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor naming related nits to make the code a bit easier to follow, otherwise LGTM

@@ -75,6 +75,9 @@ type controllerManager struct {
errChan chan error
stop <-chan struct{}

// stopper is the write side of the stop channel. They should have the same value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe change to (for a bit more clarity -- it took me a moment to parse what "they" was):

stopper is the write side of the stop channel. It and stop should have the same value.

select {
// Only this function should receive from stop, and everything else
// should receive from cm.stop.
case <-stop:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this flow gets a bit confusing to read with cm.stop vs stop. Maybe stop to externalStop or rename cm.stop to cm.internalStop.

if cm.resourceLock == nil {
go cm.start(stop)
go cm.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this doesn't block. Does it really need a goroutine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
go cm.start()
cm.start()

@rohitagarwal003
Copy link

Ping. Can we get this in?

@@ -75,6 +75,9 @@ type controllerManager struct {
errChan chan error
stop <-chan struct{}

// stopper is the write side of the stop channel. They should have the same value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// stopper is the write side of the stop channel. They should have the same value.
// stopper is the write side of the stop channel (used to close `stop`). It and `stop` should have the same value.

if cm.resourceLock == nil {
go cm.start(stop)
go cm.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
go cm.start()
cm.start()

@DirectXMan12
Copy link
Contributor

/assign @DirectXMan12

@DirectXMan12
Copy link
Contributor

I'm going to fix this up and get it merged to unblock people. Thanks for your contribution, @mhrivnak !

@mhrivnak
Copy link
Contributor Author

mhrivnak commented Nov 8, 2018

I'm going to fix this up and get it merged to unblock people. Thanks for your contribution, @mhrivnak !

Ok, thanks! I'm about a week away from having time to come back to it, so I appreciate the help. I'll be on the lookout to review it.

@DirectXMan12
Copy link
Contributor

DirectXMan12 commented Nov 9, 2018

Superseded by #204

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Channel Source is borked due to stop channel injection
7 participants