source.Channel gets shutdown almost immediately and never processes any events #1343

Closed · charith-elastic opened this issue Jan 18, 2021 · 1 comment · Fixed by #1345

Comments

@charith-elastic (Contributor)

After upgrading to controller-runtime 0.8.0, I discovered that our external event source, implemented using source.Channel, no longer generates any events. Looking at the code, one odd thing that stood out to me was this seemingly useless lock:

cs.destLock.Lock()
defer cs.destLock.Unlock()
return nil

It used to protect the append to the dest slice, but the append has since moved further up as part of the fix for #942. After fixing the lock in the code, the event source started working again, so I think there must be a race somewhere. The odd thing is that this unprotected append was present in 0.7.0 as well and didn't affect us, so I am not sure whether my hypothesis is entirely correct.
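
For what it's worth, here is a minimal, self-contained sketch of the kind of fix I tried. This is not the actual controller-runtime code; the type and field names are stand-ins. The point is simply that the append to the destination slice happens while destLock is held, so concurrent Start calls no longer race on it:

package main

import (
    "fmt"
    "sync"
)

// channelSource is a stand-in for source.Channel with just the fields
// relevant to the locking question: a mutex and the destination slice.
type channelSource struct {
    destLock sync.Mutex
    dest     []chan string
}

// addDestination appends while holding the lock, instead of taking the
// lock only after an unprotected append.
func (cs *channelSource) addDestination(dst chan string) {
    cs.destLock.Lock()
    defer cs.destLock.Unlock()
    cs.dest = append(cs.dest, dst)
}

func main() {
    cs := &channelSource{}
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            cs.addDestination(make(chan string, 1))
        }()
    }
    wg.Wait()
    fmt.Println("destinations registered:", len(cs.dest)) // always 10 with the lock in place
}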

Either way, I'll be happy to submit a patch to fix the lock unless there are any objections.

@charith-elastic (Contributor, Author) commented Jan 18, 2021

I was too quick to celebrate in the comment above; it was just a lucky coincidence of timing. I think I have found the root cause now:

for _, watch := range c.startWatches {
    c.Log.Info("Starting EventSource", "source", watch.src)
    watchStartCtx, cancel := context.WithTimeout(ctx, c.CacheSyncTimeout)
    defer cancel()
    if err := watch.src.Start(watchStartCtx, watch.handler, c.Queue, watch.predicates...); err != nil {
        return err
    }
}

The context created on line 166 is cancelled almost immediately: cancel is deferred, so it fires as soon as the enclosing function returns (and in any case once CacheSyncTimeout elapses), long before the controller itself stops. That context is the one the source.Channel implementation uses to control the goroutine that distributes the events:

func (cs *Channel) syncLoop(ctx context.Context) {
    for {
        select {
        case <-ctx.Done():
            // Close destination channels
            cs.doStop()
            return
        case evt := <-cs.Source:
            cs.distribute(evt)
        }
    }
}
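
To make the failure mode concrete, here is a small, self-contained reproduction of the pattern in plain Go (not controller-runtime code): a deferred cancel on a timeout context tears down a long-running goroutine as soon as the starting function returns.

package main

import (
    "context"
    "fmt"
    "time"
)

// syncLoop stands in for Channel.syncLoop: it runs until ctx is cancelled.
func syncLoop(ctx context.Context, events <-chan string) {
    for {
        select {
        case <-ctx.Done():
            fmt.Println("sync loop stopped:", ctx.Err())
            return
        case evt := <-events:
            fmt.Println("distributing event:", evt)
        }
    }
}

// startWatches mirrors the controller loop above: the deferred cancel fires
// when this function returns, killing the goroutine it just started.
func startWatches(events <-chan string) {
    watchStartCtx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
    defer cancel() // fires on return, not when the controller shuts down
    go syncLoop(watchStartCtx, events)
}

func main() {
    events := make(chan string)
    startWatches(events)

    time.Sleep(100 * time.Millisecond)
    select {
    case events <- "never delivered":
    default:
        fmt.Println("nobody is listening any more")
    }
}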

If I understand the issue that prompted this change (#1219) correctly, the timeout context should only be applied to SyncingSource implementations, at the point where their WaitForSync method is called.
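
Concretely, something along these lines; this is a rough sketch rather than a patch, and it assumes the existing source.SyncingSource interface with its WaitForSync(ctx) method. Every source is started with the controller's long-lived context, and only the cache-sync wait is bounded by CacheSyncTimeout:

for _, watch := range c.startWatches {
    c.Log.Info("Starting EventSource", "source", watch.src)

    // Start with the controller's own long-lived context so that
    // channel-backed sources keep their distribution goroutine running.
    if err := watch.src.Start(ctx, watch.handler, c.Queue, watch.predicates...); err != nil {
        return err
    }

    // Only the cache-sync wait is bounded by CacheSyncTimeout, and only
    // for sources that actually implement SyncingSource.
    if syncingSource, ok := watch.src.(source.SyncingSource); ok {
        err := func() error {
            syncCtx, cancel := context.WithTimeout(ctx, c.CacheSyncTimeout)
            defer cancel() // safe here: the timeout only governs WaitForSync
            return syncingSource.WaitForSync(syncCtx)
        }()
        if err != nil {
            return err
        }
    }
}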

@charith-elastic changed the title from "Possible race condition when adding source.Channel destination" to "source.Channel gets shutdown almost immediately and never processes any events" on Jan 18, 2021
njhale added a commit to njhale/operator-lifecycle-manager that referenced this issue on Apr 13, 2021:

Bump controller-runtime dependency to include a fix for the Channel Source which was preventing events from being enqueued.
(see kubernetes-sigs/controller-runtime#1343)

Signed-off-by: Nick Hale <njohnhale@gmail.com>

The same bump was subsequently cherry-picked (commit f7e9584916df34a3b26807f640a4aa89f0d59333, Upstream-repository: operator-lifecycle-manager) into njhale/operator-framework-olm, benluddy/operator-framework-olm, and openshift/operator-framework-olm between April and July 2021.