Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zedcloud/deferred: don't take lock while processing the deferred queue #3159

Merged
merged 1 commit into from
Apr 20, 2023

Conversation

rouming
Copy link
Contributor

@rouming rouming commented Apr 19, 2023

The ctx.lock protects the concurrent access to the ctx.deferredItems, so no need to keep the mutex locked while processing the queue.

Why? Queue processing can take up to several minutes (depends on the send timeout), so that all other goroutines which produce new requests (callers of the SetDeferred() or RemoveDeferred() calls) will stuck for as long as the queue is not completely processed. This mutex contention badly impacts overall system responsiveness.

CC: @milan-zededa

The ctx.lock protects the concurrent access to the ctx.deferredItems,
so no need to keep the mutex locked while processing the queue.

Why? Queue processing can take up to several minutes (depends on the
send timeout), so that all other goroutines which produce new requests
(callers of the SetDeferred() or RemoveDeferred() calls) will stuck
for as long as the queue is not completely processed. This mutex
contention badly impacts overall system responsiveness.

Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but based on the issue you were debugging there is still an issue that the main event look in zedagent can get stuck inside HandleDeferred for 6 minutes during which time it does not process any other events such as subscriptions, hence e.g., AppInstanceStatus will not be reflected in what is sent to LPS during that time.

So in addition to this PR I think we need to remove the calls to HandleDeferred from the event loop and have a timer+trigger channel goroutine do all the work.

@@ -93,23 +93,24 @@ func CreateDeferredCtx(zedcloudCtx *ZedCloudContext, sentHandler *SentHandlerFun
func (ctx *DeferredContext) HandleDeferred(event time.Time,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to update the comment where it says "try to send all" since it only tries to send the deferred items which were present when the function started and needs to be called again to send any items which were added while it was running.

@rouming
Copy link
Contributor Author

rouming commented Apr 19, 2023

LGTM, but based on the issue you were debugging there is still an issue that the main event look in zedagent can get stuck inside HandleDeferred for 6 minutes during which time it does not process any other events such as subscriptions, hence e.g., AppInstanceStatus will not be reflected in what is sent to LPS during that time.

So in addition to this PR I think we need to remove the calls to HandleDeferred from the event loop and have a timer+trigger channel goroutine do all the work.

This already done by @milan-zededa commits, which we tested and backported to the 8.12, but not yet published. Not
part of this work.

@eriknordmark
Copy link
Contributor

This already done by @milan-zededa commits, which we tested and backported to the 8.12, but not yet published. Not
part of this work.

Ack.

@eriknordmark eriknordmark merged commit 9f2c9ad into lf-edge:master Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants