Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent too many ProcessBacklog() from being scheduled #1574

Closed
wants to merge 2 commits into from

Conversation

devbv
Copy link

@devbv devbv commented Sep 23, 2020

This PR prevents the situation that a consumer makes the number of backlog 1->0 infinitely and many producers schedule new ProcessBacklog() after they make the number of backlog 0->1.
The situation could lead a starvation of thread pool and a lot of timeout when we use transaction heavily. (#1572)

@devbv devbv changed the title Prevent too many ProcesBacklog() from being scheduled Prevent too many ProcessBacklog() from being scheduled Oct 12, 2020
@devbv
Copy link
Author

devbv commented Oct 12, 2020

@mgravell Hello, Could you take a look for this MR or issue #1572 ?

@arsnyder16
Copy link

@mgravell I am also seeing these errors in my logs. They are a semi related to the issue solved with #1585 in that in these timeout scenarios seems to cascade and build upon each other, and the library struggles to recover.

I would also be interested in seeing your opinion on getting this change in as well

@devbv
Copy link
Author

devbv commented Oct 17, 2020

image
Additional info for better understanding. This is stacktrace of threads when the timeout is occurred. You can see 9 of 10 worker threads are stopped at MutexSlim.TakeWithTimeout called by PhysicalBridge.ProcessBacklog()
This is reproduced from the sample code on #1572

@NickCraver
Copy link
Collaborator

Merging main in here for build check changes

mgravell added a commit that referenced this pull request Nov 13, 2020
- make the backlog processor async
- check for work *before* trying to get the lock
@mgravell mgravell mentioned this pull request Nov 13, 2020
@mgravell
Copy link
Collaborator

This is a great find, and could explain a lot, but I'm not sure that the approach is the best way of solving it - I'm playing with an alternative strategy in #1612 - but: even if I merge that, I'll still give credit for the research here; much appreciated!

@mgravell
Copy link
Collaborator

Have merged #1612, but credited in release notes:

@mgravell mgravell closed this Nov 13, 2020
mgravell added a commit that referenced this pull request Nov 13, 2020
* alternative implementation along the lines of #1574:

- make the backlog processor async
- check for work *before* trying to get the lock

* release notes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants