-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix for mutex slim being released too early during flush operation #1585
Conversation
Looks like the CI checks are failing for some unrelated issue, seems like the other PRs have the same issue |
Normally, time constraints are limiting how much I can get done on PRs etc,
but this is very interesting. I'm going to carve out some of tomorrow
morning to look at this, because if you're right (and I'm assuming you are,
haven't looked deep at the code yet) it could be really useful! Thanks.
…On Wed, 14 Oct 2020, 19:17 arsnyder16, ***@***.***> wrote:
Looks like the CI checks are failing for some unrelated issue, seems like
the other PRs have the same issue
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#1585 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAEHMBVNRW4IYMZKORSB33SKXTLHANCNFSM4SQ5QC6A>
.
|
Thanks, i would be interested in hearing your thoughts, since i am not overly familiar with the library. Feel free to adjust it in another way, this seemed to make the most sense to me. |
I can't speak to the exact fix, but the described behavior matches exactly the symptoms we've seen provoke the issue. |
Okay, I've walked through the precise fixes. Normally, I'd like to see a continuation bound for on cancel but that's not really the model or style of the lib, I think? |
I agree the timeout can cause the lock to be released (incorrectly) while the task is actually making progress in the background. so I think this change is correct. |
Merging main in here for updated build checks |
Do we have a timeline for accepting the PR? This issue continues to block us from rolling our version forward. |
Sorry, I had planned to already look at it, but things got ... busy. I hope to look tomorrow (although I appreciate that I've said that before) |
No worries, we're in a stable-ish state on an older version, and life is pretty complicated for all of us right now. |
I concur with the problem, but I'm very dubious of allocating a CTS with scheduled timeout every call; I wonder instead if we could allocate and reuse a CTS, and only schedule the timeout (dooming it for reuse) when we get a non-sync response; something like this: e1e7189 (I recommend viewing without whitespace deltas: e1e7189?w=1) thoughts? if you agree that this minor tweak makes sense, feel free to cherry-pick or merge it into your branch, and we can get this merged. A very nuanced spot - congrats for finding this and offering a fix. |
merged with the CTS tweak as proposed |
Fixes #1438
When flushing during FlushSync we are specifying a timeout to the Task, but when this timeout gets hit the underlying task is still running. The thread that invoked FlushSync continues and releases the MutexSlim that is protecting the io pipe allowing another thread to obtain the MutexSlim and try to run operations on the pipe.
To fix use the cancellation token that can be passed to PipeWriter.FlushAsync.
An alternative to this is to not specify a timeout. There are 2 additional locations that flush the pipe neither of those locations specify a timeout they just simply await the Task. Generally the timeouts are driven by how long an operation waits to obtain the MutexSlim