
OnManagedConnectionFailed thread/memory leak fix #1710

Merged · 2 commits · Jun 27, 2021

Conversation

alexSatov (Contributor)

Hi. I ran into a problem with 'ConnectionMultiplexer': when it loses the connection to a Redis cluster, the 'OnManagedConnectionFailed' event handler creates a timer that tries to 'SwitchMaster'. The timer ticks every second, but there is a PLINQ query in the 'GetConfiguredMasterForService' method, which leads to threads being started constantly (threads start faster than they exit). If the disconnect lasts long enough, I get a thread/memory leak in the service.
This PR contains a hotfix for the timer problem (the timer starts the next iteration only after the previous iteration has completed). But maybe you should take a closer look at the PLINQ queries (they seem superfluous).
Also, I commented out the TextWriter.Null assignment in 'SwitchMaster', because it looks like a useless memory allocation (you already handle a nullable LogProxy).
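For readers skimming the diff below: the fix re-arms a one-shot timer from a finally block, so an iteration can never start while the previous one is still running. A minimal sketch of that pattern (ReconnectExample and TrySwitchMaster are hypothetical names, not the library's actual code):

    using System;
    using System.Threading;

    class ReconnectExample
    {
        private Timer _reconnectTimer;

        void StartReconnectLoop()
        {
            // dueTime 0 fires immediately; period 0 disables periodic signaling,
            // so the callback cannot run again until it is explicitly re-armed.
            _reconnectTimer = new Timer(_ =>
            {
                try
                {
                    TrySwitchMaster();
                }
                finally
                {
                    // Re-arm as a single shot 1s after this iteration finishes,
                    // so iterations can never overlap.
                    _reconnectTimer?.Change(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(0));
                }
            }, null, TimeSpan.FromSeconds(0), TimeSpan.FromSeconds(0));
        }

        void TrySwitchMaster() { /* attempt the switch; may throw */ }
    }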

eduardobr (Contributor)

@alexSatov, if the timer iterations are overlapping, wouldn't we still have the problem after this change?
Isn't simply denying reentrancy what we need (like this example with Interlocked)?
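A minimal sketch of the Interlocked-based reentrancy guard being suggested here (TrySwitchMaster is a hypothetical stand-in for SwitchMaster; this is not the PR's code):

    using System.Threading;

    class ReentrancyGuardExample
    {
        private int _running; // 0 = idle, 1 = a tick is in progress

        // Timer callback: skip this tick entirely if the previous
        // one has not finished yet.
        void OnTick(object state)
        {
            if (Interlocked.CompareExchange(ref _running, 1, 0) != 0) return;
            try
            {
                TrySwitchMaster();
            }
            finally
            {
                Interlocked.Exchange(ref _running, 0);
            }
        }

        void TrySwitchMaster() { /* ... */ }
    }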

alexSatov (Contributor, Author)

As I wrote, this change does not allow timer iterations to overlap. The (0, 0) arguments guarantee a single iteration (a period of zero disables periodic signaling), so I see no point in a "resource lock".
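Per the System.Threading.Timer documentation, a period of zero (or Timeout.Infinite) disables periodic signaling, so the callback fires once and then stays dormant until Change re-arms it:

    using System;
    using System.Threading;

    TimerCallback callback = _ => Console.WriteLine("tick");
    // Fires once immediately; never fires again unless
    // timer.Change(...) re-enables it.
    var timer = new Timer(callback, null, dueTime: TimeSpan.Zero, period: TimeSpan.Zero);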

eduardobr (Contributor)

You're right about it not overlapping - my bad; now I see the original interval was also changed to 0.
But if SwitchMaster fails, now it will always retry without any delay, right? In that case, this timer without an interval is the same as a while(!succeeded) loop without a timer.
But I have no idea whether removing the 1-second delay would overwhelm the CPU in this scenario, so I'll refrain from commenting.

vait commented Mar 31, 2021

@eduardobr, you can add a delay; it doesn't matter. With this change, a running timer will start a new iteration only after the previous one has completed.

@@ -2535,7 +2538,8 @@ internal void OnManagedConnectionFailed(object sender, ConnectionFailedEventArgs
         /// <param name="log">Log to write to, if any</param>
         internal void SwitchMaster(EndPoint switchBlame, ConnectionMultiplexer connection, TextWriter log = null)
         {
-            if (log == null) log = TextWriter.Null;
+            // useless memory allocations?
+            // if (log == null) log = TextWriter.Null;
Collaborator
Can we revert this section please, scoping the change to only one thing here?

-            }, null, TimeSpan.FromSeconds(0), TimeSpan.FromSeconds(1));
+                finally
+                {
+                    connection.sentinelMasterReconnectTimer?.Change(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(0));
Collaborator
If we're disabling periodic polling, the second arg should be Timeout.InfiniteTimeSpan here :)

vait (Jun 4, 2021)
Yep! -1 is enough. And I've found one more mistake; Alex will send a new commit next week.
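The two spellings are equivalent - a quick illustration, with timer standing in for the sentinel reconnect timer:

    // Timeout.InfiniteTimeSpan is defined as TimeSpan.FromMilliseconds(-1),
    // so either call disables periodic signaling on re-arm:
    timer.Change(TimeSpan.FromSeconds(1), Timeout.InfiniteTimeSpan);
    timer.Change(TimeSpan.FromSeconds(1), TimeSpan.FromMilliseconds(-1));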

NickCraver (Collaborator) left a comment

Thanks for the fix here! Confirmed working 👍

NickCraver merged commit a49bd65 into StackExchange:main on Jun 27, 2021.
NickCraver pushed a commit that referenced this pull request on Jun 27, 2021.
4 participants