Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReconfigureAsync: fix re-entrancy issue #1772

Merged
merged 1 commit into from
Jun 19, 2021

Conversation

NickCraver
Copy link
Collaborator

@NickCraver NickCraver commented Jun 19, 2021

When this was refactored long ago, we went from a counter starting at 1 (which handled the first run) to a string reason (which defaults to empty). Combined with the || clause format, this subtly introduced a re-entrancy bug where a subsequent run can enter during the first run (where first was true, so no exchange for reason occurred).

This was causing several Sentinel issues due to on connect handlers setting config as nodes returned and ultimately the Set<T> triggering a ReconfigureIfNeeded....and actually running, while the original config was still in progress. This led to all sorts of race oddness especially if one endpoint was any kind of significantly different in timing than another.

Note: the other changes are only labels on the bool arguments for clarity at the call sites.

When this was refactored long ago, we went from a counter starting at 1 (which handled the first run) to a string reason (which defaults to empty). Combined with the || clause format, this subtly introduced a re-entrancy bug where a subsequent run can enter during the first run (where first was true, so no exchange for reason occurred).

This was causing several Sentinel issues due to on connect handlers setting config as nodes returned and ultimately the Set<T> triggering a ReconfigureIfNeeded....and actually running, while the original config was still in progress. This led to all sorts of race oddness especially if one endpoint was any kind of significantly different in timing than another.
@NickCraver NickCraver marked this pull request as ready for review June 19, 2021 17:51
@NickCraver
Copy link
Collaborator Author

cc @TimLovellSmith on this fix - not the only Sentinel issue but it was one of the NoConnectionAvailable sources, because a re-entrancy race improperly set both nodes as master yielding an unselectable replica in the DemandReplica path. In reality, this probably caused far more production connection problems.

@NickCraver NickCraver merged commit a767cae into main Jun 19, 2021
@NickCraver NickCraver deleted the craver/reconfigure-fix-for-reentrancy branch June 19, 2021 17:55
NickCraver added a commit that referenced this pull request Jun 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant