ReconfigureAsync: fix re-entrancy issue #1772

NickCraver · 2021-06-19T17:36:46Z

When this was refactored long ago, we went from a counter starting at 1 (which handled the first run) to a string reason (which defaults to empty). Combined with the || clause format, this subtly introduced a re-entrancy bug: a subsequent run can enter during the first run, because first was true and so no exchange for the reason occurred.

This was causing several Sentinel issues: on-connect handlers set config as nodes returned, and ultimately the Set<T> triggered a ReconfigureIfNeeded that actually ran while the original config was still in progress. This led to all sorts of race oddness, especially if one endpoint's timing differed significantly from another's.

Note: the other changes are only labels on the bool arguments for clarity at the call sites.
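
For readers who want to see the shape of the problem, here is a minimal sketch of the short-circuit described above. It is not the library's actual code: the `ReconfigureGuard` class, its method names, and the use of `null` to mean "no run in progress" are all assumptions made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the reconfigure guard; names are illustrative only.
public sealed class ReconfigureGuard
{
    // Assumption: null means no reconfigure is currently running.
    private string _activeCause;

    // Buggy shape: when first is true, || short-circuits and the exchange
    // never runs, so _activeCause stays null while the first run is active.
    public bool TryEnterBuggy(bool first, string cause)
        => first || Interlocked.CompareExchange(ref _activeCause, cause, null) == null;

    // Fixed shape: always attempt the exchange, so the first run claims the slot too.
    public bool TryEnterFixed(string cause)
        => Interlocked.CompareExchange(ref _activeCause, cause, null) == null;

    // Release the slot once the run has finished.
    public void Exit() => Interlocked.Exchange(ref _activeCause, null);
}

public static class Demo
{
    public static void Main()
    {
        var buggy = new ReconfigureGuard();
        Console.WriteLine(buggy.TryEnterBuggy(first: true, cause: "connect"));     // True
        Console.WriteLine(buggy.TryEnterBuggy(first: false, cause: "set config")); // True (re-entered)

        var guarded = new ReconfigureGuard();
        Console.WriteLine(guarded.TryEnterFixed("connect"));    // True
        Console.WriteLine(guarded.TryEnterFixed("set config")); // False (first run still active)
        guarded.Exit();
        Console.WriteLine(guarded.TryEnterFixed("set config")); // True
    }
}
```

In the buggy shape the second caller succeeds because the first caller never recorded a reason, which is the re-entrancy window. Always performing the exchange closes it, at the cost of requiring every run, including the first, to release the slot when it finishes.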

NickCraver · 2021-06-19T17:55:21Z

cc @TimLovellSmith on this fix. It's not the only Sentinel issue, but it was one of the NoConnectionAvailable sources: a re-entrancy race improperly set both nodes as master, yielding an unselectable replica in the DemandReplica path. In reality, this probably caused far more production connection problems.

NickCraver added ⚙️ area:connection 🪲 bug labels Jun 19, 2021

NickCraver marked this pull request as ready for review June 19, 2021 17:51

NickCraver merged commit a767cae into main Jun 19, 2021

NickCraver deleted the craver/reconfigure-fix-for-reentrancy branch June 19, 2021 17:55

NickCraver added a commit that referenced this pull request Jun 19, 2021

Include #1772 in release notes

ae22e1c
