Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
…Naomarik) Huge thanks to @Naomarik for reporting and diagnosing this issue! BEFORE THIS COMMIT The following scenario was possible: t0. Client WebSocket connects to server with unique client ID. State in `conns_` atom: `{<client-id1> [<ws-sch:port-for-ip1> <udt>]}`. t1. Client's IP changes (for example, by switching from wifi to cellular network). t2. No `:on-close` event is triggered on server. t3. New client WebSocket connects to server with the SAME client ID but new IP. State in `conns_` atom: `{<client-id1> [<ws-sch:port-for-ip2> <udt>]}`. t4. Server-side connection timeout triggers for ORIGINAL connection (IP1). This unintentionally removes sch for the NEW connection (IP2). State in `conns_` atom: `{<client-id1> [nil <udt>]}`. t5. At this point: - The client has a working WebSocket connection to server. - But the server `conns_` state is bad (has a nil sch for client). - Which means that server->client broadcasts all fail. - This broken state persists until if/when something causes the client to reconnect. But note that this may not happen anytime soon since the client believes (accurately) that it IS successfully connected. IMPLEMENTATION DETAILS Each connection to server establishes the following: - An `:on-open` handler that modifies state in `conns_` for <client-id> - An `:on-close` handler that modifies state in `conns_` for <client-id> The bad behaviour occurs when: - Conn1's delayed `:on-close` triggers after Conn2's `:on-open`. Because Conn1 and Conn2 share the same client-id, they're mutating the same state. AFTER THIS COMMIT We introduce a simple compare-and-swap (CAS) mechanism in the `:on-close` handler so that its state mutations will noop if the current server-ch does not = the server-ch added by the corresponding `:on-open` handler. I.e. a given `:on-close` will now only remove the SAME server-ch added by its corresponding `:on-open`. In the example above, this means that the timeout at t4 will noop.
- Loading branch information