Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

device lists don't get resent on rapid changes #430

Closed
kegsay opened this issue May 8, 2024 · 0 comments · Fixed by #432
Closed

device lists don't get resent on rapid changes #430

kegsay opened this issue May 8, 2024 · 0 comments · Fixed by #432

Comments

@kegsay
Copy link
Member

kegsay commented May 8, 2024

Whilst working on matrix-org/complement-crypto#50 I took a tcpdump of a failing test and found convincing evidence that SS is not correctly telling clients of the later device list update for the same user. This is problematic because it means clients won't hit /keys/query at the right time (after the client has uploaded device keys) and hence won't encrypt for that device, causing UTDs.

alice rust = authorization: Bearer syt_dXNlci0xLWFsaWNl_pWOuJBbfdUlWnqLLVRgm_1rPA9l\r\n
bob JS = Authorization: Bearer syt_dXNlci0yLWJvYg_BCEPDhFaFLDZGIrkSuVX_1q3cQp\r\n
bob NEW_DEVICE String value: syt_dXNlci0yLWJvYg_RiemTZKJheBtUhAIxhWu_3qdJOR

767, 1.609s JS (bob) sees it. 
775, 1.61s SS sees it.

797, 1.619 bob (js) /keys/query for bob
823, 1.753 alice (rust) SS delivers changed entry for bob
831, 1.781 alice (rust) /keys/query for bob

875, 2.16s bob NEW_DEVICE /login response String value: syt_dXNlci0yLWJvYg_sXHbnXvELCuGoDdoQjAz_0GMdjf
908, 2.183s bob /keys/upload response syt_dXNlci0yLWJvYg_sXHbnXvELCuGoDdoQjAz_0GMdjf

917, 2.184s SS sees bob changed
928, 2.184s bob (JS) sees bob changed

953, 2.193s bob NEW_DEVICE syt_dXNlci0yLWJvYg_sXHbnXvELCuGoDdoQjAz_0GMdjf /keys/query for bob
969, 2.195s bob syt_dXNlci0yLWJvYg_BCEPDhFaFLDZGIrkSuVX_1q3cQp /keys/query for bob

Upon close inspection, this seems to be caused because device list updates don't wake up connections. This means it takes 30s for the device list update to make it to Alice, and in the meantime messages sent aren't encrypted for Bob's new device. Adding a sleep of at least 30s between Bob's new device login and Alice sending the message fixes the test.

Separately, the code for func (t *DeviceDataTable) Select(userID, deviceID string, swap bool) is not atomic, because the transaction lacks ... FOR UPDATE in the SELECT, meaning it is possible for the following to happen:

  • Reader: BEGIN TXN
  • Reader: SELECT device data
  • Poller gets a device list change entry
  • Writer: BEGIN TXN
  • Writer: SELECT device data
  • Writer: INSERT .. ON CONFLICT DO UPDATE .. [row level lock acquired]
  • Reader: UPDATE ... [blocks due to Writer txn]
  • Writer: COMMIT
  • Reader: UPDATE unblocks
  • Reader: COMMIT
  • Device list change entry was dropped.

With SELECT .. FOR UPDATE, the lock is acquired at SELECT time meaning this cannot happen and instead:

  • Reader: BEGIN TXN
  • Reader: SELECT device data [row level lock acquired]
  • Poller gets a device list change entry
  • Writer: BEGIN TXN
  • Writer: SELECT device data [blocks due to Reader txn]
  • Reader: UPDATE
  • Reader: COMMIT
  • Writer: SELECT device data unblocks
  • Writer: INSERT .. ON CONFLICT DO UPDATE .. [row level lock acquired]
  • Writer: COMMIT

If this happens, senders may not have up-to-date device lists for users, causing UTDs for the newly logged in device. For this to happen, the device must login at the same time as the client is sending SS requests.

kegsay added a commit that referenced this issue May 8, 2024
As we are selecting... for updates. Without this, we can drop
updates to the floor incorrectly.

See #430
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant