This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Optimise _update_client_ips_batch_txn to batch together database operations. #12252

Merged
merged 25 commits into from
Apr 8, 2022
Changes from 1 commit
Commits
7cf8b2e
Upsert many client IPs at once, more efficiently
reivilibre Mar 18, 2022
c7cfbf5
Newsfile
reivilibre Mar 18, 2022
45c98e0
Add a simple update many txn
reivilibre Mar 24, 2022
0646b4b
Add non-txn simple update many
reivilibre Mar 24, 2022
1acd8cd
Use simple_update_many for devices
reivilibre Mar 25, 2022
1e961ad
Isolate locals by splitting into two smaller functions
reivilibre Mar 25, 2022
503f5c8
Make _dump_to_tuple dump the entire table without undue ceremony
reivilibre Mar 28, 2022
5f29099
Rename some things to clarify
reivilibre Mar 28, 2022
e7c4907
Add a test for simple_update_many
reivilibre Mar 28, 2022
5675a94
Add an exception for when the number of value rows and key rows don't…
reivilibre Mar 28, 2022
e7985d2
Antilint
reivilibre Apr 6, 2022
7edf6f7
Add lock=True for the emulated many-upsert case
reivilibre Apr 7, 2022
9556aae
Don't double-lock the table
reivilibre Apr 7, 2022
303fba6
Inline column names
reivilibre Apr 7, 2022
ac4b1d5
Flatten user_ips builder
reivilibre Apr 7, 2022
34403cb
Flatten devices builder
reivilibre Apr 7, 2022
0f8e98b
Fix up docstring
reivilibre Apr 7, 2022
481b730
Remove dead variable
reivilibre Apr 7, 2022
99e6b66
Don't return for None
reivilibre Apr 7, 2022
3f7f659
Update synapse/storage/database.py
reivilibre Apr 7, 2022
50f2b91
Bail out when there's nothing to do
reivilibre Apr 7, 2022
93e1237
Apply suggestions from code review
reivilibre Apr 8, 2022
c0aaec4
Antilint
reivilibre Apr 8, 2022
c456958
Lock only once for batch emulated upserts
reivilibre Apr 8, 2022
214aacf
No need to manually lock
reivilibre Apr 8, 2022
8 changes: 7 additions & 1 deletion synapse/storage/database.py
@@ -1360,11 +1360,17 @@ def simple_upsert_many_txn_emulated(
         if not value_names:
             value_values = [() for x in range(len(key_values))]

+        if lock:
+            # Lock the table just once, to prevent it being done once per row.
+            # Note that, according to Postgres' documentation, once obtained,
+            # the lock is held for the remainder of the current transaction.
+            self.engine.lock_table(txn, "user_ips")
Comment on lines +1364 to +1367
Member

I don't think this code is used on postgres though since that doesn't need the emulated support here?

Contributor Author

Well, sigh, you're absolutely right. Locking on SQLite is a no-op (apparently! I haven't fully reasoned through how that's sufficient, but the code is a no-op...?!).

  • Postgres: always supports native upsert, so locking doesn't apply.
  • SQLite: may not support native upsert, locking is a no-op.

Why do upserts have locking at all then? Maybe I made the mistake of assuming that there must have been a reason for it because of the code that was here before...

In one of the cases, an emulated upsert can be a SELECT followed by an INSERT. In the other case, it's an UPDATE followed by an INSERT...

The latter case (UPDATE, INSERT) doesn't sound like it would need a lock, because both of those operations need a RESERVED lock, which only one connection can hold at a time. There's no way another writer can get in the middle of those two.

(Assuming traditional SQLite3 locking) The first case (SELECT, INSERT) may be problematic because SELECT only needs a SHARED lock, which has to be upgraded to a RESERVED lock when the INSERT begins. If another connection already holds a RESERVED or stronger lock, the upgrade will fail and the transaction will need to be retried.

I can't tell whether this works properly in WAL mode or not; the WAL docs aren't as clear about how transactions interact as the old-style locking page is. That said, the page on transactions has this to say:

> A read transaction is used for reading only. A write transaction allows both reading and writing. A read transaction is started by a SELECT statement, and a write transaction is started by statements like CREATE, DELETE, DROP, INSERT, or UPDATE (collectively "write statements"). If a write statement occurs while a read transaction is active, then the read transaction is upgraded to a write transaction if possible. If some other database connection has already modified the database or is already in the process of modifying the database, then upgrading to a write transaction is not possible and the write statement will fail with SQLITE_BUSY.

Assuming that's still true for WAL mode, then I think it's safe on SQLite.
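The SELECT-then-INSERT case described above can be reproduced with Python's standard `sqlite3` module. This is only a minimal sketch, assuming a fresh on-disk database in SQLite's default rollback-journal mode; the `kv` table and the connection names `a` and `b` are made up for illustration and are not part of Synapse.

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk database (hypothetical); two connections stand in for two writers.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
# isolation_level=None: the module opens no implicit transactions, we issue BEGIN ourselves.
# timeout=0: fail straight away instead of waiting for a lock to clear.
a = sqlite3.connect(path, timeout=0, isolation_level=None)
b = sqlite3.connect(path, timeout=0, isolation_level=None)
a.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

# Connection A: first half of an emulated upsert. The SELECT starts a read transaction.
a.execute("BEGIN")
found = a.execute("SELECT v FROM kv WHERE k = 'x'").fetchall()

# Connection B begins writing in the meantime (not yet committed).
b.execute("BEGIN")
b.execute("INSERT INTO kv VALUES ('x', 'from b')")

# Connection A: second half. Its read transaction cannot be upgraded while B is writing.
try:
    a.execute("INSERT INTO kv VALUES ('x', 'from a')")
    a_error = None
except sqlite3.OperationalError as exc:
    a_error = str(exc)  # SQLITE_BUSY surfaces as an OperationalError

# A has to give up and retry later; once it lets go, B can commit.
a.execute("ROLLBACK")
b.execute("COMMIT")
final_rows = a.execute("SELECT k, v FROM kv").fetchall()
print(a_error, final_rows)
```

In this run no duplicate row is introduced: the losing transaction fails instead of inserting, which is consistent with the reasoning that the emulated upsert is safe on SQLite as long as the caller retries.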


Ah! However, there's one case where Postgres DOES need the emulated support, so I need to take back some of the things I just wrote :/.
That's when the unique index on the table is not ready yet and the table is 'unsafe to upsert'. (This only really matters on Postgres: SQLite blocks while an index is being added, whereas Postgres sometimes adds indices concurrently.) In that case, locking does serve a useful role, because Postgres' MVCC means that duplicates could otherwise be introduced (and then cause the index creation to fail altogether).
Aside: user_directory_search is an eternal exception — it's always 'unsafe to upsert' on SQLite....

That turned into something more complex than I thought :/

Contributor Author

reivilibre Apr 8, 2022

> A read transaction is used for reading only. A write transaction allows both reading and writing. A read transaction is started by a SELECT statement, and a write transaction is started by statements like CREATE, DELETE, DROP, INSERT, or UPDATE (collectively "write statements"). If a write statement occurs while a read transaction is active, then the read transaction is upgraded to a write transaction if possible. If some other database connection has already modified the database or is already in the process of modifying the database, then upgrading to a write transaction is not possible and the write statement will fail with SQLITE_BUSY.
>
> Assuming that's still true for WAL mode, then I think it's safe on SQLite.

I tried this and it seems safe on both WAL and non-WAL mode.

The advantage of WAL mode in this case is that another transaction merely running 'select' and hanging around doesn't prevent you from committing. But that transaction will have to restart (at least if it's in conflict; not sure about generally) — it gets told the database is locked each time it tries to commit or update.

I learnt a bit about SQLite's WAL mode, so that's good at least
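For anyone wanting to repeat the WAL experiment, here is a rough sketch using Python's standard `sqlite3` module. It assumes the temporary directory's filesystem supports WAL; the `kv` table and the connection names are hypothetical and only serve the illustration.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_wal.db")
# isolation_level=None: explicit BEGIN/COMMIT only; timeout=0: never wait on a lock.
a = sqlite3.connect(path, timeout=0, isolation_level=None)
wal_mode = a.execute("PRAGMA journal_mode=WAL").fetchone()[0]  # "wal" if the switch worked
a.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
b = sqlite3.connect(path, timeout=0, isolation_level=None)

# Connection A runs a SELECT and then just hangs around inside its read transaction.
a.execute("BEGIN")
a.execute("SELECT v FROM kv WHERE k = 'x'").fetchall()

# Connection B can still write and commit while A's read transaction is open.
b.execute("BEGIN")
b.execute("INSERT INTO kv VALUES ('x', 'from b')")
b.execute("COMMIT")

# A's snapshot is now stale, so its attempt to write is refused.
try:
    a.execute("INSERT INTO kv VALUES ('x', 'from a')")
    reader_error = None
except sqlite3.OperationalError as exc:
    reader_error = str(exc)

# A has to restart its transaction; afterwards it sees B's row.
a.execute("ROLLBACK")
a.execute("BEGIN")
seen_after_restart = a.execute("SELECT v FROM kv WHERE k = 'x'").fetchall()
a.execute("COMMIT")
print(wal_mode, reader_error, seen_after_restart)
```

The writer commits without waiting for the idle reader, and the reader only finds out it must restart when it tries to write.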


+
         for keyv, valv in zip(key_values, value_values):
             _keys = {x: y for x, y in zip(key_names, keyv)}
             _vals = {x: y for x, y in zip(value_names, valv)}

-            self.simple_upsert_txn_emulated(txn, table, _keys, _vals, lock=lock)
+            self.simple_upsert_txn_emulated(txn, table, _keys, _vals, lock=False)

     def simple_upsert_many_txn_native_upsert(
         self,
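The lock-once pattern in this hunk can be sketched outside Synapse roughly as follows. This is not the Synapse implementation: `lock_table`, `upsert_emulated`, and `upsert_many_emulated` are hypothetical stand-ins written against the standard `sqlite3` module, and the lock helper only records that it was called (the thread notes that locking is a no-op on SQLite anyway).

```python
import sqlite3
from typing import Any, Dict, List, Sequence

lock_calls: List[str] = []  # records every (hypothetical) table lock, for illustration only


def lock_table(txn: sqlite3.Cursor, table: str) -> None:
    # Stand-in for an engine-specific table lock; here it just records the call.
    lock_calls.append(table)


def upsert_emulated(txn: sqlite3.Cursor, table: str, keys: Dict[str, Any],
                    values: Dict[str, Any], lock: bool = True) -> None:
    """Emulated upsert: UPDATE (or SELECT) first, INSERT only if no row matched."""
    if lock:
        lock_table(txn, table)
    where = " AND ".join(f"{col} = ?" for col in keys)
    if values:
        assignments = ", ".join(f"{col} = ?" for col in values)
        txn.execute(f"UPDATE {table} SET {assignments} WHERE {where}",
                    [*values.values(), *keys.values()])
        if txn.rowcount > 0:
            return  # an existing row was updated
    else:
        txn.execute(f"SELECT 1 FROM {table} WHERE {where}", list(keys.values()))
        if txn.fetchone():
            return  # row already exists and there is nothing to update
    cols = [*keys, *values]
    placeholders = ", ".join("?" for _ in cols)
    txn.execute(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
                [*keys.values(), *values.values()])


def upsert_many_emulated(txn: sqlite3.Cursor, table: str,
                         key_names: Sequence[str], key_values: Sequence[Sequence[Any]],
                         value_names: Sequence[str], value_values: Sequence[Sequence[Any]],
                         lock: bool = True) -> None:
    if not value_names:
        value_values = [() for _ in key_values]
    if lock:
        lock_table(txn, table)  # lock once for the whole batch...
    for keyv, valv in zip(key_values, value_values):
        # ...and tell the per-row upsert not to lock again.
        upsert_emulated(txn, table, dict(zip(key_names, keyv)),
                        dict(zip(value_names, valv)), lock=False)


conn = sqlite3.connect(":memory:")
txn = conn.cursor()
txn.execute("CREATE TABLE user_ips (user_id TEXT, ip TEXT, last_seen INTEGER)")
upsert_many_emulated(txn, "user_ips", ["user_id", "ip"],
                     [("@a:hs", "1.1.1.1"), ("@b:hs", "2.2.2.2")],
                     ["last_seen"], [(10,), (20,)])
upsert_many_emulated(txn, "user_ips", ["user_id", "ip"],
                     [("@a:hs", "1.1.1.1")], ["last_seen"], [(30,)])
rows = txn.execute("SELECT user_id, ip, last_seen FROM user_ips ORDER BY user_id").fetchall()
print(rows, lock_calls)
```

Each batch takes the lock a single time however many rows it contains. The sketch passes `table` to the lock helper, on the assumption that the table being upserted into is the one that should be locked.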