This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Update UPSERT comment now that native upserts are the default #13924

Merged 3 commits on Sep 29, 2022
Changes from 2 commits
1 change: 1 addition & 0 deletions changelog.d/13924.misc
@@ -0,0 +1 @@
Update an inaccurate comment in Synapse's upsert database helper.
51 changes: 41 additions & 10 deletions synapse/storage/database.py
@@ -1141,17 +1141,48 @@ async def simple_upsert(
desc: str = "simple_upsert",
lock: bool = True,
) -> bool:
"""
"""Insert a row with values + insertion_values; on conflict, update with values.

All of our supported databases accept the nonstandard "upsert" statement in
their dialect of SQL. We call this a "native upsert". The syntax looks roughly
like:

INSERT INTO table VALUES (values + insertion_values)
ON CONFLICT (keyvalues)
DO UPDATE SET (values); -- overwrite `values` columns only

If (values) is empty, the resulting query is slightly simpler:

INSERT INTO table VALUES (insertion_values)
ON CONFLICT (keyvalues)
DO NOTHING; -- do not overwrite any columns

This function is a helper to build such queries.

In order for upserts to make sense, the database must be able to determine when
an upsert CONFLICTs with an existing row. Postgres and SQLite ensure this by
requiring that a unique index exist on the column names used to detect a
conflict (i.e. `keyvalues.keys()`).

If there is no such index, we can "emulate" an upsert with a SELECT followed
by either an INSERT or an UPDATE. This is unsafe: we cannot make the same
atomicity guarantees that a native upsert can and are very vulnerable to races
and crashes. Therefore if we wish to upsert without an appropriate unique index,
we must either:

1. Acquire a table-level lock before the emulated upsert (`lock=True`), or
2. VERY CAREFULLY ensure that we are the only thread and worker which will be
writing to this table, in which case we can proceed without a lock
(`lock=False`).

Generally speaking, you should use `lock=True`. If the table in question has a
unique index[*], this class will use a native upsert (which is atomic and so can
ignore the `lock` argument). Otherwise this class will use an emulated upsert,
in which case we want the safer option unless we have been VERY CAREFUL.

`lock` should generally be set to True (the default), but can be set
to False if either of the following are true:
1. there is a UNIQUE INDEX on the key columns. In this case a conflict
will cause an IntegrityError in which case this function will retry
the update.
2. we somehow know that we are the only thread which will be updating
this table.
As an additional note, this parameter only matters for old SQLite versions
because we will use native upserts otherwise.
[*]: This class is aware that some tables have unique indices added as
background updates. It checks at runtime to see if those updates are
pending: if not, they are deemed safe for use with native upserts.
Member

Is it worth pointing to the mapping of these? Otherwise it is left feeling like magic.

Contributor Author

(The whole thing does feel somewhat magical to me, tbh; hence the comment.)

I'll try to get this done today.

Member

Maybe something like:

Suggested change
[*]: This class is aware that some tables have unique indices added as
background updates. It checks at runtime to see if those updates are
pending: if not, they are deemed safe for use with native upserts.
[*]: At runtime this class checks the tables/background indices in
UNIQUE_INDEX_BACKGROUND_UPDATES to see if it is safe to
do native upsert operations on those tables. It continually checks
until all associated background updates are finished.
Note that if a table doesn't appear in that list it is assumed to be
safe for native upserts.

Contributor Author

I've had a go in c46f103 based on some of your words. WDYT?


Args:
table: The table to upsert into
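
Illustrative note (not part of the diff): the native upsert described in the docstring can be sketched against an in-memory SQLite database. This is a minimal sketch under stated assumptions, not Synapse's implementation: `build_native_upsert`, the `demo_devices` table, and its columns are hypothetical names; the key columns are assumed to be included in the inserted row; and the bundled SQLite is assumed to be 3.24 or newer so that `ON CONFLICT` is accepted.

```python
import sqlite3


def build_native_upsert(table, keyvalues, values, insertion_values=None):
    """Hypothetical helper: return (sql, args) for a native upsert.

    Assumption: the key columns are written alongside values and
    insertion_values when the row is first inserted.
    """
    insertion_values = insertion_values or {}
    all_values = {**keyvalues, **values, **insertion_values}
    columns = ", ".join(all_values)
    placeholders = ", ".join("?" for _ in all_values)
    conflict_cols = ", ".join(keyvalues)
    if values:
        # Overwrite only the `values` columns with the row we tried to insert.
        updates = ", ".join(f"{col} = EXCLUDED.{col}" for col in values)
        on_conflict = f"DO UPDATE SET {updates}"
    else:
        # Nothing to overwrite: leave the existing row untouched.
        on_conflict = "DO NOTHING"
    sql = (
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_cols}) {on_conflict}"
    )
    return sql, list(all_values.values())


def demo_native_upsert():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_devices ("
        "user_id TEXT, device_id TEXT, last_seen INTEGER, first_seen INTEGER)"
    )
    # The unique index on the key columns is what lets the database detect a conflict.
    conn.execute(
        "CREATE UNIQUE INDEX demo_devices_key ON demo_devices (user_id, device_id)"
    )
    key = {"user_id": "@alice:example.org", "device_id": "PHONE"}
    for ts in (100, 200):
        sql, args = build_native_upsert(
            "demo_devices",
            key,
            values={"last_seen": ts},
            insertion_values={"first_seen": ts},
        )
        conn.execute(sql, args)
    return conn.execute(
        "SELECT last_seen, first_seen FROM demo_devices"
    ).fetchall()
```

Calling `demo_native_upsert()` upserts the same key twice: the second call updates `last_seen` but leaves `first_seen` alone, because `first_seen` is only an insertion value.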
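
Illustrative note (not part of the diff): the "emulated" upsert from the docstring, a SELECT followed by either an UPDATE or an INSERT, might look roughly like the sketch below. Everything here is an assumption made for illustration: `emulated_upsert` and the `demo_settings` table are hypothetical, the return value (True when a row was inserted) is this sketch's own choice, and `insertion_values` is omitted for brevity. SQLite has no table-level `LOCK TABLE`, so `BEGIN IMMEDIATE`, which takes the database write lock up front, stands in for `lock=True`.

```python
import sqlite3


def emulated_upsert(conn, table, keyvalues, values, lock=True):
    """Hypothetical sketch: SELECT, then UPDATE or INSERT.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT statements below control the transaction.
    Returns True if a new row was inserted, False if one already existed.
    """
    where = " AND ".join(f"{col} = ?" for col in keyvalues)
    key_args = list(keyvalues.values())
    # Stand-in for a table lock: BEGIN IMMEDIATE acquires the write lock
    # before the SELECT, so no other writer can slip in between the steps.
    conn.execute("BEGIN IMMEDIATE" if lock else "BEGIN")
    try:
        row = conn.execute(
            f"SELECT 1 FROM {table} WHERE {where}", key_args
        ).fetchone()
        if row is not None:
            if values:
                assignments = ", ".join(f"{col} = ?" for col in values)
                conn.execute(
                    f"UPDATE {table} SET {assignments} WHERE {where}",
                    list(values.values()) + key_args,
                )
            inserted = False
        else:
            all_values = {**keyvalues, **values}
            columns = ", ".join(all_values)
            placeholders = ", ".join("?" for _ in all_values)
            conn.execute(
                f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
                list(all_values.values()),
            )
            inserted = True
        conn.execute("COMMIT")
        return inserted
    except Exception:
        conn.execute("ROLLBACK")
        raise


def demo_emulated_upsert():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Deliberately no unique index: a native upsert could not detect conflicts here.
    conn.execute("CREATE TABLE demo_settings (name TEXT, value TEXT)")
    first = emulated_upsert(conn, "demo_settings", {"name": "theme"}, {"value": "dark"})
    second = emulated_upsert(conn, "demo_settings", {"name": "theme"}, {"value": "light"})
    rows = conn.execute("SELECT name, value FROM demo_settings").fetchall()
    return first, second, rows
```

Without the lock, two writers could both run the SELECT, both see no row, and both INSERT, which is the race the docstring warns about.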