Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Remove whole table locks on push rule add/delete #16051

Merged
merged 11 commits into from
Nov 13, 2023
1 change: 1 addition & 0 deletions changelog.d/16051.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Remove whole table locks on push rule modifications. Contributed by Nick @ Beeper (@fizzadar).
43 changes: 27 additions & 16 deletions synapse/storage/databases/main/push_rule.py
Original file line number Diff line number Diff line change
Expand Up @@ -449,26 +449,28 @@ def _add_push_rule_relative_txn(
before: str,
after: str,
) -> None:
# Lock the table since otherwise we'll have annoying races between the
# SELECT here and the UPSERT below.
self.database_engine.lock_table(txn, "push_rules")
Comment on lines -452 to -454
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to describe the race to ensure we have a shared understanding. Given two rules rule_A and rule_B with priorities 0 and 1, respectively. If you make two requests:

  1. Add a new push rule (rule_X) after rule_A, this should make rule_B a priority of 2; rule_X priority 1; and leave rule_A at 0.
  2. Add a new push rule (rule_Y) after rule_A, this should make rule_B a priority of 3; rule_X priority 2; rule_Y priority 1; and leave rule_A at 0.

This is in a transaction (I assume running at READ COMMITTED), so what happens if these race?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the winner will be applied and the second transaction will be replayed using the new updated data, per (added some breaks to make it easier to read, my brain hurts!):

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress).

If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits .... it will attempt to apply its operation to the updated version of the row.

The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.

If my understanding is correct READ COMMITTED will effectively correct the issue by the replay.

Copy link
Contributor Author

@Fizzadar Fizzadar Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said! Synapse's default is actually one better, REPEATABLE READ, in which case things are much simpler:

UPDATE, DELETE, MERGE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the transaction start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the repeatable read transaction will wait for the first updating transaction to commit ... if the first updater commits (and actually updated or deleted the row, not just locked it) then the repeatable read transaction will be rolled back with the message ERROR: could not serialize access due to concurrent update

Which synapse automatically retries, which would replay the transaction as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final note - there is an issue somewhere about switching to READ COMMITTED as the default, but it seems that would also suffice here in terms of the potential race conditions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I believe that you're correct (but my brain also hurts)!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree here. As things stand I don't see why we'd necessarily replay the transactions, as we may not have updated/deleted/locked any of the rows we SELECT against (and inserting a new row that would have been picked up by a SELECT isn't picked up by postgres except in SERIALIZABLE isolation AIUI).

I think what you want here is to run the selects with a FOR SHARE so that they do conflict with each other?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess most of the time the UPDATE will conflict, but if we have two requests to add a rule to the top of the push rules those transactions should conflict but won't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense, will add FOR SHARE in 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, we probably need a FOR UPDATE instead as we need the SELECT statements to conflict with each other and FOR SHARE won't do that https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-ROWS


relative_to_rule = before or after

res = self.db_pool.simple_select_one_txn(
txn,
table="push_rules",
keyvalues={"user_name": user_id, "rule_id": relative_to_rule},
retcols=["priority_class", "priority"],
allow_none=True,
)
sql = """
SELECT priority, priority_class FROM push_rules
WHERE user_name = ? AND rule_id = ?
"""

if isinstance(self.database_engine, PostgresEngine):
sql += " FOR UPDATE"
else:
# Annoyingly SQLite doesn't support row level locking, so lock the whole table
self.database_engine.lock_table(txn, "push_rules")

txn.execute(sql, (user_id, relative_to_rule))
row = txn.fetchone()

if not res:
if row is None:
raise RuleNotFoundException(
"before/after rule not found: %s" % (relative_to_rule,)
)

base_priority_class, base_rule_priority = res
base_rule_priority, base_priority_class = row

if base_priority_class != priority_class:
raise InconsistentRuleException(
Expand Down Expand Up @@ -516,9 +518,18 @@ def _add_push_rule_highest_priority_txn(
conditions_json: str,
actions_json: str,
) -> None:
# Lock the table since otherwise we'll have annoying races between the
# SELECT here and the UPSERT below.
self.database_engine.lock_table(txn, "push_rules")
if isinstance(self.database_engine, PostgresEngine):
# Postgres doesn't do FOR UPDATE on aggregate functions, so select the rows first
# then re-select the count/max below.
sql = """
SELECT * FROM push_rules
WHERE user_name = ? and priority_class = ?
FOR UPDATE
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth doing a txn.execute for this SQL? 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But also, you probably want to do this as part of the COUNT(*) query below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep good spot - 2ec17da

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing a tx.execute here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah, missed after rewriting it again in 2ec17da; fix: 376313e

txn.execute(sql, (user_id, priority_class))
else:
# Annoyingly SQLite doesn't support row level locking, so lock the whole table
self.database_engine.lock_table(txn, "push_rules")

# find the highest priority rule in that class
sql = (
Expand Down
Loading