Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Background process rotate_notifs fails due to unique constraint violation #14641

Closed
highsineburgh opened this issue Dec 7, 2022 · 9 comments · Fixed by #14669
Closed

Background process rotate_notifs fails due to unique constraint violation #14641

highsineburgh opened this issue Dec 7, 2022 · 9 comments · Fixed by #14669
Assignees
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db A-Threads Threaded messages O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter

Comments

@highsineburgh
Copy link

Description

We've recently started seeing an exception being thrown every 30s from the background process 'rotate_notifs'. There does not appear to be any impact on users.

synapse.metrics.background_process_metrics: [rotate_notifs-179] Background process 'rotate_notifs' threw an exception

This appears to be the result of a violation of a unique constraint on the event_push_summary table. Full stack trace of the error is provided below.

Steps to reproduce

  • Tail logs

Homeserver

self-hosted

Synapse Version

{"server_version":"1.70.1","python_version":"3.9.13"}

Installation Method

Other (please mention below)

Database

SQLite

Workers

Single process

Platform

Running on NixOS using the Nix package on AWS instance.

Configuration

Some users have enabled threads.

Relevant log output

synapse.metrics.background_process_metrics: [rotate_notifs-179] Background process 'rotate_notifs' threw an exception
Traceback (most recent call last):
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/metrics/background_process_metrics.py", line 240, in run
    return await func(*args, **kwargs)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/databases/main/event_push_actions.py", line 1307, in _rotate_notifs
    caught_up = await self.db_pool.runInteraction(
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 881, in runInteraction
    return await delay_cancellation(_runInteraction())
  File "/nix/store/n2a4bbdslmh0mj9h9bv2cxwpgzgvcg0a-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/internet/defer.py", line 1656, in _inlineCallbacks
    result = current_context.run(
  File "/nix/store/n2a4bbdslmh0mj9h9bv2cxwpgzgvcg0a-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/python/failure.py", line 514, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 848, in _runInteraction
    result = await self.runWithConnection(
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 976, in runWithConnection
    return await make_deferred_yieldable(
  File "/nix/store/n2a4bbdslmh0mj9h9bv2cxwpgzgvcg0a-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/nix/store/n2a4bbdslmh0mj9h9bv2cxwpgzgvcg0a-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/nix/store/n2a4bbdslmh0mj9h9bv2cxwpgzgvcg0a-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/nix/store/n2a4bbdslmh0mj9h9bv2cxwpgzgvcg0a-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/nix/store/n2a4bbdslmh0mj9h9bv2cxwpgzgvcg0a-python3.9-Twisted-22.4.0/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 969, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 710, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/databases/main/event_push_actions.py", line 1530, in _rotate_notifs_txn
    self._rotate_notifs_before_txn(
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/databases/main/event_push_actions.py", line 1643, in _rotate_notifs_before_txn
    self.db_pool.simple_upsert_many_txn(
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 1464, in simple_upsert_many_txn
    return self.simple_upsert_many_txn_native_upsert(
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 1571, in simple_upsert_many_txn_native_upsert
    return txn.execute_batch(sql, args)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 368, in execute_batch
    self.executemany(sql, args)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 391, in executemany
    self._do_execute(self.txn.executemany, sql, *args)
  File "/nix/store/5mwm1djp7w1vzvijjwwc4x9wx9y4cwzf-matrix-synapse-1.70.1/lib/python3.9/site-packages/synapse/storage/database.py", line 436, in _do_execute
    return func(sql, *args, **kwargs)
sqlite3.IntegrityError: UNIQUE constraint failed: event_push_summary.user_id, event_push_summary.room_id

Anything else that would be useful to know?

From the debug logs - the sql values looking at user_id, room_id and thread_id are unique, but user_id and room_id are not.

@DMRobertson DMRobertson changed the title Background process 'rotate_notifs' threw an exception error every 30 secs Background process rotate_notifs fails due to unique constraint violation Dec 8, 2022
@DMRobertson DMRobertson added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Threads Threaded messages A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Uncommon Most users are unlikely to come across this or unexpected workflow X-Needs-Info This issue is blocked awaiting information from the reporter labels Dec 8, 2022
@DMRobertson
Copy link
Contributor

Can you please connect to the db and show us the output of

.schema event_push_summary

and

SELECT * FROM background_updates;

please?

@reivilibre
Copy link
Contributor

At first glance this looks like a duplicate of (part of) #14120 (introduced by #13753). #14120 was supposedly fixed by #14138 in v1.69.0rc2, but I think this is referring to a different aspect of the issue.

The new index on (user_id, room_id, thread_id) is less restrictive than the one on (user_id, room_id) and the old index should have been replaced once the new one is in place, but this was accidentally forgotten. #13776 should have fixed this... https://github.com/matrix-org/synapse/pull/13776/files#diff-e9a99066ac572d35ea4b91b415bec33dadeefcb93e861151f6f270c76095e640R29

The only reason I can imagine it didn't is if you have background updates left over; please could you SELECT * FROM background_updates ORDER BY ordering;?

@highsineburgh
Copy link
Author

event_push_summary schema:

> .schema event_push_summary
CREATE TABLE event_push_summary ( user_id TEXT NOT NULL, room_id TEXT NOT NULL, notif_count BIGINT NOT NULL, stream_ordering BIGINT NOT NULL , unread_count BIGINT, last_receipt_stream_ordering BIGINT, thread_id TEXT);
CREATE UNIQUE INDEX event_push_summary_unique_index ON event_push_summary (user_id, room_id) ;
CREATE UNIQUE INDEX event_push_summary_unique_index2 ON event_push_summary (user_id, room_id, thread_id) ;
CREATE INDEX event_push_summary_thread_id_null ON event_push_summary (thread_id) WHERE thread_id IS NULL;

background_updates:

sqlite> SELECT * FROM background_updates ORDER BY ordering;
sqlite>

If the issue is the unique index on user_id, room_id is it safe to drop this index?

@clokep
Copy link
Member

clokep commented Dec 9, 2022

Did you by any chance do a big jump from Synapse < 1.62.0 to >= 1.70.0? If so, what could have happened is that we registered multiple background jobs to create indexes (event_push_summary_unique_index from #13005, and event_push_summary_unique_index2 from #13753) and also tried to drop the first only only if it exists at the same time (see #14222).

We should probably add a background update to delete the first background job and attempt to re-delete the index (if it exists).

DELETE FROM background_updates WHERE update_name = 'event_push_summary_unique_index';
DROP INDEX IF EXISTS event_push_summary_unique_index;

We should check the other tables too, this could be an issue there too.

@highsineburgh
Copy link
Author

Did you by any chance do a big jump from Synapse < 1.62.0 to >= 1.70.0?

It appears that we did. The upgrade moved from 1.59.1 to 1.70.1 from what I can tell.

@asymmetric
Copy link
Contributor

asymmetric commented Dec 10, 2022

I've just seen this and I also did a big upgrade jump (in the same ballpark as @highsineburgh) a couple weeks back.

Additionally, just like @highsineburgh, I'm using sqlite.

sqlite> .schema event_push_summary
CREATE TABLE event_push_summary ( user_id TEXT NOT NULL, room_id TEXT NOT NULL, notif_count BIGINT NOT NULL, stream_ordering BIGINT NOT NULL , unread_count BIGINT, last_receipt_stream_ordering BIGINT, thread_id TEXT);
CREATE UNIQUE INDEX event_push_summary_unique_index ON event_push_summary (user_id, room_id) ;
CREATE UNIQUE INDEX event_push_summary_unique_index2 ON event_push_summary (user_id, room_id, thread_id) ;
CREATE INDEX event_push_summary_thread_id_null ON event_push_summary (thread_id) WHERE thread_id IS NULL;
sqlite> SELECT * FROM background_updates ORDER BY ordering;
sqlite> 

Synapse 1.71.0

@clokep
Copy link
Member

clokep commented Dec 12, 2022

@highsineburgh @asymmetric I took a look at this again and I think you're safe to run my SQL from above to drop the index. If you'd prefer to wait then #14669 should fix it for you (and any other affected people / stop it from occurring again in the future).

@highsineburgh
Copy link
Author

@highsineburgh @asymmetric I took a look at this again and I think you're safe to run my SQL from above to drop the index. If you'd prefer to wait then #14669 should fix it for you (and any other affected people / stop it from occurring again in the future).

Thanks, dropping the index has prevented the error message from appearing and everything seems to be functioning properly.

@clokep clokep linked a pull request Dec 15, 2022 that will close this issue
@clokep
Copy link
Member

clokep commented Dec 15, 2022

Should be fixed by #14669.

@clokep clokep closed this as completed Dec 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db A-Threads Threaded messages O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants