Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Make sequence cache_invalidation_stream_seq begin at 2 #13766

Merged
merged 6 commits into from
Sep 13, 2022

Conversation

MatMaul
Copy link
Contributor

@MatMaul MatMaul commented Sep 9, 2022

The sequence needs to begin at 2 because a bunch of code assume that get_next will return values >= 2, cf this comment:

# We start at 1 here as a) the first generated stream ID will be 2, and
# b) other parts of the code assume that stream IDs are strictly greater
# than 0.

This is triggered by the test in #13667.

Also discussed at an internal room.

Pull Request Checklist

@MatMaul MatMaul requested a review from a team as a code owner September 9, 2022 14:14
@MatMaul MatMaul force-pushed the mv/fix-cache-seq-two branch 2 times, most recently from 8327c7d to a19eb80 Compare September 9, 2022 15:09
Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Is the intent of this PR is to fix the tests and not any live deployments?

changelog.d/13766.bugfix Outdated Show resolved Hide resolved
@MatMaul
Copy link
Contributor Author

MatMaul commented Sep 12, 2022

Is the intent of this PR is to fix the tests and not any live deployments?

Triggering that on a deploy is doable I think but rare and low impact.
On creating a new server the caches and the data are empty, and the first action will usually be an user creation so chances are that a lost cache invalidation has no impact.
I think it's more easy to trigger for people that enable replication after the fact, but it's still contained to the really first invalidation on the first start. Any subsequent restart will fix the problem for sure.

@MatMaul MatMaul force-pushed the mv/fix-cache-seq-two branch from a9c101a to 0139587 Compare September 12, 2022 10:35
@MatMaul MatMaul requested a review from squahtx September 12, 2022 10:50
Copy link
Contributor

@squahtx squahtx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@MatMaul MatMaul force-pushed the mv/fix-cache-seq-two branch from 0139587 to e5d03c7 Compare September 12, 2022 12:27
@MatMaul MatMaul enabled auto-merge (squash) September 12, 2022 12:30
@MatMaul MatMaul force-pushed the mv/fix-cache-seq-two branch from e5d03c7 to 754334b Compare September 12, 2022 14:10
@MadLittleMods MadLittleMods added the T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. label Sep 12, 2022
@MadLittleMods MadLittleMods changed the title Make sequence cache_invalidation_stream_seq begins at 2 Make sequence cache_invalidation_stream_seq begin at 2 Sep 12, 2022
Comment on lines 32 to 34
-- The sequence needs to begin at 2 because a bunch of code assumes that
-- get_next_id_txn will return values >= 2, cf this comment:
-- https://github.com/matrix-org/synapse/blob/b93bd95e8ab64d27ae26841020f62ee61272a5f2/synapse/storage/util/id_generators.py#L344
Copy link
Contributor

@MadLittleMods MadLittleMods Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we detect when the stream position is <= 0 somewhere and throw an error to find our mistakes?

Seems like that could be at the end of the constructor of MultiWriterIdGenerator after we calculate the _current_positions

Maybe as part of synapse/storage/util/sequence.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was kind of trying to avoid spending more time on this TBH :) I can add an error in the logs but since triggering should be awfully rare (only on server creation or enabling replication the first time) I don't expect that we would easily catch all the code.

Copy link
Contributor

@MadLittleMods MadLittleMods Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatMaul chatted with me a bit about this point. The goal of this change is just so that others don't have to figure out all of this context again when they run into this pitfall with another table and their tests just don't work like what spawned this in the first place.

The real solution around this would be to fix the weird stream ID's starting at 2 constraint. Should just work starting from any number ideally 🤷

Instead of failing the process, @MatMaul mentioned a possibility of automatically correcting and fixing the problem by incrementing the stream sequence ID. But we don't really know if other tables have the same 2 constraint beyond the comment in the code and don't want to overfix the problem.

In any case, maintainers are happy to move this forward without ⏩

@MatMaul MatMaul disabled auto-merge September 13, 2022 07:18
Mathieu Velten and others added 5 commits September 13, 2022 09:33
Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
…gres

Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
@MatMaul MatMaul merged commit 12dacec into develop Sep 13, 2022
@MatMaul MatMaul deleted the mv/fix-cache-seq-two branch September 13, 2022 14:14
@@ -0,0 +1 @@
Fix a long-standing bug where the `cache_invalidation_stream_seq` sequence would begin at 1 instead of 2.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for links: this entry got improved in ec4dca6.

@DMRobertson
Copy link
Contributor

The sequence needs to begin at 2 because a bunch of code assume that get_next will return values >= 2, cf this comment:

Are there other sequences that make this assumption which could be vulnerable to this problem?

@DMRobertson DMRobertson mentioned this pull request Sep 23, 2022
@MatMaul
Copy link
Contributor Author

MatMaul commented Sep 23, 2022

I don't think so but I haven't really tried hard to find them either TBH.

Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Sep 30, 2022
Synapse 1.68.0 (2022-09-27)
===========================

Please note that Synapse will now refuse to start if configured to use a version of SQLite older than 3.27.

In addition, please note that installing Synapse from a source checkout now requires a recent Rust compiler.
Those using packages will not be affected. On most platforms, installing with `pip install matrix-synapse` will not be affected.
See the [upgrade notes](https://matrix-org.github.io/synapse/v1.68/upgrade.html#upgrading-to-v1680).

Bugfixes
--------

- Fix packaging to include `Cargo.lock` in `sdist`. ([\matrix-org#13909](matrix-org#13909))

Synapse 1.68.0rc2 (2022-09-23)
==============================

Bugfixes
--------

- Fix building from packaged sdist. Broken in v1.68.0rc1. ([\matrix-org#13866](matrix-org#13866))

Internal Changes
----------------

- Fix the release script not publishing binary wheels. ([\matrix-org#13850](matrix-org#13850))
- Lower minimum supported rustc version to 1.58.1. ([\matrix-org#13857](matrix-org#13857))
- Lock Rust dependencies' versions. ([\matrix-org#13858](matrix-org#13858))

Synapse 1.68.0rc1 (2022-09-20)
==============================

Features
--------

- Keep track of when we fail to process a pulled event over federation so we can intelligently back off in the future. ([\matrix-org#13589](matrix-org#13589), [\matrix-org#13814](matrix-org#13814))
- Add an [admin API endpoint to fetch messages within a particular window of time](https://matrix-org.github.io/synapse/v1.68/admin_api/rooms.html#room-messages-api). ([\matrix-org#13672](matrix-org#13672))
- Add an [admin API endpoint to find a user based on their external ID in an auth provider](https://matrix-org.github.io/synapse/v1.68/admin_api/user_admin_api.html#find-a-user-based-on-their-id-in-an-auth-provider). ([\matrix-org#13810](matrix-org#13810))
- Cancel the processing of key query requests when they time out. ([\matrix-org#13680](matrix-org#13680))
- Improve validation of request bodies for the following client-server API endpoints: [`/account/3pid/msisdn/requestToken`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidmsisdnrequesttoken), [`/org.matrix.msc3720/account_status`](https://github.com/matrix-org/matrix-spec-proposals/blob/babolivier/user_status/proposals/3720-account-status.md#post-_matrixclientv1account_status), [`/account/3pid/add`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidadd), [`/account/3pid/bind`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidbind), [`/account/3pid/delete`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3piddelete) and [`/account/3pid/unbind`](https://spec.matrix.org/v1.3/client-server-api/#post_matrixclientv3account3pidunbind). ([\matrix-org#13687](matrix-org#13687), [\matrix-org#13736](matrix-org#13736))
- Document the timestamp when a user accepts the consent, if [consent tracking](https://matrix-org.github.io/synapse/latest/consent_tracking.html) is used. ([\matrix-org#13741](matrix-org#13741))
- Add a `listeners[x].request_id_header` configuration option to specify which request header to extract and use as the request ID in order to correlate requests from a reverse proxy. ([\matrix-org#13801](matrix-org#13801))

Bugfixes
--------

- Fix a bug introduced in Synapse 1.41.0 where the `/hierarchy` API returned non-standard information (a `room_id` field under each entry in `children_state`). ([\matrix-org#13506](matrix-org#13506))
- Fix a long-standing bug where previously rejected events could end up in room state because they pass auth checks given the current state of the room. ([\matrix-org#13723](matrix-org#13723))
- Fix a long-standing bug where Synapse fails to start if a signing key file contains an empty line. ([\matrix-org#13738](matrix-org#13738))
- Fix a long-standing bug where Synapse would fail to handle malformed user IDs or room aliases gracefully in certain cases. ([\matrix-org#13746](matrix-org#13746))
- Fix a long-standing bug where device lists would remain cached when remote users left and rejoined the last room shared with the local homeserver. ([\matrix-org#13749](matrix-org#13749), [\matrix-org#13826](matrix-org#13826))
- Fix a long-standing bug that could cause stale caches in some rare cases on the first startup of Synapse with replication. ([\matrix-org#13766](matrix-org#13766))
- Fix a long-standing spec compliance bug where Synapse would accept a trailing slash on the end of `/get_missing_events` federation requests. ([\matrix-org#13789](matrix-org#13789))
- Delete associated data from `event_failed_pull_attempts`, `insertion_events`, `insertion_event_extremities`, `insertion_event_extremities`, `insertion_event_extremities` when purging the room. ([\matrix-org#13825](matrix-org#13825))

Improved Documentation
----------------------

- Note that `libpq` is required on ARM-based Macs. ([\matrix-org#13480](matrix-org#13480))
- Fix a mistake in the config manual introduced in Synapse 1.22.0: the `event_cache_size` _is_ scaled by `caches.global_factor`. ([\matrix-org#13726](matrix-org#13726))
- Fix a typo in the documentation for the login ratelimiting configuration. ([\matrix-org#13727](matrix-org#13727))
- Define Synapse's compatability policy for SQLite versions. ([\matrix-org#13728](matrix-org#13728))
- Add docs for the common fix of deleting the `matrix_synapse.egg-info/` directory for fixing Python dependency problems. ([\matrix-org#13785](matrix-org#13785))
- Update request log format documentation to mention the format used when the authenticated user is controlling another user. ([\matrix-org#13794](matrix-org#13794))

Deprecations and Removals
-------------------------

- Synapse will now refuse to start if configured to use SQLite < 3.27. ([\matrix-org#13760](matrix-org#13760))
- Don't include redundant `prev_state` in new events. Contributed by Denis Kariakin (@dakariakin). ([\matrix-org#13791](matrix-org#13791))

Internal Changes
----------------

- Add a stub Rust crate. ([\matrix-org#12595](matrix-org#12595), [\matrix-org#13734](matrix-org#13734), [\matrix-org#13735](matrix-org#13735), [\matrix-org#13743](matrix-org#13743), [\matrix-org#13763](matrix-org#13763), [\matrix-org#13769](matrix-org#13769), [\matrix-org#13778](matrix-org#13778))
- Bump the minimum dependency of `matrix_common` to 1.3.0 to make use of the `MXCUri` class. Use `MXCUri` to simplify media retention test code. ([\matrix-org#13162](matrix-org#13162))
- Add and populate the `event_stream_ordering` column on the `receipts` table for future optimisation of push action processing. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13703](matrix-org#13703))
- Rename the `EventFormatVersions` enum values so that they line up with room version numbers. ([\matrix-org#13706](matrix-org#13706))
- Update trial old deps CI to use Poetry 1.2.0. ([\matrix-org#13707](matrix-org#13707), [\matrix-org#13725](matrix-org#13725))
- Add experimental configuration option to allow disabling legacy Prometheus metric names. ([\matrix-org#13714](matrix-org#13714), [\matrix-org#13717](matrix-org#13717), [\matrix-org#13718](matrix-org#13718))
- Fix typechecking with latest types-jsonschema. ([\matrix-org#13724](matrix-org#13724))
- Strip number suffix from instance name to consolidate services that traces are spread over. ([\matrix-org#13729](matrix-org#13729))
- Instrument `get_metadata_for_events` for understandable traces in Jaeger. ([\matrix-org#13730](matrix-org#13730))
- Remove old queries to join room memberships to current state events. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13745](matrix-org#13745))
- Avoid raising an error due to malformed user IDs in `get_current_hosts_in_room`. Malformed user IDs cannot currently join a room, so this error would not be hit. ([\matrix-org#13748](matrix-org#13748))
- Update the docstrings for `get_users_in_room` and `get_current_hosts_in_room` to explain the impact of partial state. ([\matrix-org#13750](matrix-org#13750))
- Use an additional database query when persisting receipts. ([\matrix-org#13752](matrix-org#13752))
- Preparatory work for storing thread IDs for notifications and receipts. ([\matrix-org#13753](matrix-org#13753))
- Re-type hint some collections as read-only. ([\matrix-org#13754](matrix-org#13754))
- Remove unused Prometheus recording rules from `synapse-v2.rules` and add comments describing where the rest are used. ([\matrix-org#13756](matrix-org#13756))
- Add a check for editable installs if the Rust library needs rebuilding. ([\matrix-org#13759](matrix-org#13759))
- Tag traces with the instance name to be able to easily jump into the right logs and filter traces by instance. ([\matrix-org#13761](matrix-org#13761))
- Concurrently fetch room push actions when calculating badge counts. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13765](matrix-org#13765))
- Update the script which makes full schema dumps. ([\matrix-org#13770](matrix-org#13770))
- Deduplicate `is_server_notices_room`. ([\matrix-org#13780](matrix-org#13780))
- Simplify the dependency DAG in the tests workflow. ([\matrix-org#13784](matrix-org#13784))
- Remove an old, incorrect migration file. ([\matrix-org#13788](matrix-org#13788))
- Remove unused method in `synapse.api.auth.Auth`. ([\matrix-org#13795](matrix-org#13795))
- Fix a memory leak when running the unit tests. ([\matrix-org#13798](matrix-org#13798))
- Use partial indices on SQLite. ([\matrix-org#13802](matrix-org#13802))
- Check that portdb generates the same postgres schema as that in the source tree. ([\matrix-org#13808](matrix-org#13808))
- Fix Docker build when Rust .so has been built locally first. ([\matrix-org#13811](matrix-org#13811))
- Complement: Initialise the Postgres database directly inside the target image instead of the base Postgres image to fix building using Buildah. ([\matrix-org#13819](matrix-org#13819))
- Support providing an index predicate clause when doing upserts. ([\matrix-org#13822](matrix-org#13822))
- Minor speedups to linting in CI. ([\matrix-org#13827](matrix-org#13827))

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE1508oLYUKainYFJakD7OEIo53t0FAmMy4FoACgkQkD7OEIo5
# 3t009g/6A4S26H6NG4GM44JD9+OB25fO59m9UAWWLrePmOKsBaGXVp86scPq9epI
# vQbr4Czi8WEqCJlMRxIWLXv7BL3TLXnLF1vC0wSE6YiJqrPU9vMZ0UYxWNErl8Sr
# eFBpuHXDlfppQUXs903iNmXVbdTpXCVjdTEwaZmgU1/FKydgU0o90PQseb/hnegX
# hcJrepL6xhcs37fP2zdlixissLQ85WE10x6h7FX+SkCEHGvkiKrqSvXsS4ZN0Scn
# vGCy3GD69j/ZpRu7RczdDwwzCRerg6r7spokRK/b6pzz9sXmCyY8SrrUyEEdzveN
# uEEe4A8vmR3v0sR4Ao2cGZ7zy/jq6WyrWjmfOd+hfYD9tXx/g8190RFkRQrrkYVU
# jTNdD6Zom0rtENEgHuFQ8joD96MNsaq4dvDefYYpcXDOh3YZnA/fiwfmG9XZRq6u
# B42RZEtUZ3sjZ3VdRb3AvhPTTrY4kiwEqVBFSUTBKEAKXQtdrsiqv2QPvQTSC5BJ
# YFXMwj32E7Zi3mTYjl60yggtBAxYt49KsHL8qAq75t6A38HpnYEyEW1R3S6VDmtp
# rIR5ktxyzoGvxpJ4YPUdC4A2hbaOztwPvGE3iEDiPgUdkfb6m8x3MhaWhShid4Df
# v2BQu+SFIl1wXPLxjlX2rmkQMhUGD2RGYmkUgYfoWnjdTtQV10w=
# =4eYj
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue Sep 27 12:36:58 2022 BST
# gpg:                using RSA key D79D3CA0B61429A8A760525A903ECE108A39DEDD
# gpg: Can't check signature: No public key

# Conflicts:
#	docker/Dockerfile
#	poetry.lock
#	synapse/push/bulk_push_rule_evaluator.py
#	synapse/push/push_tools.py
#	synapse/storage/databases/main/event_push_actions.py
#	synapse/storage/databases/main/events_worker.py
#	tests/replication/slave/storage/test_events.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants