-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Add config options for media retention #12732
Conversation
IMHO to run it every hour is very often. Is it enough to run every 6, 12 or 24 hours? Then it need a notice that the retention policy shorter then this period is not very sensible. |
@dklimpel Perhaps it is worth making it configurable, yes. There's really no reason not to, and the implementation work to add that is minimal. |
b319264
to
35839a0
Compare
4f1c50a
to
1d859c9
Compare
( | ||
self.hs.config.server.server_name, | ||
self.local_not_recently_accessed_media, | ||
), | ||
(self.hs.config.server.server_name, self.local_never_accessed_media), | ||
], | ||
not_purged=[ | ||
(self.hs.config.server.server_name, self.local_recently_accessed_media), | ||
(self.remote_server_name, self.remote_recently_accessed_media), | ||
(self.remote_server_name, self.remote_not_recently_accessed_media), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I too find this fiendishly messy.
I've another PR in the works to create a MXCUri
object which can be much more easily passed around, and used to convert between str and object representations throughout the codebase.
This PR has been updated to address review feedback and add unit tests. |
@dklimpel @anoadragon453: The choice here is between a) run it often with each run being cheap, or b) run it less frequently and have each run be potentially expensive. To illustrate this consider a busy server that is receiving GBs of new media a day, if we delete it every hour then we're talking about deleting ~50MB of data every run, but running once a day means deleting GBs at once. This may or may not have a noticeable impact on the server. In general, assuming each run has relatively low overhead we prefer running more frequently rather than doing lots of work infrequently. |
Should this information added to |
We had a discussion in #synapse-dev:matrix.org regarding the benefit of the The conclusion is that it's probably not worth adding a config option until it becomes apparent that one is needed. Instead, we're going to aim for some reasonable default (say, 6 hours) and then see how that works in testing. If it ends up that a different period is needed for each system, then we can easily re-introduce the config option. But options are hard to remove once they're added - so let's see if we need one first. |
Synapse 1.61.0 (2022-06-14) =========================== This release removes support for the non-standard feature known both as 'groups' and as 'communities', which have been superseded by *Spaces*. See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1610) for more details. Improved Documentation ---------------------- - Mention removed community/group worker endpoints in [upgrade.md](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1610s). Contributed by @olmari. ([\matrix-org#13023](matrix-org#13023)) Synapse 1.61.0rc1 (2022-06-07) ============================== Features -------- - Add new `media_retention` options to the homeserver config for routinely cleaning up non-recently accessed media. ([\matrix-org#12732](matrix-org#12732), [\matrix-org#12972](matrix-org#12972), [\matrix-org#12977](matrix-org#12977)) - Experimental support for [MSC3772](matrix-org/matrix-spec-proposals#3772): Push rule for mutually related events. ([\matrix-org#12740](matrix-org#12740), [\matrix-org#12859](matrix-org#12859)) - Update to the `check_event_for_spam` module callback: Deprecate the current callback signature, replace it with a new signature that is both less ambiguous (replacing booleans with explicit allow/block) and more powerful (ability to return explicit error codes). ([\matrix-org#12808](matrix-org#12808)) - Add storage and module API methods to get monthly active users (and their corresponding appservices) within an optionally specified time range. ([\matrix-org#12838](matrix-org#12838), [\matrix-org#12917](matrix-org#12917)) - Support the new error code `ORG.MATRIX.MSC3823.USER_ACCOUNT_SUSPENDED` from [MSC3823](matrix-org/matrix-spec-proposals#3823). ([\matrix-org#12845](matrix-org#12845), [\matrix-org#12923](matrix-org#12923)) - Add a configurable background job to delete stale devices. ([\matrix-org#12855](matrix-org#12855)) - Improve URL previews for pages with empty elements. ([\matrix-org#12951](matrix-org#12951)) - Allow updating a user's password using the admin API without logging out their devices. Contributed by @jcgruenhage. ([\matrix-org#12952](matrix-org#12952)) Bugfixes -------- - Always send an `access_token` in `/thirdparty/` requests to appservices, as required by the [Application Service API specification](https://spec.matrix.org/v1.1/application-service-api/#third-party-networks). ([\matrix-org#12746](matrix-org#12746)) - Implement [MSC3816](matrix-org/matrix-spec-proposals#3816): sending the root event in a thread should count as having 'participated' in it. ([\matrix-org#12766](matrix-org#12766)) - Delete events from the `federation_inbound_events_staging` table when a room is purged through the admin API. ([\matrix-org#12784](matrix-org#12784)) - Fix a bug where we did not correctly handle invalid device list updates over federation. Contributed by Carl Bordum Hansen. ([\matrix-org#12829](matrix-org#12829)) - Fix a bug which allowed multiple async operations to access database locks concurrently. Contributed by @sumnerevans @ Beeper. ([\matrix-org#12832](matrix-org#12832)) - Fix an issue introduced in Synapse 0.34 where the `/notifications` endpoint would only return notifications if a user registered at least one pusher. Contributed by Famedly. ([\matrix-org#12840](matrix-org#12840)) - Fix a bug where servers using a Postgres database would fail to backfill from an insertion event when MSC2716 is enabled (`experimental_features.msc2716_enabled`). ([\matrix-org#12843](matrix-org#12843)) - Fix [MSC3787](matrix-org/matrix-spec-proposals#3787) rooms being omitted from room directory, room summary and space hierarchy responses. ([\matrix-org#12858](matrix-org#12858)) - Fix a bug introduced in Synapse 1.54.0 which could sometimes cause exceptions when handling federated traffic. ([\matrix-org#12877](matrix-org#12877)) - Fix a bug introduced in Synapse 1.59.0 which caused room deletion to fail with a foreign key violation error. ([\matrix-org#12889](matrix-org#12889)) - Fix a long-standing bug which caused the `/messages` endpoint to return an incorrect `end` attribute when there were no more events. Contributed by @Vetchu. ([\matrix-org#12903](matrix-org#12903)) - Fix a bug introduced in Synapse 1.58.0 where `/sync` would fail if the most recent event in a room was a redaction of an event that has since been purged. ([\matrix-org#12905](matrix-org#12905)) - Fix a potential memory leak when generating thumbnails. ([\matrix-org#12932](matrix-org#12932)) - Fix a long-standing bug where a URL preview would break if the image failed to download. ([\matrix-org#12950](matrix-org#12950)) Improved Documentation ---------------------- - Fix typographical errors in documentation. ([\matrix-org#12863](matrix-org#12863)) - Fix documentation incorrectly stating the `sendToDevice` endpoint can be directed at generic workers. Contributed by Nick @ Beeper. ([\matrix-org#12867](matrix-org#12867)) Deprecations and Removals ------------------------- - Remove support for the non-standard groups/communities feature from Synapse. ([\matrix-org#12553](matrix-org#12553), [\matrix-org#12558](matrix-org#12558), [\matrix-org#12563](matrix-org#12563), [\matrix-org#12895](matrix-org#12895), [\matrix-org#12897](matrix-org#12897), [\matrix-org#12899](matrix-org#12899), [\matrix-org#12900](matrix-org#12900), [\matrix-org#12936](matrix-org#12936), [\matrix-org#12966](matrix-org#12966)) - Remove contributed `kick_users.py` script. This is broken under Python 3, and is not added to the environment when `pip install`ing Synapse. ([\matrix-org#12908](matrix-org#12908)) - Remove `contrib/jitsimeetbridge`. This was an unused experiment that hasn't been meaningfully changed since 2014. ([\matrix-org#12909](matrix-org#12909)) - Remove unused `contrib/experiements/cursesio.py` script, which fails to run under Python 3. ([\matrix-org#12910](matrix-org#12910)) - Remove unused `contrib/experiements/test_messaging.py` script. This fails to run on Python 3. ([\matrix-org#12911](matrix-org#12911)) Internal Changes ---------------- - Test Synapse against Complement with workers. ([\matrix-org#12810](matrix-org#12810), [\matrix-org#12933](matrix-org#12933)) - Reduce the amount of state we pull from the DB. ([\matrix-org#12811](matrix-org#12811), [\matrix-org#12964](matrix-org#12964)) - Try other homeservers when re-syncing state for rooms with partial state. ([\matrix-org#12812](matrix-org#12812)) - Resume state re-syncing for rooms with partial state after a Synapse restart. ([\matrix-org#12813](matrix-org#12813)) - Remove Mutual Rooms' ([MSC2666](matrix-org/matrix-spec-proposals#2666)) endpoint dependency on the User Directory. ([\matrix-org#12836](matrix-org#12836)) - Experimental: expand `check_event_for_spam` with ability to return additional fields. This enables spam-checker implementations to experiment with mechanisms to give users more information about why they are blocked and whether any action is needed from them to be unblocked. ([\matrix-org#12846](matrix-org#12846)) - Remove `dont_notify` from the `.m.rule.room.server_acl` rule. ([\matrix-org#12849](matrix-org#12849)) - Remove the unstable `/hierarchy` endpoint from [MSC2946](matrix-org/matrix-spec-proposals#2946). ([\matrix-org#12851](matrix-org#12851)) - Pull out less state when handling gaps in room DAG. ([\matrix-org#12852](matrix-org#12852), [\matrix-org#12904](matrix-org#12904)) - Clean-up the push rules datastore. ([\matrix-org#12856](matrix-org#12856)) - Correct a type annotation in the URL preview source code. ([\matrix-org#12860](matrix-org#12860)) - Update `pyjwt` dependency to [2.4.0](https://github.com/jpadilla/pyjwt/releases/tag/2.4.0). ([\matrix-org#12865](matrix-org#12865)) - Enable the `/account/whoami` endpoint on synapse worker processes. Contributed by Nick @ Beeper. ([\matrix-org#12866](matrix-org#12866)) - Enable the `batch_send` endpoint on synapse worker processes. Contributed by Nick @ Beeper. ([\matrix-org#12868](matrix-org#12868)) - Don't generate empty AS transactions when the AS is flagged as down. Contributed by Nick @ Beeper. ([\matrix-org#12869](matrix-org#12869)) - Fix up the variable `state_store` naming. ([\matrix-org#12871](matrix-org#12871)) - Faster room joins: when querying the current state of the room, wait for state to be populated. ([\matrix-org#12872](matrix-org#12872)) - Avoid running queries which will never result in deletions. ([\matrix-org#12879](matrix-org#12879)) - Use constants for EDU types. ([\matrix-org#12884](matrix-org#12884)) - Reduce database load of `/sync` when presence is enabled. ([\matrix-org#12885](matrix-org#12885)) - Refactor `have_seen_events` to reduce memory consumed when processing federation traffic. ([\matrix-org#12886](matrix-org#12886)) - Refactor receipt linearization code. ([\matrix-org#12888](matrix-org#12888)) - Add type annotations to `synapse.logging.opentracing`. ([\matrix-org#12894](matrix-org#12894)) - Remove PyNaCl occurrences directly used in Synapse code. ([\matrix-org#12902](matrix-org#12902)) - Bump types-jsonschema from 4.4.1 to 4.4.6. ([\matrix-org#12912](matrix-org#12912)) - Rename storage classes. ([\matrix-org#12913](matrix-org#12913)) - Preparation for database schema simplifications: stop reading from `event_edges.room_id`. ([\matrix-org#12914](matrix-org#12914)) - Check if we are in a virtual environment before overriding the `PYTHONPATH` environment variable in the demo script. ([\matrix-org#12916](matrix-org#12916)) - Improve the logging when signature checks on events fail. ([\matrix-org#12925](matrix-org#12925)) # -----BEGIN PGP SIGNATURE----- # # iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmKoaa0QHGVyaWtAbWF0 # cml4Lm9yZwAKCRClQuTtGw+sCVn3B/sF8hdhrZ7hWW40ST3eG9cEKNFrj/xZXiaI # zho3ryrxaQF68BSKot15AvZSprEdwBXWrb8WeTjyw+QH7vTKrCQDZ0p7qubn10Z7 # BuKq9hyYjyCLjBZrgy8d4U3Y8gsSByuO59YKHNLn+UTJLOs5GTH8Wprwh4mpU3Jl # +o+cC+lMSVcyZij2hihFymcSxWq/I9WL0dsjRif8x0BUQwRXwmXc6+mhlgLBe2Zs # 2dUouzJ8NVZcjfWvsg4noXPrNQ/IiyCVZlSIgaDftDIxVSPk5/rXiUUNex8Tn1+I # TnOgnXhpOQD1vwVGYS8LrcfA0ubSili7xUJ8k2e5TkCjpkaVnYNu # =q+7N # -----END PGP SIGNATURE----- # gpg: Signature made Tue Jun 14 11:57:49 2022 BST # gpg: using RSA key 053191DFF4670330465227F7A542E4ED1B0FAC09 # gpg: issuer "erik@matrix.org" # gpg: WARNING: server 'dirmngr' is older than us (2.2.34 < 2.3.6) # gpg: Note: Outdated servers may lack important security fixes. # gpg: Note: Use the command "gpgconf --kill all" to restart them. # gpg: Can't check signature: No public key # Conflicts: # docs/workers.md # synapse/handlers/message.py # synapse/handlers/pagination.py # synapse/push/bulk_push_rule_evaluator.py # synapse/push/push_rule_evaluator.py # synapse/rest/client/receipts.py # synapse/storage/databases/main/appservice.py # tests/push/test_push_rule_evaluator.py
Upstream NEWS, less bugfixes and minor improvements: Synapse 1.61.0 (2022-06-14) =========================== This release removes support for the non-standard feature known both as 'groups' and as 'communities', which have been superseded by *Spaces*. Synapse 1.61.0rc1 (2022-06-07) ============================== Features -------- - Add new `media_retention` options to the homeserver config for routinely cleaning up non-recently accessed media. ([\#12732](matrix-org/synapse#12732), [\#12972](matrix-org/synapse#12972), [\#12977](matrix-org/synapse#12977)) - Experimental support for [MSC3772](matrix-org/matrix-spec-proposals#3772): Push rule for mutually related events. ([\#12740](matrix-org/synapse#12740), [\#12859](matrix-org/synapse#12859)) - Update to the `check_event_for_spam` module callback: Deprecate the current callback signature, replace it with a new signature that is both less ambiguous (replacing booleans with explicit allow/block) and more powerful (ability to return explicit error codes). ([\#12808](matrix-org/synapse#12808)) - Add storage and module API methods to get monthly active users (and their corresponding appservices) within an optionally specified time range. ([\#12838](matrix-org/synapse#12838), [\#12917](matrix-org/synapse#12917)) - Support the new error code `ORG.MATRIX.MSC3823.USER_ACCOUNT_SUSPENDED` from [MSC3823](matrix-org/matrix-spec-proposals#3823). ([\#12845](matrix-org/synapse#12845), [\#12923](matrix-org/synapse#12923)) - Add a configurable background job to delete stale devices. ([\#12855](matrix-org/synapse#12855)) - Allow updating a user's password using the admin API without logging out their devices. Contributed by @jcgruenhage. ([\#12952](matrix-org/synapse#12952)) Deprecations and Removals ------------------------- - Remove support for the non-standard groups/communities feature from Synapse. ([\#12553](matrix-org/synapse#12553), [\#12558](matrix-org/synapse#12558), [\#12563](matrix-org/synapse#12563), [\#12895](matrix-org/synapse#12895), [\#12897](matrix-org/synapse#12897), [\#12899](matrix-org/synapse#12899), [\#12900](matrix-org/synapse#12900), [\#12936](matrix-org/synapse#12936), [\#12966](matrix-org/synapse#12966)) - Remove contributed `kick_users.py` script. This is broken under Python 3, and is not added to the environment when `pip install`ing Synapse. ([\#12908](matrix-org/synapse#12908)) Synapse 1.60.0 (2022-05-31) =========================== This release of Synapse adds a unique index to the `state_group_edges` table, in order to prevent accidentally introducing duplicate information (for example, because a database backup was restored multiple times). If your Synapse database already has duplicate rows in this table, this could fail with an error and require manual remediation. Additionally, the signature of the `check_event_for_spam` module callback has changed. The previous signature has been deprecated and remains working for now. Module authors should update their modules to use the new signature where possible. Synapse 1.60.0rc2 (2022-05-27) ============================== Features -------- - Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](matrix-org/synapse#12883)) Synapse 1.60.0rc1 (2022-05-24) ============================== Features -------- - Measure the time taken in spam-checking callbacks and expose those measurements as metrics. ([\#12513](matrix-org/synapse#12513)) - Add a `default_power_level_content_override` config option to set default room power levels per room preset. ([\#12618](matrix-org/synapse#12618)) - Add support for [MSC3787: Allowing knocks to restricted rooms](matrix-org/matrix-spec-proposals#3787). ([\#12623](matrix-org/synapse#12623)) - Send `USER_IP` commands on a different Redis channel, in order to reduce traffic to workers that do not process these commands. ([\#12672](matrix-org/synapse#12672), [\#12809](matrix-org/synapse#12809)) - Synapse will now reload [cache config](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#caching) when it receives a [SIGHUP](https://en.wikipedia.org/wiki/SIGHUP) signal. ([\#12673](matrix-org/synapse#12673)) - Add a config options to allow for auto-tuning of caches. ([\#12701](matrix-org/synapse#12701)) - Update [MSC2716](matrix-org/matrix-spec-proposals#2716) implementation to process marker events from the current state to avoid markers being lost in timeline gaps for federated servers which would cause the imported history to be undiscovered. ([\#12718](matrix-org/synapse#12718)) - Add a `drop_federated_event` callback to `SpamChecker` to disregard inbound federated events before they take up much processing power, in an emergency. ([\#12744](matrix-org/synapse#12744)) - Implement [MSC3818: Copy room type on upgrade](matrix-org/matrix-spec-proposals#3818). ([\#12786](matrix-org/synapse#12786), [\#12792](matrix-org/synapse#12792)) - Update to the `check_event_for_spam` module callback. Deprecate the current callback signature, replace it with a new signature that is both less ambiguous (replacing booleans with explicit allow/block) and more powerful (ability to return explicit error codes). ([\#12808](matrix-org/synapse#12808)) Deprecations and Removals ------------------------- - Require a body in POST requests to `/rooms/{roomId}/receipt/{receiptType}/{eventId}`, as required by the [Matrix specification](https://spec.matrix.org/v1.2/client-server-api/#post_matrixclientv3roomsroomidreceiptreceipttypeeventid). This breaks compatibility with Element Android 1.2.0 and earlier: users of those clients will be unable to send read receipts. ([\#12709](matrix-org/synapse#12709)) Synapse 1.59.1 (2022-05-18) =========================== This release fixes a long-standing issue which could prevent Synapse's user directory for updating properly. Synapse 1.59.0 (2022-05-17) =========================== Synapse 1.59 makes several changes that server administrators should be aware of: - Device name lookup over federation is now disabled by default. ([\#12616](matrix-org/synapse#12616)) - The `synapse.app.appservice` and `synapse.app.user_dir` worker application types are now deprecated. ([\#12452](matrix-org/synapse#12452), [\#12654](matrix-org/synapse#12654)) Additionally, this release removes the non-standard `m.login.jwt` login type from Synapse. It can be replaced with `org.matrix.login.jwt` for identical behaviour. This is only used if `jwt_config.enabled` is set to `true` in the configuration. ([\#12597](matrix-org/synapse#12597)) Synapse 1.59.0rc2 (2022-05-16) ============================== Synapse 1.59.0rc1 (2022-05-10) ============================== Features -------- - Support [MSC3266](matrix-org/matrix-spec-proposals#3266) room summaries over federation. ([\#11507](matrix-org/synapse#11507)) - Implement [changes](matrix-org/matrix-spec-proposals@4a77139) to [MSC2285 (hidden read receipts)](matrix-org/matrix-spec-proposals#2285). Contributed by @SimonBrandner. ([\#12168](matrix-org/synapse#12168), [\#12635](matrix-org/synapse#12635), [\#12636](matrix-org/synapse#12636), [\#12670](matrix-org/synapse#12670)) - Extend the [module API](https://github.com/matrix-org/synapse/blob/release-v1.59/synapse/module_api/__init__.py) to allow modules to change actions for existing push rules of local users. ([\#12406](matrix-org/synapse#12406)) - Add the `notify_appservices_from_worker` configuration option (superseding `notify_appservices`) to allow a generic worker to be designated as the worker to send traffic to Application Services. ([\#12452](matrix-org/synapse#12452)) - Add the `update_user_directory_from_worker` configuration option (superseding `update_user_directory`) to allow a generic worker to be designated as the worker to update the user directory. ([\#12654](matrix-org/synapse#12654)) - Add new `enable_registration_token_3pid_bypass` configuration option to allow registrations via token as an alternative to verifying a 3pid. ([\#12526](matrix-org/synapse#12526)) - Implement [MSC3786](matrix-org/matrix-spec-proposals#3786): Add a default push rule to ignore `m.room.server_acl` events. ([\#12601](matrix-org/synapse#12601)) - Add new `mau_appservice_trial_days` configuration option to specify a different trial period for users registered via an appservice. ([\#12619](matrix-org/synapse#12619)) Deprecations and Removals ------------------------- - Remove unstable identifiers from [MSC3069](matrix-org/matrix-spec-proposals#3069). ([\#12596](matrix-org/synapse#12596)) - Remove the unspecified `m.login.jwt` login type and the unstable `uk.half-shot.msc2778.login.application_service` from [MSC2778](matrix-org/matrix-spec-proposals#2778). ([\#12597](matrix-org/synapse#12597)) - Synapse now requires at least Python 3.7.1 (up from 3.7.0), for compatibility with the latest Twisted trunk. ([\#12613](matrix-org/synapse#12613))
Closes #6832.
Adds configuration options for automatically removing media if it hasn't been accessed (either locally or remotely) after a certain period of time. This is managed by an hourly background job that searches the database for old media according to the configured lifetimes, removing the files from disk if any is found.
TODO:
Questions:
Do we apply any jitter to our background jobs? Otherwise I feel that a lot is going to happen 1hr after your start your homeserver.I don't think we do. However, the period is now configurable with the default being every 24 hours. This is also not really a problem to consider in this PR.