From b0261a8549ee2c6d7d885b37fdf6b04ba9719b71 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 3 Sep 2019 19:04:10 +0100 Subject: [PATCH 01/12] MSC for deleting expired or redacted attachments --- proposals/2278-deleting-content.md | 100 +++++++++++++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 proposals/2278-deleting-content.md diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md new file mode 100644 index 00000000000..634762b56fb --- /dev/null +++ b/proposals/2278-deleting-content.md @@ -0,0 +1,100 @@ +# Proposal for deleting content for expired and redacted messages + +## Overview + +[MSC1763](https://https://github.com/matrix-org/matrix-doc/pull/1763) proposes +the `m.room.retention` state event for defining how aggressively servers +should purge old messages for a given room. + +It originally also specified how media for purged events should be purged from +disk, however this was split out into a new MSC [by +request](https://github.com/matrix-org/matrix-doc/pull/1763#discussion_r320289119) +during review. + +## Proposal + +We handle encrypted & unencrypted rooms differently. Both require an API to +delete content from the local media repo (bug +[#790](https://github.com/matrix-org/matrix-doc/issues/790)), for which we +propose: + +``` +DELETE /_matrix/media/r0/download/{serverName}/{mediaId} +{} +``` + +The user must be authenticated via access_token or Authorization header as the +original uploader or as a server admin in order to delete the content. + +Servers may wish to quarantine the deleted content for some timeframe before +actually purging it from storage, in order to mitigate abuse. + + XXX: We might want to provide an undelete API too to let users rescue + their content that they accidentally deleted, as you would get on a + typical desktop OS file manager. Perhaps `DELETE` with `{ undelete: true }`? + + XXX: We might also want to let admins quarantine rather than delete attachments + without a timelimit by passing `{ quarantine: true }` or similar. + +Server admins may choose to mark some content as undeletable in their +implementation (e.g. for sticker packs and other content which should never be +deleted or quarantined.) + +### Encrypted rooms + +There is no way for server to know what events refer to which MXC URL, so we +leave it up to the client to DELETE any MXC URLs referred to by an event after +it expires or redacts its local copy of an event. + +We rely on the fact that MXC URLs should not be reused between encrypted +events, as we expect each event to have different message keys to avoid +correlation. As a result, it should be safe to assume each attachment has +only one referring event, and so when a client deems that the event should +be deleted, it is safe to also delete the attachment without breaking any +other events. + +It seems reasonable to consider the special case of forwarding encrypted +attachments between rooms as an a 'copy by reference' - if the original +event gets deleted, the others should too. If this isn't desired, then +the attachment should be reencrypted. + +### Unencrypted rooms + +It's common for MXC URLs to be shared between unencrypted events - e.g. reusing +sticker media, or when forwarding messages between rooms, etc. In this instance, +the homeserver (not media server) should count the references to a given MXC URL +by events which refer to it. + +If all events which refer to it have been purged or redacted, the HS should delete +the attachment - either by internally deleting the media, or if using an +external media repository, by calling the DELETE api upon it. + +If a new event is received over federation which refers to a deleted +attachment, then the server should operate as if it has never heard of that +attachment; pulling it in over federation from whatever the source server is. +(This will break if the remote server sends an event referring to the local +MXC URL, so don't do that.) + +In the scenario of (say) a redacted membership event, it's possible that the +refcount of an unwanted avatar might be greater than zero (due to the avatar +being referenced in multiple rooms), but the room admin may want to still +purge the content from their server. This can be achieved by DELETEing the +content independently from redacting the membership events. + +## Tradeoffs + +Assuming that encrypted events don't reuse attachments is controversial but +hopefully acceptable. It does mean that stickers in encrypted rooms will end +up getting re-encrypted/decrypted every time, but is hopefully be acceptable +given the resulting improvement in privacy. + +## Security considerations + +Media repo implementations might want to use `srm` or a similar secure +deletion tool to scrub deleted data off disk. + +If the same attachment is sent multiple times across encrypted events (even if +encrypted separately per event), it's worth noting that the size of the +encrypted attachment and associated traffic patterns will be an easy way to +identify attachment reuse (e.g. who's forwarding a sensitive file to each +other). \ No newline at end of file From e161cb2395012f8b792eb86be83117deb05fd386 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 3 Sep 2019 20:03:39 +0100 Subject: [PATCH 02/12] Update proposals/2278-deleting-content.md Co-Authored-By: Travis Ralston --- proposals/2278-deleting-content.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 634762b56fb..b35aaaf5060 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -24,7 +24,7 @@ DELETE /_matrix/media/r0/download/{serverName}/{mediaId} ``` The user must be authenticated via access_token or Authorization header as the -original uploader or as a server admin in order to delete the content. +original uploader, or however the server sees fit in order to delete the content. Servers may wish to quarantine the deleted content for some timeframe before actually purging it from storage, in order to mitigate abuse. @@ -97,4 +97,4 @@ If the same attachment is sent multiple times across encrypted events (even if encrypted separately per event), it's worth noting that the size of the encrypted attachment and associated traffic patterns will be an easy way to identify attachment reuse (e.g. who's forwarding a sensitive file to each -other). \ No newline at end of file +other). From f8b6b7c89af44f6810190b4b2ea803ed8ffddb68 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 3 Sep 2019 20:03:48 +0100 Subject: [PATCH 03/12] Update proposals/2278-deleting-content.md Co-Authored-By: Travis Ralston --- proposals/2278-deleting-content.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index b35aaaf5060..4f310becbaf 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -31,7 +31,7 @@ actually purging it from storage, in order to mitigate abuse. XXX: We might want to provide an undelete API too to let users rescue their content that they accidentally deleted, as you would get on a - typical desktop OS file manager. Perhaps `DELETE` with `{ undelete: true }`? + typical desktop OS file manager. Perhaps `DELETE` with `{ undo: true }`? XXX: We might also want to let admins quarantine rather than delete attachments without a timelimit by passing `{ quarantine: true }` or similar. From 24068a755bae770b26f9443e4b4b733df9220fef Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 3 Sep 2019 20:07:24 +0100 Subject: [PATCH 04/12] specify DELETE better --- proposals/2278-deleting-content.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 4f310becbaf..84e37aa7444 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -20,8 +20,13 @@ propose: ``` DELETE /_matrix/media/r0/download/{serverName}/{mediaId} -{} ``` +with a JSON dict as a request body. + +The API would returns: + * `200 OK {}` on success + * `403` if invalid access_token or not authorised to delete. + * `404` if the content described in the URL does not exist on the local server. The user must be authenticated via access_token or Authorization header as the original uploader, or however the server sees fit in order to delete the content. From 0e8216fe9efa28f1e41423d296ae10b9ca83fcbe Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 3 Sep 2019 20:09:15 +0100 Subject: [PATCH 05/12] clarify problems when referring to remote MXC URLs --- proposals/2278-deleting-content.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 84e37aa7444..884911f2367 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -77,8 +77,9 @@ external media repository, by calling the DELETE api upon it. If a new event is received over federation which refers to a deleted attachment, then the server should operate as if it has never heard of that attachment; pulling it in over federation from whatever the source server is. -(This will break if the remote server sends an event referring to the local -MXC URL, so don't do that.) +This will break if a remote server sends an event referring to a local +MXC URL which may have been deleted, so don't do that - clients on servers +should send MXC URLs which refer to their local server, not remote ones. In the scenario of (say) a redacted membership event, it's possible that the refcount of an unwanted avatar might be greater than zero (due to the avatar From 4261b7d900c516c16a7d92e6407f077589948535 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 3 Sep 2019 22:35:21 +0100 Subject: [PATCH 06/12] matrix errors --- proposals/2278-deleting-content.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 884911f2367..49e293f4c9a 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -25,8 +25,8 @@ with a JSON dict as a request body. The API would returns: * `200 OK {}` on success - * `403` if invalid access_token or not authorised to delete. - * `404` if the content described in the URL does not exist on the local server. + * `403` with error `M_FORBIDDEN` if invalid access_token or not authorised to delete. + * `404` with error `M_NOT_FOUND` if the content described in the URL does not exist on the local server. The user must be authenticated via access_token or Authorization header as the original uploader, or however the server sees fit in order to delete the content. From a32bce045adf45b42a99f6a818a99ca809995d94 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Tue, 3 Sep 2019 23:54:28 +0100 Subject: [PATCH 07/12] comment on 404s --- proposals/2278-deleting-content.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 49e293f4c9a..30d08dee8b4 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -81,6 +81,12 @@ This will break if a remote server sends an event referring to a local MXC URL which may have been deleted, so don't do that - clients on servers should send MXC URLs which refer to their local server, not remote ones. +This means that if the local server chooses to expire the source event sooner +than a remote server does, the remote server might end up not being able to +sync the media from the local server and so display a broken attachment. +This feels fairly reasonable; if you don't want people to end up with 404s +on attachments, you shouldn't go deleting things. + In the scenario of (say) a redacted membership event, it's possible that the refcount of an unwanted avatar might be greater than zero (due to the avatar being referenced in multiple rooms), but the room admin may want to still From c3142c9cd29f7c35089b0964312775ba6ffce9d2 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Wed, 4 Sep 2019 09:30:48 +0100 Subject: [PATCH 08/12] mention a 'touching' alt --- proposals/2278-deleting-content.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 30d08dee8b4..64ebc44405c 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -100,6 +100,14 @@ hopefully acceptable. It does mean that stickers in encrypted rooms will end up getting re-encrypted/decrypted every time, but is hopefully be acceptable given the resulting improvement in privacy. +An alternative approach to solving the problem of attachment reuse could be to +expect clients to somehow 'touch' uploaded local attachments whenever they +send an event which refers to them - effectively renewing their retention +lifetime. However, in E2EE rooms this ends up leaking which events refer to +which attachments (or at least claim to), and also gives a vector for abuse +where malicious client could bypass the retention schedule by repeatedly +retouching a file to keep it alive. + ## Security considerations Media repo implementations might want to use `srm` or a similar secure From 9a409e7c0894e3b36cb351709cef8b165dfa535d Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Thu, 11 Jun 2020 21:44:34 +0100 Subject: [PATCH 09/12] Update proposals/2278-deleting-content.md Co-authored-by: Hubert Chathi --- proposals/2278-deleting-content.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 64ebc44405c..60e37050078 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -59,7 +59,7 @@ be deleted, it is safe to also delete the attachment without breaking any other events. It seems reasonable to consider the special case of forwarding encrypted -attachments between rooms as an a 'copy by reference' - if the original +attachments between rooms as a 'copy by reference' - if the original event gets deleted, the others should too. If this isn't desired, then the attachment should be reencrypted. From a320b8edb5484db59c1e3e9b3b92baad1ee3c5b7 Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Thu, 11 Jun 2020 21:44:43 +0100 Subject: [PATCH 10/12] Update proposals/2278-deleting-content.md Co-authored-by: Hubert Chathi --- proposals/2278-deleting-content.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 60e37050078..87aef49595a 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -97,7 +97,7 @@ content independently from redacting the membership events. Assuming that encrypted events don't reuse attachments is controversial but hopefully acceptable. It does mean that stickers in encrypted rooms will end -up getting re-encrypted/decrypted every time, but is hopefully be acceptable +up getting re-encrypted/decrypted every time, but is hopefully acceptable given the resulting improvement in privacy. An alternative approach to solving the problem of attachment reuse could be to From 6185341231ce9223a99e703ebe4dedba527fe51a Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Thu, 11 Jun 2020 21:45:37 +0100 Subject: [PATCH 11/12] Update proposals/2278-deleting-content.md Co-authored-by: Hubert Chathi --- proposals/2278-deleting-content.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 87aef49595a..6163ae1f26d 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -23,7 +23,7 @@ DELETE /_matrix/media/r0/download/{serverName}/{mediaId} ``` with a JSON dict as a request body. -The API would returns: +The API would return: * `200 OK {}` on success * `403` with error `M_FORBIDDEN` if invalid access_token or not authorised to delete. * `404` with error `M_NOT_FOUND` if the content described in the URL does not exist on the local server. From d0b58f2c03aa961837513988a3f7ef032376e7bf Mon Sep 17 00:00:00 2001 From: Matthew Hodgson Date: Fri, 12 Jun 2020 00:49:48 +0100 Subject: [PATCH 12/12] incorporate review --- proposals/2278-deleting-content.md | 55 +++++++++++++++++++++--------- 1 file changed, 39 insertions(+), 16 deletions(-) diff --git a/proposals/2278-deleting-content.md b/proposals/2278-deleting-content.md index 6163ae1f26d..3175bd30c07 100644 --- a/proposals/2278-deleting-content.md +++ b/proposals/2278-deleting-content.md @@ -9,7 +9,9 @@ should purge old messages for a given room. It originally also specified how media for purged events should be purged from disk, however this was split out into a new MSC [by request](https://github.com/matrix-org/matrix-doc/pull/1763#discussion_r320289119) -during review. +during review. This proposal also solves +https://github.com/vector-im/riot-meta/issues/168 - the ability to garbage +collect attachments from redacted events. ## Proposal @@ -21,7 +23,6 @@ propose: ``` DELETE /_matrix/media/r0/download/{serverName}/{mediaId} ``` -with a JSON dict as a request body. The API would return: * `200 OK {}` on success @@ -29,17 +30,23 @@ The API would return: * `404` with error `M_NOT_FOUND` if the content described in the URL does not exist on the local server. The user must be authenticated via access_token or Authorization header as the -original uploader, or however the server sees fit in order to delete the content. +original uploader, or server admin (as determined by the server implementation). Servers may wish to quarantine the deleted content for some timeframe before actually purging it from storage, in order to mitigate abuse. - XXX: We might want to provide an undelete API too to let users rescue - their content that they accidentally deleted, as you would get on a - typical desktop OS file manager. Perhaps `DELETE` with `{ undo: true }`? +If `serverName` is not the local server, the local cache (if any) of the content +should be deleted. This proposal makes no effort to delete the remote content. - XXX: We might also want to let admins quarantine rather than delete attachments - without a timelimit by passing `{ quarantine: true }` or similar. +Overlapping or near-overlapping authorised requests to `DELETE` for existing +content may either return 200 or 404 based on implementation choice. + +*XXX: We might want to provide an undelete API too to let users rescue +their content that they accidentally deleted, as you would get on a +typical desktop OS file manager. Perhaps `DELETE` with `?undo=true`?* + +*XXX: We might also want to let admins quarantine rather than delete attachments +without a timelimit by passing `?quarantine=true` or similar.* Server admins may choose to mark some content as undeletable in their implementation (e.g. for sticker packs and other content which should never be @@ -58,17 +65,19 @@ only one referring event, and so when a client deems that the event should be deleted, it is safe to also delete the attachment without breaking any other events. -It seems reasonable to consider the special case of forwarding encrypted -attachments between rooms as a 'copy by reference' - if the original -event gets deleted, the others should too. If this isn't desired, then -the attachment should be reencrypted. +It seems reasonable to consider the special case of clients forwarding +encrypted attachments between rooms as a 'copy by reference' - if the +original event gets deleted, the copies should too. If this isn't desired, +then the attachment should have been reencrypted and stored as a separate +instance in the media repo. ### Unencrypted rooms -It's common for MXC URLs to be shared between unencrypted events - e.g. reusing -sticker media, or when forwarding messages between rooms, etc. In this instance, -the homeserver (not media server) should count the references to a given MXC URL -by events which refer to it. +It's common for MXC URLs to be shared between unencrypted events - e.g. +reusing sticker media, or when forwarding messages between rooms, etc. In +this instance, the homeserver (not media server) should count the references +to a given MXC URL by events which refer to it (including state events such as +avatar URLs in `m.room.membership` events.) If all events which refer to it have been purged or redacted, the HS should delete the attachment - either by internally deleting the media, or if using an @@ -93,6 +102,20 @@ being referenced in multiple rooms), but the room admin may want to still purge the content from their server. This can be achieved by DELETEing the content independently from redacting the membership events. +*N.B. we can't currently distinguish an E2EE attachment with unknown refering +events, from a non-E2EE attachment with zero references which should be GCd. +So we use mime-types as a heuristic to recognise E2EE attachments, and to stop +them from being GC'd This would of course be vulnerable to an attacker lying +about their mime-type in order to stop their repository entries being GC'd, +but given E2EE attachments already let you bypass the GC, this doesn't feel +like a big issue.* + +Encrypted attachments should be stored with a mime-type of +`application/aes-encrypted` (to be registered), and attachments +with this mime-type which have never been referenced by an event should +be exempt from GC. For backwards compatibility, this rule may also be +applied to attachments with mime-type of `application/octet-stream`. + ## Tradeoffs Assuming that encrypted events don't reuse attachments is controversial but