MSC1763: Proposal for specifying configurable message retention periods #1763

**Status:** Open. Wants to merge 37 commits into base `old_master`; changes shown from 34 of 37 commits.

**Commits:**
- `687b650` first cut of MSC1763 for configurable event retention (ara4n, Dec 30, 2018)
- `f770440` ephemeral msging ended up in scope (ara4n, Dec 30, 2018)
- `b25367e` fix english (ara4n, Dec 30, 2018)
- `2aafa02` clarify this only applies to non-state events; fix retention JSON str… (ara4n, Dec 30, 2018)
- `64695ed` make conflict alg explicit for user retention settings (ara4n, Dec 30, 2018)
- `c493dbd` change max >= min invariant (ara4n, Dec 30, 2018)
- `0afc3af` spell out that self-destructing msgs need explicit RRs (ara4n, Dec 30, 2018)
- `7597e03` more validation on fields (ara4n, Dec 30, 2018)
- `7a8d204` spell out how the example server admin overrides would work (ara4n, Dec 30, 2018)
- `4646fcd` improve wording; spell out purge/redact dichotomy; add explicit alg (ara4n, Dec 30, 2018)
- `c55158d` clarify redaction semantic and default PL (ara4n, Dec 30, 2018)
- `6e33c2f` track max's idea of advertising retention per-server (ara4n, Dec 30, 2018)
- `28ea4e1` fix normatives (ara4n, Dec 30, 2018)
- `cca99dd` clarify client behaviour (ara4n, Jan 4, 2019)
- `a4974b6` make self_destruct set a timer in seconds rather than be binary. (ara4n, Jan 4, 2019)
- `c27394c` clarify warning about conflicts (ara4n, Jan 5, 2019)
- `f0553c0` Merge branch 'master' into matthew/msc1763 (ara4n, Aug 10, 2019)
- `bdce6f1` remove per-message retention and self-destruct messages entirely to t… (ara4n, Aug 10, 2019)
- `a30a853` spell out that events will disappear from event streams when purged (ara4n, Aug 10, 2019)
- `c281420` add the 'why not nego?' tradeoff (ara4n, Aug 10, 2019)
- `ef215dd` clarify the intention to not default to finite message retention (ara4n, Aug 10, 2019)
- `0b6a209` spell out not to default to a max_lifetime (ara4n, Aug 10, 2019)
- `5c29779` incorporate review (ara4n, Aug 11, 2019)
- `032e63b` Apply suggestions from code review (ara4n, Aug 11, 2019)
- `1a4101e` link #2228 (ara4n, Aug 11, 2019)
- `90b17d6` units (ara4n, Aug 11, 2019)
- `32f21ac` lifetimes in milliseconds (ara4n, Aug 16, 2019)
- `a1b8726` fix json number ranges (ara4n, Aug 17, 2019)
- `ee0a7ee` Update 1763-configurable-retention-periods.md (richvdh, Aug 19, 2019)
- `cabef48` Apply suggestions from code review (ara4n, Aug 26, 2019)
- `f5c3729` incorporate review (ara4n, Aug 26, 2019)
- `f8ceb97` spell out an example UI for warning about retention (ara4n, Aug 26, 2019)
- `8b1a0c3` clarify care & feeding of DAG (ara4n, Aug 28, 2019)
- `9357ec6` incorporate more @richvdh review (ara4n, Aug 28, 2019)
- `ac2f87e` Apply suggestions from code review (ara4n, Sep 3, 2019)
- `116c5b9` split out media attachment clean-up to #2278 (ara4n, Sep 3, 2019)
- `f809087` Massively rewrite the proposal (babolivier, Oct 11, 2022)
**File changed:** `proposals/1763-configurable-retention-periods.md` (323 additions, 0 deletions)
# Proposal for specifying configurable per-room message retention periods.
> **Review comment (@ShadowJonathan, Contributor, Apr 2, 2022):** I'm sensing an innate conflict within this MSC's interests: it both wants to reduce server history in rooms, yet simultaneously expects to be able to fetch that history from thin air at any convenient time. I have a feeling it's written with the underlying idea that large servers will carry all the events in the federation, with some servers being able to fetch from those at any time.
>
> However, this is mentioned nowhere in the MSC; it skirts around these problems by putting these assumptions between the lines, without thinking critically about what this means for the larger federation: more dependency on large servers.
>
> With this, it does not bring a lucid solution to the problem of dealing with history retention, in which any server eventually has to face that it cannot fetch events it knows exist(ed), yet is expected to return them in response to a client's query.
>
> The semantic equivalent of HTTP error 410 ("Gone") has to exist somewhere here, to be able to tell clients that a historical event cannot be fetched due to history retention, together with all the sad and happy paths that spring from that. The current stance against this is "you're SOL, have a 404 with no context".
>
> I don't see this MSC deal with the reality that it is deleting events, and I don't see a coherent solution that allows some servers to "archive" history and makes that explicit (also within rooms, for privacy concerns, for people who want to know which servers are ignoring retention rules and archiving anyway).
>
> Servers ignoring retention rules does have a basis, namely actually archiving historic conversations, in a similar philosophy to the Internet Archive. If this MSC were to go through as-is, then we'd have a situation similar to the general internet, namely one where all history is lost to time due to individual retention strategies.
>
> While reliance on large servers isn't what a federation would want, an explicit mechanism by which people are at least aware of which servers are backing up history, and which aren't, would help this MSC greatly in the long run.

A major shortcoming of Matrix has been the inability to specify how long
events should be stored by the servers and clients which participate in a given
room.

This proposal aims to specify a simple yet flexible set of rules which allow
users, room admins and server admins to determine how long data should be
stored for a room, from the perspective of respecting the privacy requirements
of that room (which may range from a "burn after reading" ephemeral conversation,
through to FOIA-style public record keeping requirements).

As well as enforcing privacy requirements, these rules provide a way for server
administrators to better manage disk space (e.g. to enforce rules such as "don't
store remote events for public rooms for more than a month").

This proposal originally tried to also define semantics for per-message
retention as well as per-room; this has been split out into
[MSC2228](https://github.com/matrix-org/matrix-doc/pull/2228) in order to get
the easier per-room semantics landed.

## Problem

Matrix is inherently a protocol for storing and synchronising conversation
history, and various parties may wish to control how long that history is stored
for.

* Users may wish to specify a maximum age for their messages for privacy
purposes, for instance:
* to avoid their messages (or message metadata) being profiled by
unscrupulous or compromised homeservers
* to avoid their messages in public rooms staying indefinitely on the public
record
* because of legal/corporate requirements to store message history for a
limited period of time
* because of legal/corporate requirements to store messages forever
(e.g. FOIA)
* to provide "ephemeral messaging" semantics where messages are best-effort
deleted after being read.
> **Review comment (Contributor):** I question the feasibility of this, which I essentially see as a Matrix-specced version of Synapse's history-purge functionality. What would qualify exactly as "after read"? Shouldn't this be removed and left for MSC2228 to specify or address?

* Room admins may wish to specify a retention policy for all messages in a
room.
* A room admin may wish to enforce a lower or upper bound on message
retention on behalf of its users, overriding their preferences.
* A bridged room should be able to enforce the data retention policies of the
remote rooms.
* Server admins may wish to specify a retention policy for their copy of given
rooms, in order to manage disk space.

Additionally, we would like to provide this behaviour whilst also ensuring that
users generally see a consistent view of message history, without lots of gaps
and one-sided conversations where messages have been automatically removed.

At the least, it should be possible for people participating in a conversation
to know the expected lifetime of the other messages in the conversation **at
the time they are sent** in order to know how best to interact with them (i.e.
whether they are knowingly participating in an ephemeral conversation or not).

We would also like to set the expectation that rooms typically have a long
message retention - allowing those who wish to use Matrix to act as an archive
of their conversations to do so. If everyone starts defaulting their rooms to
finite retention periods, then the value of Matrix as a knowledge repository is
broken.

This proposal does not try to solve the problems of:
* GDPR erasure (as this involves retrospectively changing the lifetime of
messages)
* Bulk redaction (e.g. to remove all messages from an abusive user in a room,
as again this is retrospectively changing message lifetime)
* Specifying history retention based on the number of messages (as opposed to
their age) in a room. This is descoped because it is effectively a disk space
management problem for a given server or client, rather than a policy
problem of the room. It can be solved in an implementation-specific manner, or
a new MSC can be proposed to standardise letting clients specify disk quotas
per room.
* Per-message retention (as having a mix of message lifetime within a room
complicates implementation considerably - for instance, you cannot just
purge arbitrary events from the DB without fracturing the DAG of the room,
and so a different approach is required)

## Proposal

### Room Admin-specified per-room retention

We introduce an `m.room.retention` state event, which room admins can set to
mandate the history retention behaviour for a given room. It follows the
default PL semantics for a state event (requiring a PL of 50 by default to be
set).

The following fields are defined in the `m.room.retention` contents:

`max_lifetime`:
the maximum duration in milliseconds for which a server must store events in this room.
Must be null or an integer in the range [0, 2<sup>53</sup>-1]. If absent or
null, it should be interpreted as 'forever'.

`min_lifetime`:
the minimum duration in milliseconds for which a server should store events in this room.
Must be null or an integer in the range [0, 2<sup>53</sup>-1]. If absent or
null, it should be interpreted as 'forever'.

`expire_on_clients`:
a boolean for whether clients must expire messages client-side to match the
min/max lifetime fields. If absent or null, it should be interpreted as false.
The intention of this is to distinguish between rules intended to impose a
data retention policy on the server, versus rules intended to provide a
degree of privacy by requesting that all data be purged from all clients after
a given time.
> **Review comment (Member):** If this is true, does that mean that the retention rules apply to both servers and clients? (Reading below, it seems that this is the case, but it seems unclear to me here.)


Retention is only considered for non-state events.
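To make the field constraints above concrete, here is a minimal, non-normative Python sketch of validating an `m.room.retention` content object. The function name and error handling are our own invention, not part of the MSC.

```python
# Non-normative sketch: normalise the contents of an m.room.retention event
# per the field definitions above. The helper name is hypothetical.

MAX_JSON_INT = 2**53 - 1  # upper bound on lifetimes given by this MSC


def parse_retention(content: dict) -> dict:
    """Validate and normalise min_lifetime / max_lifetime / expire_on_clients.

    Absent or null lifetimes are represented as None ('forever');
    an absent or null expire_on_clients is interpreted as False.
    """
    def lifetime(name: str):
        value = content.get(name)
        if value is None:
            return None  # 'forever'
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be null or an integer")
        if not 0 <= value <= MAX_JSON_INT:
            raise ValueError(f"{name} must be in the range [0, 2**53 - 1]")
        return value

    return {
        "min_lifetime": lifetime("min_lifetime"),
        "max_lifetime": lifetime("max_lifetime"),
        "expire_on_clients": bool(content.get("expire_on_clients") or False),
    }
```

Note that since retention only applies to non-state events, a validator like this would only ever be consulted when deciding the fate of non-state events.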
> **Review comment (@pv, May 4, 2020):** Stupid question as general audience: what does this imply for the room topic, membership, etc. state data? Is the full history of e.g. who was a member of a room and when retained or purged? If retained, should the summary at the top mention this limitation?
>
> **Reply (@babolivier, Contributor, May 5, 2020):** Those are state events, and according to the current MSC they must be retained. The issue here is that some state events are used to authorise new events, e.g. so you can't send messages into a room that you haven't joined (i.e. in the state of which there's no join event from you), so purging state events could potentially break the room. We could theoretically avoid that by carefully selecting which state events should not be purged and which ones can (and I'm not even sure about that), but then it becomes a ticking time bomb, because one day we're bound to forget about that, make some changes in state events without updating the retention-policies spec, and break everything.
>
> **Reply (Contributor):** > If retained, should the summary on the top mention this limitation?
>
> I think this limitation is mentioned at the right place here, but YMMV.
>
> **Reply (@pv, May 5, 2020):** The summary contains the statement "... set of rules which allow users, room admins and server admins to determine how long data should be stored for a room, from the perspective of respecting the privacy requirements of that room", which seems to incorrectly imply that the retention rules apply to all data. This was my initial understanding also when reading the configuration file in the current Synapse implementation. Just a suggestion from a user perspective, but I think it would be important to be clear about what it does and doesn't do, so that people can make an informed decision.
>
> **Reply (Contributor):** Oh right, that makes sense, fair point.


If set, these fields SHOULD replace other retention behaviour configured by
the user or server admin - even if it means forcing laxer privacy requirements
on that user. This is a conscious privacy tradeoff to allow admins to specify
explicit privacy requirements for a room. For instance, a room may explicitly
require all messages in the room be stored forever with `min_lifetime: null`.

In the instance of `min_lifetime` or `max_lifetime` being overridden, the
invariant that `max_lifetime >= min_lifetime` must be maintained by clamping
`max_lifetime` to be equal to `min_lifetime`.
> **Review comment (Member):** Suggested change: format `max_lifetime` as code in "clamping max_lifetime to be equal to `min_lifetime`".
>
> **Review comment (Member):** IMHO, it makes more sense to clamp min_lifetime to max_lifetime, rather than the other way around: currently, it makes sense to set min_lifetime and leave max_lifetime unset (and the result is as expected, as min_lifetime takes effect and max_lifetime remains at its default), but if you set max_lifetime and leave min_lifetime unset, then the value of max_lifetime will unexpectedly be ignored.
>
> **Reply (Member, Author):** agreed
>
> **Review comment (Contributor):** Could this also be added as a fallback for when the `max_lifetime >= min_lifetime` invariant is broken?


If the user's retention settings conflict with those in the room, then the
user's clients are expected to warn the user when participating in the room.
A conflict exists if the user has configured their client to create rooms with
retention settings which differ from the values on the `m.room.retention`
state event. This is particularly important in order to warn the user if the
room's retention is longer than their default requested retention period.

The UI for this could be a warning banner in the room to remind the user that
that room's retention setting doesn't match their preferred default.
> **Review comment (Contributor):** I read somewhere else that the spec doesn't mandate (any more) how clients expose UI elements to users; maybe a more abstract description should be used of when the client warns, such as:
>
> - when the client joins a room
> - when the retention settings are changed
> - when the client is viewing retention settings (e.g. "warning, there are 4 rooms which override this behaviour")


For instance:

```json
{
    "max_lifetime": 86400000
}
```

The above example means that servers receiving messages in this room should
store each event for only 86,400,000 milliseconds (1 day), as measured from that
event's `origin_server_ts`, after which they MUST purge all references to that
event (e.g. from their db and any in-memory queues).

We consciously do not redact the event, as we are trying to eliminate metadata
and save disk space at the cost of deliberately discarding older messages from
the DAG.

```json
{
    "min_lifetime": 2419200000
}
```

The above example means that servers receiving this message SHOULD store the
event forever, but can choose to purge their copy after 28 days (or longer) in
order to reclaim diskspace.
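The two worked examples above can be summarised in a short, non-normative sketch (the function names are our own): `max_lifetime` sets a deadline after which purging is mandatory, while `min_lifetime` sets a point before which purging is discouraged.

```python
# Non-normative sketch of the purge semantics of the two examples above.
# All times are in milliseconds, matching origin_server_ts.

def must_purge(origin_server_ts: int, max_lifetime, now_ms: int) -> bool:
    """A server MUST purge the event once its max_lifetime has elapsed.
    A max_lifetime of None means 'forever': purging never becomes mandatory."""
    if max_lifetime is None:
        return False
    return now_ms > origin_server_ts + max_lifetime


def may_purge(origin_server_ts: int, min_lifetime, now_ms: int) -> bool:
    """A server MAY purge the event to reclaim disk space once min_lifetime
    has elapsed. A min_lifetime of None means it SHOULD be kept forever."""
    if min_lifetime is None:
        return False
    return now_ms > origin_server_ts + min_lifetime
```

For instance, with `max_lifetime: 86400000` an event must be purged from one day after its `origin_server_ts` onwards, while with `min_lifetime: 2419200000` it may be purged from 28 days onwards.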

### Server Admin-specified per-room retention

Server admins have two ways of influencing message retention on their server:

1) Specifying a default `m.room.retention` for rooms created on the server,
defined as a per-server implementation configuration option which inserts the
state event after creating the room, and before `initial_state` is applied on
`/createRoom` (effectively augmenting the presets used when creating a room).
If a server admin is trying to conserve diskspace, they may do so by
specifying and enforcing a relatively low `min_lifetime` (e.g. 1 month) while
not specifying a `max_lifetime`, in the hope that other servers will retain
the data for longer. This is not recommended, however, as it harms users who
want to use Matrix like e-mail, as a permanent archive of their conversations.

2) By adjusting how aggressively their server enforces the `min_lifetime`
value for message retention within a room. For instance, a server admin could
configure their server to attempt to automatically purge remote messages in
public rooms which are older than three months (unless `min_lifetime` for
those messages was set higher).

A possible implementation-specific server configuration here could be
something like:
* `target_lifetime_public_remote_events`: 3 months
* `target_lifetime_public_local_events`: null # forever
* `target_lifetime_private_remote_events`: null # forever
* `target_lifetime_private_local_events`: null # forever

...which would try to automatically purge remote events from public rooms after
3 months (assuming their individual min_lifetime is not higher), but leave
others alone.

These config values would interact with the min_lifetime and max_lifetime
values in the different classes of room by decreasing the effective
max_lifetime to the proposed value (whilst preserving the `max_lifetime >=
min_lifetime` invariant). However, the precise behaviour would be up to the
server implementation.
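As a non-normative illustration of that interaction (only the idea of a per-class target lifetime comes from the example config above; the function itself is our own sketch), a server might compute an effective `max_lifetime` like this:

```python
# Non-normative sketch: fold a server's target lifetime (e.g. the value of
# target_lifetime_public_remote_events) into a room's retention policy,
# decreasing the effective max_lifetime while preserving the
# max_lifetime >= min_lifetime invariant. None means 'forever'.

def effective_max_lifetime(room_max, room_min, server_target):
    if server_target is None:
        return room_max  # server imposes no cap
    # Cap the room's max_lifetime at the server's target.
    capped = server_target if room_max is None else min(room_max, server_target)
    # Preserve the invariant: never drop below the room's min_lifetime.
    if room_min is not None and capped < room_min:
        capped = room_min
    return capped
```

Note how a room's `min_lifetime` (if set higher than the target) wins over the server's target, matching the "unless min_lifetime for those messages was set higher" caveat above.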

Server admins could also override the requested retention limits (e.g. if
resource constrained), but this isn't recommended given it may result in
history being irrevocably lost against the senders' wishes.

## Pruning algorithm
> **Review comment (Member, Author):** Do we need something here to encourage clients to delete/discard the megolm keys for pruned E2E convos?


To summarise, servers and clients must implement the pruning algorithm as
follows. For each event `E` in the room:

If we're a client (including bots and bridges), first determine whether to
apply the algorithm at all:
* if the `expire_on_clients` field in the `m.room.retention` event for the room (as of `E`) is specified and true, apply it;
* otherwise, don't apply the algorithm.

The maximum lifetime of an event is calculated as:
* if specified, the `max_lifetime` field in the `m.room.retention` event (as of `E`) for the room.
> **Review comment (@babolivier, Contributor, Nov 12, 2019):** > (as of `E`)
>
> Isn't that going to create holes in the DAG of rooms and make pagination (and maybe also federation) potentially faffy? Also, in the event of the first retention policy of a room being set in the middle of the history of the room, won't that make it difficult/impossible to reach the messages that were sent before the policy was set?
>
> We could think of solutions like retrieving the most recent expired event and purging everything before it (though we'd need to take min_lifetime into account and figure out what to do if the retention policy is lacking, which seems to be left as an implementation detail), or redacting events upon expiry and only purging them if there's no event before, or calculating the expiration date of an event using the current retention policy in the room rather than the one in force when `E` was sent (i.e. making it an implementation detail whether the policy used is the current one or the one as of `E`).
>
> **Reply (Contributor):** FWIW, from my own experience since the release of support for this feature in Synapse, people seem to expect retention policies to apply retroactively, so perhaps we should just use the latest `m.room.retention` state event in the room (even though I can see how that creates different behaviour from most state events).

* otherwise, the message's maximum lifetime is considered 'forever'.

The minimum lifetime of an event is calculated as:
* if specified, the `min_lifetime` field in the `m.room.retention` event (as of `E`) for the room.
* otherwise, the message's minimum lifetime is considered 'forever'.
* for clients, `min_lifetime` should be considered to be 0 (as there is no
requirement for clients to persist events).

If the calculated `max_lifetime` is less than the `min_lifetime` then the `max_lifetime`
is set to be equal to the `min_lifetime`.

The server/client then selects a lifetime of the event to lie between the
calculated values of minimum and maximum lifetime, based on their implementation
and configuration requirements. The selected lifetime MUST NOT exceed the
calculated maximum lifetime. The selected lifetime SHOULD NOT be less than the
calculated minimum lifetime, but may be less in case of constrained resources,
in which case the server should prioritise retaining locally generated events
over remote generated events.
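The steps above can be condensed into a non-normative sketch (the names are ours, not the MSC's). One caveat: the sketch treats an absent `min_lifetime` as imposing no floor on the selected lifetime, whereas a strict reading of 'forever' would forbid choosing any finite lifetime; the MSC does allow going below the minimum under resource pressure.

```python
# Non-normative sketch of the lifetime selection above. `preferred` stands in
# for the implementation/configuration-chosen lifetime; None means 'forever'.

def select_lifetime(policy: dict, preferred, is_client: bool = False):
    max_l = policy.get("max_lifetime")  # absent/null => 'forever'
    # For clients, min_lifetime is considered 0 (no requirement to persist).
    min_l = 0 if is_client else policy.get("min_lifetime")
    # Restore the invariant max_lifetime >= min_lifetime by raising max_lifetime.
    if max_l is not None and min_l is not None and max_l < min_l:
        max_l = min_l
    chosen = preferred
    if max_l is not None and (chosen is None or chosen > max_l):
        chosen = max_l  # MUST NOT exceed the calculated maximum
    if min_l is not None and chosen is not None and chosen < min_l:
        chosen = min_l  # SHOULD NOT fall below the calculated minimum
    return chosen
```

The selected lifetime then drives the maintenance task described below.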

Servers/clients then set a maintenance task to remove ("purge") old events and
references to their IDs from their DB and in-memory queues after the lifetime
has expired (timing from the absolute `origin_server_ts` on the event).
It's worth noting that this means events may sometimes disappear from event
streams; calling the same `/sync` or `/messages` API twice may give different
results if some of the events have disappeared in the interim.

A room must have at least one forward extremity in order to allow new events
to be sent within it. Therefore servers must redact rather than purge obsolete
events which are forward extremities in order to avoid wedging the room.

Server implementations must ensure that clients cannot back-paginate into a
region of the event graph which has been purged (bearing in mind that other
servers may or may not give a successful response to requests to backfill such
events). One approach to this could be to discard the backwards extremities
caused by a purge, or otherwise mark them as unpaginatable. There is a
separate related [spec
bug](https://github.com/matrix-org/matrix-doc/issues/2251) and [impl
bug](https://github.com/matrix-org/synapse/issues/1623) that the CS API does
not currently provide a well-defined way to say when /messages has hit a hole
in the DAG or the start of the room and cannot paginate further.

If possible, servers/clients should remove downstream notifications of a message
once it has expired (e.g. by cancelling push notifications).

If a user tries to re-backfill in history which has already been purged, it's
up to the server implementation's configuration on whether to allow it or not,
and if allowed, configure how long the backfill should persist before being
purged again.

Media uploads must also be expired in line with the retention policy of the
room. For unencrypted rooms this is easy; when the event that references a
piece of content is expired, the content must be expired too - assuming the
content was first uploaded in that room. (This allows for content reuse in
retention-limited rooms for things like stickers).

For encrypted rooms, there is (currently) no alternative but to have the client
manually delete media content from the server as it expires its own local
copies of messages. (This requires us to have actually implemented a [media
deletion API](https://github.com/matrix-org/matrix-doc/issues/790) at last.)

Clients and servers are recommended not to default to setting a `max_lifetime`
when creating rooms; instead users should only specify a `max_lifetime` when
they need it for a specific conversation. This avoids unintentionally
stopping users from using Matrix as a way to archive their conversations if
they so desire.

## Tradeoffs

This proposal tries to keep it simple by letting the room admin mandate the
retention behaviour for a room. However, we could alternatively have a negotiation
between the client and its server to determine the viable retention for a room.
Or we could have the servers negotiate together to decide the retention for a room.
Both seem overengineered, however.

It also doesn't solve specifying storage quotas per room (i.e. "store the last
500 messages in this room"), to avoid scope creep. This can be handled by an
MSC for configuring resource quotas per room (or per user) in general.

It also doesn't solve per-message retention behaviour; this has been split out
into a separate MSC.

We don't announce room retention settings within a room per-server. The
advantage would be full flexibility in terms of servers announcing their
different policies for a room (and possibly letting users know how likely
history is to be retained, or conversely letting servers know if they need to
step up to retain history). The disadvantage is that it could make for very
complex UX for end-users: "Warning, some servers in this room have overridden
history retention to conflict with your preferences" etc.

We let servers specify a default `m.room.retention` for rooms created on their
servers as a coarse way to encourage users to not suck up disk space (although
it's not recommended). This is also how we force E2E encryption on, but it
feels quite fragmentary to have magical presets which do different things
depending on which server you're on. The alternative would be some kind of
federation-aware negotiation where a server refuses to participate in a room
unless it gets its way on retention settings, however this feels unnecessarily
draconian and complex.

## Security considerations

It's always a gentlemen's agreement for servers and clients alike to actually
uphold the requested retention behaviour; users should never rely on deletion
actually having happened.

## Conclusion

Previous attempts to solve this have got stuck by trying to combine together too many
disparate problems (e.g. reclaiming diskspace; aiding user data privacy; self-destructing
messages; mega-redaction; clearing history on specific devices; etc) - see
https://github.com/matrix-org/matrix-doc/issues/440 and https://github.com/matrix-org/matrix-doc/issues/447
for the history.

This proposal attempts to simplify things to strictly considering the question of
how long servers (and clients) should persist events for.