Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3216: Synchronized access control for Spaces #3216

Open
wants to merge 3 commits into
base: old_master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
266 changes: 266 additions & 0 deletions proposals/3216-spaces-synchronized-access-control.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,266 @@
# MSC3216: Synchronized access control for Spaces

A mechanism like Spaces
([MSC1772](https://github.com/matrix-org/matrix-doc/pull/1772), now merged) is
frequently used to group together rooms with different topics, that belong to
the same community or organization with a single moderation team and policy. To
this end, it should be possible to define both roles ("powerlevels") and
permissions ("required powerlevels") once and then have them apply to every room
in a Space.

[MSC2962](https://github.com/matrix-org/matrix-doc/pull/2962) is one proposal
that aims to address the "roles" side, though not the "permissions" side.
Unfortunately, it has a high degree of complexity, does not fit well into the
existing protocol semantics, and has a lot of unanswered questions regarding
authorative homeservers which could lead to it becoming a de-facto centralized
mechanism.

This MSC proposes an alternative solution to the problem which does not suffer
from these issues, and which additionally also covers the "permissions" side of
the problem. *Unlike* the original MSC2962 (since split out into
[MSC3083](https://github.com/matrix-org/matrix-doc/pull/3083)), however, it does
not address membership restrictions at all - I consider this a separate concern
which may not even be specific to Spaces, and therefore it should live in a
separate proposal. It would be neither a dependency nor a dependent of this
proposal.

## Proposal

Whereas MSC2962 addresses the problem from an angle of a specially-privileged
Copy link
Member

@ara4n ara4n May 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, thanks for writing this up and the amount of effort which has gone into the proposal here.

However, I think there is a core misunderstanding on 2962 (which is probably my fault in terms of not writing it up well, or the write-up having bugs).

The proposed pseudo-users in MSC2962 are no different to the admin user(s) which already exist in the room. Just as a malicious server operator could reach in and puppet a human admin in the room, they could equally reach in and puppet the pseudo-user. The pseudo-user(s) have no more or less permission than the room admin(s), and do not cause any more centralisation (given you could & should have one per server where an admin is present).

So I don't think this introduces a confused deputy problem which is any worse than the risk in this proposal of the admin itself being abused - and it shouldn't matter whether changes are made directly by an admin or by the pseudo-user. Meanwhile, it avoids the complexities of "what happens if the admin isn't in the child room".

That said, this MSC might well provide a viable solution too - but I just want to ensure that MSC2962 isn't misunderstood!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, with "specially-privileged", I don't mean that the pseudo-user has higher privileges than the admin. Rather, I mean that it independently has privileges, and would need to implement its own authorization system - separate from the usual state resolution algorithm - to allow users to use its administrative privileges on their behalf.

This sort of "double authorization" is a typical candidate for the confused deputy problem; the risk is in there being any mismatches whatsoever between the pseudo-user's authorization mechanism and the regular authorization mechanism in the protocol. This could allow a user to command the pseudo-user to make an administrative change that they, themselves, would not have been allowed to make directly (because the change itself is made on the authority of the pseudo-user, not the commanding user).

This is unlike the scenario of a server operator puppeting a user; there, trust has been explicitly placed in that server admin to use this power responsibly (which can include "not at all"), and authorization rule mismatches do not play a role.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for lag on this - have been on holiday.

Rather, I mean that it independently has privileges, and would need to implement its own authorization system - separate from the usual state resolution algorithm - to allow users to use its administrative privileges on their behalf.

Ah, right, so you're worried that the power levels of the admins in a room could get out of sync with the power levels of the serverbots of their servers.

pseudo-user which is externally controlled with its own authorization rules,
this proposal instead addresses it as strictly a *replication* problem - if an
already-privileged user chooses to apply role or permission changes to a Space
as a whole, how can this be replicated to all of its child rooms in a
semantically consistent and predictable manner?

This approach has three main advantages:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a purely MSC-style note, it's better to tell the reader what the proposal is before telling them the advantages, then list the advantages along with the disadvantages later: it's not a sales pitch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the specific 'advantages' phrasing here would probably disappear if this MSC were not phrased as 'an alternative to ____', and it would be about 'motivation' instead, I disagree that the content itself is in the wrong place, and the ordering is deliberate.

In technical writing, it's generally good practice to lead with a motivation immediately after describing the high-level concept (as has been done here), before diving into the mechanical details; that way, the reader can not only more easily determine the relevance of something to what they are currently working on, but they also have a conceptual frame of reference to understand the specific design decisions and direction in.

(That having been said, in this case point 1 and most of point 2 would no longer be relevant to mention if the references to the other MSC were removed - so the motivation section would end up a lot shorter.)


1. It does not introduce an all-powerful pseudo-user with its own access control
mechanisms. This avoids the entire class of [confused deputy
problems](https://en.wikipedia.org/wiki/Confused_deputy_problem), where said
pseudo-user could be abused by an attacker to make changes that they would
not otherwise be permitted to make, if for any reason its access control
mechanism didn't *exactly* match that of the normal powerlevel system.
2. All access control changes are made *directly* by the initiating user. This
means that there is no question of which homeserver should be authoratively
responsible for updating the powerlevels - the answer is always "the
homeserver of the user making the changes". Likewise, it means that it is
always clear from the room history who initiated the change, without needing
further lookups.
3. It is completely agnostic to the actual powerlevel system used; since the
proposal just concerns itself with *replication*, it does not matter *what*
gets replicated, and so any future changes to the access control mechanism in
the Matrix protocol would likely not require changes to this replication
mechanism.

In the rest of this proposal, "roles" will be used to refer what are currently
the powerlevel assignments to specific users, and "permissions" will be used to
refer to what are currently the *required* powerlevels for executing some sort
of action.

The high-level mechanism works as follows: when a user updates either the roles
or permissions of a Space, their homeserver will, on their behalf, translate
this to a role/permission update for each room that the Space contains. If this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The space contains" ie. the spaces's sub-rooms? Doesn't it have to be the other way around, ie. the room needs to declare that it wants to delegate control to a space? Otherwise I can make a space and add any room to it, then trick an admin of that room into changing the power levels of that rooms by changing the power levels of my space.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A central point of this MSC is that the Space itself - deliberately - holds no authority over any room. Every Space-originating PL change in a room is actually made directly by the user initiating it through a Space, who therefore already needs to have the necessary administrative access in the room to apply those changes.

In other words: merely adding a room to a Space is a no-op from an access perspective. For a room PL change to succeed through a Space, the originating user would need to have the necessary permissions already anyway, and so could just have made the changes on the room directly, bypassing the Space entirely.

Under those conditions, do you still see a way in which this could be abused?

is not possible because the user has an insufficient role in some rooms, then
the operation will atomically fail unless an "allow partial" flag is set. If the
replication fails in *all* rooms, the operation will fail completely, with or
without the flag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this also mean it would fail if the user just didn't want to be in all of the rooms in that space?

Copy link
Author

@joepie91 joepie91 Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think it is technically possible allowed by the spec for a user to create an event in a room that they are not joined to, as long as they are not banned or kicked and they have a sufficient powerlevel assigned to them? I'm not 100% sure about this, though. If not, then yes, that would be a concern.


UI-wise, this allows for roughly three scenarios:

1. The user has the necessary role in all of the Space's rooms, and the
operation succeeds at once. This is the common case for Spaces which are
correctly set up to provide multiple topic rooms for a single community or
organization.
2. The user has the necessary role in some but not all of the Space's rooms. In
this case, the flag-less operation will fail, and the user can be prompted
whether they would like to apply the change to those rooms where it is
possible anyway. If the user chooses yes, the operation is reattempted, but
this time with an "allow partial" flag, and it succeeds. This case will
typically occur when a Space is not correctly set up, or in private Spaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also happen on a "correctly set up" space where the space is a public umbrella of independent thematic rooms, like "Country space" or "International project" space (like OSM).

for the user's personal bookmarking/categorization purposes.
3. The user does not have the necessary role in any of the Space's rooms. The
operation will fail. This is also typical for private Spaces.

The prerequisite for correct setup of a Space, such that this replication
mechanism will function as intended, is for at least one user to have a
sufficient role in all of the Space's rooms - where "sufficient" means "has the
necessary role to apply all of the intended access control changes, including
future ones". In practice, the easiest way to satisfy this requirement under
typical circumstances is to ensure that they are an Admin (PL 100) in all of the
Space's rooms.

From this point on, the "bootstrapping user" can then simply define the rest of
the roles and permissions for the Space, which gets replicated to all of its
rooms; those with a sufficient role on this replicated list can then,
transitively, start to use the Spaces access control replication mechanism
themselves, as *their* roles are now also consistent across rooms. In this way,
the mechanism keeps itself consistent across rooms over time, so long as no
rules are defined directly in a specific room that interfere with it (such as a
single-room demote).

Such interfering rules may actually be desirable in some cases; for example,
preventing a specific user from obtaining a specific role in a specific room,
despite holding it in the rest of the Space. Because of this, replicated roles
and permissions are defined separately in `m.room.power_levels` from the
explicitly-given ones; this works similarly to MSC2962, where the
explicitly-given settings take precedence.

### Technical changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is backwards. Right now it puts the inherited keys in a separate key. This means that it changes the permission check algorithm. I wonder if it makes more sense to add an overrides key, and maybe even local_overrides and child_overrides keys. (Shown below, but maybe we can skip them until we have specific use cases).

When you update one of these keys the homeserver should propagate the changes by recomputing the regular m.room.power_levels object. This way the m.room.power_levels is still all that is needed to check permissions, using the same algorithm. I think the significant downside is that a client that is not spaces aware will have their changes overwritten on the next inherited change.

So for example:

// Space
{
   overrides: { ... }, // Human editable.
   local_overrides: { ... }, // Human editable.
   child_overrides: { ... }, // Human editable.
   ... // Computed as overrides + local_overrides
}

// Sub-space
{
   overrides: { ... }, // Human editable.
   local_overrides: { ... }, // Human editable.
   child_overrides: { ... }, // Human editable.
   ... // Computed as:
       //   space.overrides + space.child_overrides
       //   + overrides + local_overrides
}

// Room
{
   overrides: { ... }, // Human editable.
   local_overrides: { ... }, // Human editable.
   child_overrides: { ... }, // Human editable.
   ... // Computed as:
       //   space.overrides + space.child_overrides
       //   + subspace.overrides + subspace.child_overrides
       //   + overrides + local_overrides
}

(For now I assume that each space has one "main" parent. It would be possible to do merging but the expected behaviour there is probably highly controversial).


The `m.room.power_levels` event would be extended with a `space_defaults`
property, which may contain all of the same keys (except for `space_defaults`
itself) that an `m.room.power_levels` event may contain directly.

Additionally, an `m.space.power_levels` state event would be introduced, which
stores the most recent replicated powerlevels directly into the Space. This
event does not play a role in the state resolution algorithm, and only serves as
a way for clients to learn what the most recent Space-level powerlevels were set
to, for UI/display purposes. This way, clients do not need to pull the
`space_defaults` from multiple rooms and try to work out which of them actually
belong to this specific Space.
Comment on lines +115 to +121
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is exactly in this event? Is is just a copy of the space_defaults from the parent space? What if there are nested parents? In that case is it the "merge" of all of them?

Also I'm not sure how this is not used in the resolution algorithm. How else does the algorithm get the space_defaults from parent spaces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also finding this quite confusing - an example would go a long way here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially just a bookkeeping event, that is notably stored in the Space, not in its rooms.

When a user opens up the "Space-wide powerlevels" UI, it will need to be pre-populated with the most recent set of Space-wide powerlevels (ie. "replicated powerlevels"), so that they can adjust them and execute a new replication. But where would it get these previously replicated PLs from?

That's what this event is for. It's essentially just a copy of whatever the last replicated PLs for the space were, for clients to work from UI-wise. If it didn't exist, clients would have to fish these powerlevels out of the space_defaults from rooms in the Space - and importantly, if the room is in multiple Spaces, the client would need to somehow work out whether those space_defaults were actually set by the specific Space that the user is trying to edit or not, which would get messy fast.

So, in summary:

  • space_defaults: Key within m.room.power_levels, stored in a child room. Used in state resolution to authorize room events.
  • m.space.power_levels: Event type, stored in a Space. Not used in state resolution, but used to populate client UIs when a user wishes to edit the Space-wide PLs.

As long as the space-default PLs in a child room are only set through a single parent space, these two things will always contain the exact same powerlevels.

Does that clarify it?


The state resolution algorithm would be modified so that instead of the current
permission/role lookup algorithms...

```
kick / ban / invite / redact -> 50
events[type] -> state_default / events_default -> 50 / 0
users[mxid] -> users_default -> 0
```

... the algorithms are changed to include a lookup in the `space_defaults`, like
so:

```
kick / ban / invite / redact -> space_defaults.kick / space_defaults.ban / space_defaults.invite / space_defaults.redact -> 50
events[type] -> space_defaults.events[type] -> state_default / events_default -> space_defaults.state_default / space_defaults.events_default -> 50 / 0
users[mxid] -> space_defaults.users[mxid] -> users_default -> space_defaults.users_default -> 0
```

Note that a `space_defaults` lookup for specific users and event types is done
__before__ using a local event/user default like `state_default`; otherwise, it
would not be possible for a Space to specify any specific entries at all
whenever a local default is set. Essentially, "check specific before general
levels" has precedence over "check local before space levels".

For this to work correctly, it's important that clients and homeserver
implementations *do not* automatically set any permissions in a newly-created
room, other than those which the user has explicitly asked for - explicitly
reproducing the defaults in an `m.room.power_levels` event would result in the
fallback to `space_defaults` never occurring.

A new API endpoint would be created at `POST
/_matrix/client/r0/spaces/{spaceId}/set_power_levels`, taking a `power_levels`
property that defines the new power levels to apply to the Space's rooms (to be
inserted in the `space_defaults` property of each room's power levels event),
and an `allow_partial_update` property that implements the behaviour described
earlier.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may make sense to provide a list of rooms to ignore instead of allowing all failures. This way you can try, get an error back with the rooms that you don't have permission to update, then post the update. This would provide a nice UX as they see what changes they will be making (or the list of rooms their changes won't affect) before actually making the atomic update.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking this over for a bit, I think this is a good idea, also for another reason. It would prevent a sort-of race condition in the following scenario:

  1. User applies powerlevels to A,B,C,D
  2. Fails because of room A
  3. User loses permissions to change powerlevels in B
  4. User retries with allow_partial_update

Within specifying explicit rooms to allow failure for, this would result in the change being applied to C and D, whereas the client may have previously believed that it would also apply to B (which is no longer true).

This does introduce a tradeoff, though: clients need to be prepared to deal with potentially multiple failures in a row, UX-wise. Either by choosing to accept the change automatically and retry with an expanded allow-failures list, or by telling the user "hey something else has changed, now the list of unapplied rooms will be this list instead".

Does this sound right, or can you see some kind of UX issue with this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, a client would call this API in addition to settings the power levels in the space directly? And if you used a client that didn't know about this API, your changes wouldn't propagate to the rooms?

Again, an example would be massively appreciated.

Copy link
Author

@joepie91 joepie91 Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this proposal, the Space is treated as also being a room; therefore, "setting PLs to all rooms in a Space" is a separate operation from "setting PLs for a Space itself". That is why this is a separate endpoint, rather than using a general-purpose "set PL for room" approach.

Calling this endpoint results in two things happening on the server side:

  1. A m.space.power_levels event is created in the Space itself, for bookkeeping purposes (as explained in another thread).
  2. The m.room.power_levels events of all child rooms in the Space are updated with new space_defaults.

Both of these contain the same powerlevels, as specified in the power_levels property of the API call. Notably, no m.room.power_levels is created in the Space itself - as that would be a semantically different operation.


XXX: An endpoint prefix of `/rooms` could also be used instead of `/spaces`?
This would be consistent with eg. the `join` and `invite` routes, but since this
operation is only semantically meaningful on a Space, it should perhaps *not* be
consistent in that way.

Subspaces should be accounted for, and treated as a part of the Space, for the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do the new space defaults interact with any existing space defaults, possibly one that came from a space containing this one, or from an intermediate subspace? In other words, is it possible to use this mechanism in a hierarchical way, or does it support only one "manager" level by design?

purpose of this operation; that is, all rooms in subspaces (to arbitrary depth)
should have the powerlevel changes applied to them, likewise causing a failure
if this is not possible for some rooms.

The following error responses may occur for this endpoint:

- __200__ - Operation succeeded as requested; either all or some rooms have been
updated, depending on the `allow_partial_update` flag.
- __403 M_FORBIDDEN__ - Reserved for Space-level permission issues, eg. if there
were to be some kind of required permission in the Space itself for doing a
replicated powerlevel change. This proposal does not currently define such a
mechanism, but reserves it to avoid confusion with permission issues caused by
rooms.
- __403 M_ALL_FORBIDDEN__ - The user does not have the necessary access to
change powerlevels in *any* of the rooms in the Space. Retrying with
`allow_partial_update` will still fail.
- __403 M_PARTIALLY_FORBIDDEN__ - The user does not have the necessary access to
change powerlevels in *some* of the rooms in the Space. Retrying with
`allow_partial_update` will likely succeed; though a client should still
account for receiving an `M_ALL_FORBIDDEN` in that case, eg. if the user's
role in the affected rooms changes in the meantime.

XXX: Should we allow a homeserver to always return M_PARTIALLY_FORBIDDEN even
when *none* of the rooms can be updated, eg. for performance reasons? This would
make it difficult for clients to present a coherent UX, as a "do you want to
apply the change to only some of the rooms" dialog will be shown even in cases
where choosing "yes" could never result in success anyway - which would be
confusing messaging to users.

If successful, the response body is as follows:

```js
{
"partialSuccess": true, // Whether only some rooms succeeded (true) or *all* rooms succeeded (false)
"failedRooms": [ // A list of room IDs of those rooms whose powerlevels could *not* be updated
"!room1:homeserver.tld",
"!room2:homeserver.tld",
"!room4:homeserver.tld"
]
}
```

## Potential issues

1. This approach will require a new room version to be specified, which takes
into consideration `space_defaults`. The same is true, however, for MSC2962.

2. With this approach, powerlevel changes will be published as an event created
by the user who actually initiated them, rather than a bot. While this
removes a certain degree of anonymity from moderation operations, that
anonymity can be regained by simply outsourcing this job to a bot user if a
given community desires to have this property.

3. Determining whether a powerlevel change was "successfully replicated" is
dependent on having an accurate view of the *current* permissions and roles
of each affected room. There may be circumstances where the initiating user's
homeserver is 'behind' on this, and so their homeserver may end up
(incorrectly) reporting success, with the change being reverted down the
line. This is already currently the case for regular powerlevel changes, and
should be quite rare, so I don't feel that this is a significant problem.

4. It is possible for a room to be "desynced" from the Space when one room
moderator removes the role of another, making it impossible for them to do a
replicated powerlevel change across the Space without using the 'partial'
flag. While this doesn't literally *break* anything, it can be confusing for
a moderator to be faced with an unexpected warning dialog, and there should
probably be some mechanism for debugging such desyncs; eg. a way to see the
specific rooms in the Space where the user's role diverges from the other
rooms.

5. If a room is added to multiple Spaces, and receives different replicated
powerlevel changes through both of them, the room's powerlevels can
"oscillate" between those of the two Spaces. This is mainly a UX issue.

## Alternatives

Those listed in
[MSC2962](https://github.com/matrix-org/matrix-doc/blob/matthew/msc2962/proposals/2962-spaces-access-control.md#alternatives),
as well as MSC2962 itself.

## Security considerations

As stated above, a major consideration for choosing this approach is that it's
not prone to confused deputy problems.

It does not introduce any new security mechanisms, other than a small mechanical
change in state resolution, the correctness of which can be easily reasoned
about.

## Unstable prefix

Unstable implementations should use the following identifiers:

- Space-level state event: `net.cryto.msc3216.space.power_levels` instead of
`m.space.power_levels`
- Space defaults property: `net.cryto.msc3216.space_defaults` instead of
`space_defaults`
- API endpoint:
`/_matrix/client/unstable/net.cryto.msc3216/spaces/{spaceId}/set_power_levels`
instead of `/_matrix/client/r0/spaces/{spaceId}/set_power_levels`
- Room version: `net.cryto.msc3216.1`