Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

m.room.aliases is broken and here's how to fix it: #1130

Closed
jevolk opened this issue Feb 23, 2018 · 10 comments
Closed

m.room.aliases is broken and here's how to fix it: #1130

jevolk opened this issue Feb 23, 2018 · 10 comments
Labels
client-server Client-Server API feature Suggestion for a significant extension which needs considerable consideration

Comments

@jevolk
Copy link
Contributor

jevolk commented Feb 23, 2018

The room aliases system has several fundamental defects which render it unusable. These include security weaknesses. This issue will proffer two remedies to be applied in stages which will repair these defects and allow the system to be used as intended.

Background

The room aliases system is fundamental to the operation of the entire Matrix communication protocol. Aliases provide human-readable addressing for Matrix chat rooms. Without aliases, rooms can only be addressed by a single globally unique opaque identifier, the room_id. As an analogy, if the room_id was a networked computer's IP address, then room aliases would be the domain names pointing to the IP address. Contrary to an IP address on the internet, the Matrix protocol lacks any means by which to provide routing for room_id's, for example, by failing to provide an overlaying DHT. Due to this limitation, room aliases are required for finding rooms within the Matrix system.

How they work

The specification for room aliases is spread out over many places. They are first found in the Matrix Introduction, section 2.5.1. Their grammar is further detailed in the Matrix Appendix materials. They are stored and federated through a "state event" with the type m.room.aliases specified in the Client-Server protocol r0.3.0 section 10.5.1. Additionally, a suite of endpoints has been made available in the Client-Server protocol r0.3.0 section 7.2.

Example of the m.room.aliases event content:

"content":
{
    "aliases":
    [
        "#somewhere:localhost",
        "#another:localhost"
    ]
}

Defects

Poor key selection and value typing

The key of the m.room.aliases event (the state_key) is a homeserver domain and the value (content) an array of the aliases. With the key of the event being a homeserver domain, only one event can be used to query the aliases provided by a specific homeserver. To provide multiple aliases, an array type was selected for the content.

This immediately confuses client developers. Developers are accustomed to modifying the state of the room through the 6.4.1 rooms/state PUT endpoint. Intuitively, developers assume that they should faithfully copy the existing event content, update it with changes, and then submit it to that endpoint. This may or may not be the case depending on the implementation, but the 7.2 suite is also redundantly provided to perform these operations.

Lack of authority over the homeserver's specific key

Ideally, homeservers choose the aliases they believe best describe the room. For example, one homeserver may choose #wines:foo.com while another homeserver chooses #sommeliers:bar.fr. The chosen names are how users of their homeserver may find the room. If those aliases are changed or disappear, their users may no longer be able to find the room. The problem here is that an operator of any server with access to change one m.room.aliases event has access to change all of the other aliases events for other servers apart from their own.

Related work

A similar issue focusing specifically on the security implication was brought up by #625 over two years ago and it remains open with zero ideas. Not long after, #654 mentions an endpoint which is used by Riot but has remained unspecified for over three years as evidenced by #417. Overall, no substantive attempts have been made to solve this problem. Instead, its brokenness is assumed as gospel and more is built on top of the error rather than solving it.

Solution

By resolving this issue here, this effort hopes to:

  • Close all of the above issues and make the aliases system work as intended without all of the cautionary caveats and warnings.
  • Eliminate the secret endpoints used by Riot.
  • Eliminate the Client-Server 7.2 endpoint section.

Stage One

The first phase to address this problem is simple: implementations deny homeservers from updating m.room.aliases events which don't belong to them. In other words, the room's state machine denies a transition from an m.room.aliases event where state_key != origin.

Stage Two

The m.room.aliases event is replaced with an m.room.alias event. The new event is appropriately keyed and valued. As an example, consider the following form:

"origin": "foo.com",
"type": "m.room.alias",
"state_key": "#sommeliers:foo.com",
"content":
{
    /* available for discussion */
}

Conclusions

This problem exists because Matrix.org believes room state cannot be deleted. This is an erroneous belief. Room state is de facto deleted when it is the target of an m.room.redaction event.

The fact that Matrix.org has failed to implement deletions has leaked into the specification by building on this error by designing the content of the m.room.aliases event as an array such that it allows for aliases to be removed. This error has then lead matrix.org to the realizations of the above defects and has lead to further errors by way of the unspecified directory endpoints.

Matrix.org will have to implement proper behavior of their room state machine when a state event is the target of a redaction. Whether or not the specification should be updated to indicate this behavior de jure may be a matter for another issue.

@richvdh
Copy link
Member

richvdh commented Feb 23, 2018

oh thank goodness you came to our rescue.

@jevolk
Copy link
Contributor Author

jevolk commented Feb 23, 2018

house2
Rescues are a speciality of mine.

@richvdh
Copy link
Member

richvdh commented Feb 23, 2018

alternatively expressed with slightly less snark: you have been banned from multiple Matrix rooms for continually shouting about how you would have designed Matrix better. Good for you, but shouting about it here isn't any more appreciated.

This problem exists because Matrix.org believes room state cannot be deleted.

I fail to see how you have reached the conclusion that the problems you refer to are anything to do with state deletion.

Room state is de facto deleted when it is the target of an m.room.redaction event.

No, room state is reset to an empty content on a redaction event. You can also do that without a redaction event.

The fact that Matrix.org has failed to implement deletions has leaked into the specification by building on this error by designing the content of the m.room.aliases event as an array such that it allows for aliases to be removed.

really, no.

@richvdh richvdh closed this as completed Feb 23, 2018
@jevolk
Copy link
Contributor Author

jevolk commented Feb 23, 2018

You're not helping people take your protocol seriously by sabotaging attempts to address the issues and stifling discussion this way. You're also not making any attempt to fix it.

@jevolk
Copy link
Contributor Author

jevolk commented Feb 23, 2018

I'll address your response on the merits anyway

alternatively expressed with slightly less snark: you have been banned from multiple Matrix rooms for continually shouting about how you would have designed Matrix better. Good for you, but shouting about it here isn't any more appreciated.

This is an inappropriate and unprofessional ad hominem attack merely because you don't like what I have to say. Not cool.

This problem exists because Matrix.org believes room state cannot be deleted.

I fail to see how you have reached the conclusion that the problems you refer to are anything to do with state deletion.

Perhaps I didn't make this clear before you closed the issue but room state deletion is required for an effective implementation of the phase two solution.

Room state is de facto deleted when it is the target of an m.room.redaction event.

No, room state is reset to an empty content on a redaction event. You can also do that without a redaction event.

No, that's what your implementation does. The redaction event is a logical delete indication, out of fact.

The fact that Matrix.org has failed to implement deletions has leaked into the specification by building on this error by designing the content of the m.room.aliases event as an array such that it allows for aliases to be removed.

really, no.

The design of the array value is clearly because deletions are not supported (perhaps redactions didn't even exist at that point) -- so really, yes.

@richvdh
Copy link
Member

richvdh commented Feb 23, 2018

ok, I'm sorry I was hasty. I saw assumptions about the design of the protocol which I (still) feel are incorrect, and closed the issue on that basis rather than stopping to consider the whole of your proposal. I'll reopen this so that we can discuss the issues you raise, should you so wish.

The design of the m.room.aliases event as an array predates my involvement with the project, but, looking at the evolution of the implementation, I believe it just felt like the natural structure for listing the aliases supported by a server. I don't think it was anything to do with support or otherwise for removing state from a room. It may also be worth noting that a number of the directory endpoints predated the existence of the m.room.aliases event.

I understand your proposal for "stage one" and agree that it is one potential solution to solving the current byzantine split of alias information. Indeed it is rather elegant, though would require a bump in the event authorisation rules, which is non-trivial to achieve in practice.

However, to be honest, I don't understand why "stage two" is necessary or even particularly helpful. Can you explain further?

@richvdh richvdh reopened this Feb 23, 2018
@jevolk
Copy link
Contributor Author

jevolk commented Feb 24, 2018

I believe it just felt like the natural structure for listing the aliases supported by a server.

Indeed, multiple aliases for a domain may exist. Therefor it may feel natural to organize this data by:

  • Selecting the domain as the key.
  • Selecting a list (array) as the value.

I don't think it was anything to do with support or otherwise for removing state from a room.

By selecting the aforementioned state_key and content list value, the room can feature support for both adding and deleting room aliases by replacing the m.room.aliases state event entirely. Because removing state from a room is not supported, this is the only way to feature the deleting of aliases.

It may also be worth noting that a number of the directory endpoints predated the existence of the m.room.aliases event.

I should also correct a word choice from my original post. The specified endpoints at present are not truly redundant. Their purpose appears to add atomicity for updating the m.room.aliases list content. Consider when two clients attempt to update that event through the state endpoint: each will acquire the existing content, modify the list, and release the new content. One of those clients will overwrite the other's changes. The endpoints provide transactionality to avoid this.

However, to be honest, I don't understand why "stage two" is necessary or even particularly helpful. Can you explain further?

"Stage one" may address the second defect, the security weakness, but it does not address both defects. "Stage two" is comprehensive because it solves the problem the way most problems in computer science and engineering are tackled: by breaking it down into smaller parts. The m.room.alias event introduces granularity. The current list value in the m.room.aliases content lacks granularity. This is a pitfall. Its detrimental effects are seen with an exception for this event being made where clients must use REST endpoints to modify it. Other state events are directly updated as events. This is an inelegance.

There is no benefit for the specification to designate aliases content as a list merely because multiple aliases are in play. There is already implementation precedent for collections of state events by type. Consider that the m.room.member events are granular. There is not a single m.room.member state event listing all of the members in a content array type. Implementations are tasked with managing collections of state: the m.room.alias is just another collection.

In fact, state deletion support is not truly necessary for an m.room.alias event, though it would be preferred. Instead, the content can contain a key such as {"active":true}. Existing implementations redacting this event, destroying its content, would actually deactivate the alias; even if the state_key remains present in the implementation's mapping. I don't find this to be elegant. It would be ideal if implementations treated redactions of a state event as a de-mapping. As the maintenance of state mappings are an implementation issue, I stand by my conclusion that implementations are at fault for allowing a state_key to remain in their mapping which points to an empty content. That is useless.

@richvdh
Copy link
Member

richvdh commented Feb 24, 2018 via email

@richvdh richvdh added the feature Suggestion for a significant extension which needs considerable consideration label Mar 7, 2018
@turt2live turt2live added the client-server Client-Server API label Feb 6, 2019
@richvdh
Copy link
Member

richvdh commented May 2, 2019

to follow up on part of this, jevolk defined stage one as:

The first phase to address this problem is simple: implementations deny homeservers from updating m.room.aliases events which don't belong to them. In other words, the room's state machine denies a transition from an m.room.aliases event where state_key != origin.

As far as I can tell, this has always been the case in practice, though it only made it into the written specification as of #1591. (Note that origin is actually a redundant field which should be removed as per https://github.com/matrix-org/matrix-doc/issues/1664; the value of interest is the domain of the sender.)

@richvdh
Copy link
Member

richvdh commented Aug 29, 2019

Given that stage one of the original proposal is redundant (as it has always been the case in practice), I have split stage two out to matrix-org/matrix-spec#532 and will close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API feature Suggestion for a significant extension which needs considerable consideration
Projects
None yet
Development

No branches or pull requests

3 participants