Currently some servers will de-duplicate media in an unpredictable way whereas others will not. Further, some implementations have the capability to return a potentially unexpected origin for their MXC URIs. This proposal aims to acknowledge the status quo by specifying it explicitly.
MXC URIs can have an origin which does not match the server name used for /upload. The specification currently implies that this is possible, but this MSC aims to make the behaviour explicitly valid and expected by clients. This means, for example, that @alice:example.org could receive an MXC URI pointing to mxc://cdn.upstream.com/abc123. The origin carries no special meaning: it is to be looked up like any other server name, just as it is today.
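As an illustration of what this means for clients, the following is a minimal sketch of parsing a returned MXC URI without assuming that its origin matches the user's homeserver. The names `MxcUri` and `parse_mxc` are hypothetical and are not part of any specification or SDK.

```python
from typing import NamedTuple


class MxcUri(NamedTuple):
    """Hypothetical container for the two parts of an MXC URI."""
    origin: str
    media_id: str


def parse_mxc(uri: str) -> MxcUri:
    """Split mxc://<origin>/<media_id> into its parts.

    The origin is deliberately NOT compared against the uploader's
    homeserver: under this proposal it may legitimately differ.
    """
    prefix = "mxc://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an MXC URI: {uri!r}")
    origin, sep, media_id = uri[len(prefix):].partition("/")
    if not sep or not origin or not media_id or "/" in media_id:
        raise ValueError(f"malformed MXC URI: {uri!r}")
    return MxcUri(origin, media_id)


# @alice:example.org uploads to example.org but is handed a CDN origin.
parsed = parse_mxc("mxc://cdn.upstream.com/abc123")
```

A client following this sketch would use `parsed.origin` for any later lookup rather than substituting its own homeserver name.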
Servers SHOULD NOT attempt to "deduplicate" media by returning the same MXC URI for previously uploaded content, unless the upload meets requirements outlined below. Uploads are often accompanied by a single reference in an event, and in a world where it is possible to delete media by event ID it is important to be able to delete a specific record without side effects. How the implementation handles this internally is up to it - it just cannot return the same MXC URI for what appears to be the same content.
If the server wants to support deduplication, it should only do so when the media (body), uploader, origin homeserver, and provided filename all match. This scenario could be perceived as a missed request on the client side and therefore could be a retry.
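The matching rule above can be sketched as a lookup key built from all four properties. This is only an illustration under stated assumptions: `MediaStore`, its in-memory dictionary, the use of SHA-256 to compare bodies, and the random media ID format are all hypothetical choices, not something this proposal mandates.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


class MediaStore:
    """Hypothetical in-memory media repository for a single origin."""

    def __init__(self, origin: str) -> None:
        self.origin = origin
        # Maps (body hash, uploader, origin, filename) to the MXC URI issued.
        self._by_key: Dict[Tuple[str, str, str, Optional[str]], str] = {}

    def upload(self, body: bytes, uploader: str,
               filename: Optional[str] = None) -> str:
        # Deduplicate only when body, uploader, origin and filename all match,
        # which most likely indicates a client retrying a lost request.
        key = (hashlib.sha256(body).hexdigest(), uploader, self.origin, filename)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing
        # Anything else gets a fresh media ID, even if the bytes are identical.
        uri = f"mxc://{self.origin}/{secrets.token_urlsafe(18)}"
        self._by_key[key] = uri
        return uri


store = MediaStore("example.org")
first = store.upload(b"image-bytes", "@alice:example.org", "cat.png")
retry = store.upload(b"image-bytes", "@alice:example.org", "cat.png")
renamed = store.upload(b"image-bytes", "@alice:example.org", "dog.png")
other_user = store.upload(b"image-bytes", "@bob:example.org", "cat.png")
```

In this sketch only `retry` shares an MXC URI with `first`; the renamed upload and the upload from a different user each receive their own URI, so either can later be deleted without affecting the others.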
Enforcing that media cannot be deduplicated at the MXC URI level could lead to media ID exhaustion on the server side. However, because the server is explicitly allowed to return a different origin for the URI, the pool of potential IDs is effectively unbounded.
By explicitly allowing the server to return a content_uri which does not match its server name, the server could potentially imply that media was uploaded to a different server. For example, a user wishing to upload to example.com could be told that their media was uploaded to the public matrix.org homeserver instead. The proposal considers this a bad idea on the server's part, but one that needs no enforcement to prevent: unless the server has somehow gained access to matrix.org, the media will safely 404.
Implementations may have already deduplicated media such that one MXC URI does not reference one event, however the intent is to fix the problem going forward and less so resolve the past. Some clients also have "Forward" features which do not re-upload media, which would cause multiple events to reference the same media.
We could choose not to handle deduplication at the spec level, however this leaves implementations open to issues down the line when we do support deleting/erasing media.
We could also disallow the returned content_uri from referencing another server. The use case for allowing this specific behaviour is to let media be hosted by a dedicated CDN-like service instead of forcing all traffic through the homeserver.
Some considerations are mentioned in the Potential Issues section.
Though not mentioned in the specification, servers can already lie about the MXC URI being returned, such as always returning a reference to the same image regardless of what was uploaded. This is not solved by this proposal, and generally not perceived as a legitimate threat currently.
No unstable prefixes are required for this MSC.