Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Update the media repository documentation #11415

Merged
merged 3 commits into from
Nov 29, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/11415.doc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Update the media repository documentation.
87 changes: 68 additions & 19 deletions docs/media_repository.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,29 +2,78 @@

*Synapse implementation-specific details for the media repository*

The media repository is where attachments and avatar photos are stored.
It stores attachment content and thumbnails for media uploaded by local users.
It caches attachment content and thumbnails for media uploaded by remote users.
The media repository
* stores avatars, attachments and their thumbnails for media uploaded by local
users.
* caches avatars, attachments and their thumbnails for media uploaded by remote
users.
* caches resources and thumbnails used for URL previews.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth cross-referencing this to the URL preview doc I wrote: https://github.com/matrix-org/synapse/blob/develop/docs/development/url_previews.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shout, let's do that.


## Storage
All media in Matrix can be identified by a unique
[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris),
consisting of a server name and media ID:
```
mxc://<server-name>/<media-id>
```

Each item of media is assigned a `media_id` when it is uploaded.
The `media_id` is a randomly chosen, URL safe 24 character string.
## Local Media
Synapse generates 24 character media IDs for content uploaded by local users.
These media IDs consist of upper and lowercase letters and are case-sensitive.
Other homeserver implementations may generate media IDs differently.

Metadata such as the MIME type, upload time and length are stored in the
sqlite3 database indexed by `media_id`.
Local media is recorded in the `local_media_repository` table, which includes
metadata such as MIME types, upload times and file sizes.
Note that this table is shared by the URL cache, which has a different media ID
scheme.

Content is stored on the filesystem under a `"local_content"` directory.
### Paths
A file with media ID `aabbcccccccccccccccccccc` and its `128x96` `image/jpeg`
thumbnail, created by scaling, would be stored at:
```
local_content/aa/bb/cccccccccccccccccccc
local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
```
Older thumbnails may omit the thumbnailing method:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this imply it is a particular method or just unknown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judging by #7124 it could be either method. But it only applies for remote thumbnails.

```
local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
```

Thumbnails are stored under a `"local_thumbnails"` directory.
## Remote Media
When media from a remote homeserver is requested from Synapse, it is assigned
a local `filesystem_id`, with the same format as locally-generated media IDs,
as described above.

The item with `media_id` `"aabbccccccccdddddddddddd"` is stored under
`"local_content/aa/bb/ccccccccdddddddddddd"`. Its thumbnail with width
`128` and height `96` and type `"image/jpeg"` is stored under
`"local_thumbnails/aa/bb/ccccccccdddddddddddd/128-96-image-jpeg"`
A record of remote media is stored in the `remote_media_cache` table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this map the MXC URL to a local media ID essentially?

Copy link
Contributor Author

@squahtx squahtx Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, plus store almost the same metadata as local_media_repository.


Remote content is cached under `"remote_content"` directory. Each item of
remote content is assigned a local `"filesystem_id"` to ensure that the
directory structure `"remote_content/server_name/aa/bb/ccccccccdddddddddddd"`
is appropriate. Thumbnails for remote content are stored under
`"remote_thumbnail/server_name/..."`
### Paths
A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its
`128x96` `image/jpeg` thumbnail, created by scaling, would be stored at:
```
remote_content/matrix.org/aa/bb/cccccccccccccccccccc
remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale
```
Older thumbnails may omit the thumbnailing method:
```
remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg
```

Note that `remote_thumbnail/` does not have an `s`.

## URL Previews
When generating previews for URLs, Synapse may download and cache various
resources, including images. These resources are assigned temporary media IDs
of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current
date and `aaaaaaaaaaaaaaaa` is a random sequence of 16 case-sensitive letters.

The metadata for these cached resources is stored in the
`local_media_repository` and `local_media_repository_url_cache` tables.

Resources for URL previews are deleted after a few days.

### Paths
The file with media ID `yyyy-mm-dd_aaaaaaaaaaaaaaaa` and its `128x96`
`image/jpeg` thumbnail, created by scaling, would be stored at:
```
url_cache/yyyy-mm-dd/aaaaaaaaaaaaaaaa
url_cache_thumbnails/yyyy-mm-dd/aaaaaaaaaaaaaaaa/128-96-image-jpeg-scale
```