Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Refactor MediaRepository to separate out storage #2767

Merged
merged 17 commits into from
Jan 16, 2018

Conversation

erikjohnston
Copy link
Member

The end intention here is:

  1. Have a local FS cache of media (which by default has everything in it)
  2. Allow specifying additional StorageProviders that can be used to fetch media from, and optionally upload to.

In the future we should make these behaviours configurable through config options, as well as keeping track of last access time for local as well as remote media.

We should also look into making the MediaStorage layer aware of the deletion of old content, as currently that is still done by MediaRepository/PreviewUrlResource. Otherwise, only the MediaStorage and StorageProvider classes should care where things are on disk.

@erikjohnston
Copy link
Member Author

This is a long long PR, though a lot of it is just adding new classes. I can split this up into separate PRs, but that would break the backup media repo stuff (as that's not added back in until the last commit)

with media_storage.store_into_file(info) as (f, fname, finish_cb):
# .. write into f ...
yield finish_cb()
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not ecstatic by this API shape TBH, but I'm somewhat keen to ensure that the open file gets cleaned up correctly.

@@ -1,5 +1,6 @@
# -*- coding: utf-8 -*-
# Copyright 2014-2016 OpenMarket Ltd
# Copyright 2018 New Vecotr Ltd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vecotr

@defer.inlineCallbacks
def get_local_media(self, request, media_id, name):
"""Responds to reqests for local media, if exists, or returns 404.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for param types and return type please

else:
yield self._respond_remote_file(
request, server_name, media_id, name
)

@defer.inlineCallbacks
def _respond_local_file(self, request, media_id, name):
media_info = yield self.store.get_local_media(media_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.store is now redundant in the constructor (as are self.clock and self.version_string while you are there)

else:
file_path = self.filepaths.local_media_filepath(media_id)

yield respond_with_file(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would have been helpful if you could have put the switch from respond_with_file to respond_with_responder in a separate commit to the switch from _respond_local_file to get_local_media, ftr.

logger = logging.getLogger(__name__)


class MediaStorage(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for future reference: it would also have been helpful if you could have built this class up over separate commits showing how it is used, rather than presenting the whole class as a fait accomplit that I then need to cross-reference to the commits where it is actually used. For example, in the first commit, add fetch_media and use it in get_local_media. Then add store_file in the next commit.

filesystem_id = media_info["filesystem_id"]
upload_name = name if name else media_info["upload_name"]

file_path = self.filepaths.remote_media_filepath(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filepaths now also redundant in the constructor

self.dynamic_thumbnails = hs.config.dynamic_thumbnails
self.thumbnail_requirements = hs.config.thumbnail_requirements

self.remote_media_linearizer = Linearizer(name="media_remote")

self.recently_accessed_remotes = set()

self.media_storage = MediaStorage(self.primary_base_path, self.filepaths)
# List of StorageProvider's where we should search for media and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StorageProviders

self.media_storage = MediaStorage(self.primary_base_path, self.filepaths)
# List of StorageProvider's where we should search for media and
# potentially upload to.
self.storage_providers = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a class field rather than a local var?

@@ -32,9 +32,10 @@ class MediaStorage(object):
"""Responsible for storing/fetching files from local sources.
"""

def __init__(self, local_media_directory, filepaths):
def __init__(self, local_media_directory, filepaths, storage_providers):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please doc the param types, especially the new one.

pass


class StorageProviderWrapper(StorageProvider):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a bit like this could just go into FileStorageProviderBackend, but perhaps you have plans or reasons that won't work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are options that will probably be used across any future storage providers, so it felt a bit silly to implement the same options across multiple ones (I have a S3 one lurking about).

For example, I can imagine in the future we might end up with config like:

storage_providers:
  - name: backup_fs
    store_thmbnails: false
    type: FileStorageProvider
    params:
       directory: /foo/bar

or something

@richvdh richvdh assigned erikjohnston and unassigned richvdh Jan 11, 2018
@erikjohnston erikjohnston assigned richvdh and unassigned erikjohnston Jan 12, 2018
@erikjohnston
Copy link
Member Author

I've made Responder a context manager, I can't decide whether it makes it nicer or not.

@richvdh
Copy link
Member

richvdh commented Jan 12, 2018

I've made Responder a context manager, I can't decide whether it makes it nicer or not.

Nor can I really. I think it's probably an improvement.

media_type (str): The media/content type.
file_size (int): Size in bytes of the media, if known.
file_size (int): Size in bytes of the media. If not known it should be None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so int|None I guess.

name (str|None): Optional name that, if specified, will be used as
the filename in the Content-Disposition header of the response.

Retruns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retruns

name (str|None): Optional name that, if specified, will be used as
the filename in the Content-Disposition header of the response.

Retruns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retruns


Args:
request(twisted.web.http.Request)
media_id (str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, but please could you doc media_id, here and all the other places it's used? it would help me be less confused about the difference between it and file_id.

server_name, media_id)
raise SynapseError(502, "Failed to fetch remote media")

yield finish()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I realised that belatedly. sorry.

os.remove(fname)
except Exception:
pass
raise e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. In that case please can youdo the fiddling with sys.exc_info (raise exc_info[0], exc_info[1], exc_info[2])?

@richvdh richvdh assigned erikjohnston and unassigned richvdh Jan 12, 2018
@erikjohnston
Copy link
Member Author

I've docced the media_id params, though whether they're useful is another question. I struggled a bit to come up with something better than "media_id is a media ID" :/ Suggestions welcome

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@richvdh richvdh assigned erikjohnston and unassigned richvdh Jan 15, 2018
@erikjohnston
Copy link
Member Author

Thank you :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants