This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Allow sharding of media repository #7447

Closed
erikjohnston opened this issue May 7, 2020 · 5 comments
Labels
A-Performance Performance, both client-facing and admin-facing

Comments

@erikjohnston
Member

Looks like the media repository on matrix.org has high CPU usage, so we should investigate how we can scale it out a bit.

The main sources of CPU usage seem to be:

  • S3 storage provider
  • URL previews (I think due to SSL overhead maybe?)
@babolivier babolivier added the A-Performance Performance, both client-facing and admin-facing label May 7, 2020
@ara4n
Member

ara4n commented May 15, 2020

URL previews (I think due to SSL overhead maybe?)

impossible - URL preview code is perfect in all ways!

@clokep
Member

clokep commented May 19, 2020

URL previews (I think due to SSL overhead maybe?)

I think these already share SSL contexts due to #7094?

@erikjohnston
Member Author

URL previews (I think due to SSL overhead maybe?)

I think these already share SSL contexts due to #7094?

Probably, but I think that just reduces the overhead from INSANE to normal levels, maybe.

@clokep
Member

clokep commented May 22, 2020

@erikjohnston So what's the thinking here? How would one get started with this? Were there particular hot spots noticed while profiling, or is profiling the first step?

@erikjohnston
Member Author

I think the next steps are to look at the media repository code to see what looks easy to split out. Things to look out for:

  1. Any background looping calls that need to only happen on one worker
  2. DB functions that assume they're the only writer
  3. Any locks or ratelimiting
  4. Any cached database functions that need to be correctly invalidated

I don't think there is much of any of those going on in the media repository, so depending on what you find, it might be quite easy to make it possible to e.g. run multiple media repositories for downloads, URL previews or whatnot. (It may not even require any code changes, depending.)
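As a rough illustration of that kind of split (a sketch under stated assumptions, not how Synapse actually does it), the snippet below routes download and URL preview requests round-robin across several media repository instances, while keeping uploads and background looping calls on a single designated worker, in line with points 1 and 2 above. The worker names, the `MediaRouter` class and its methods are hypothetical; only the `/_matrix/media/r0/...` path prefixes come from the Matrix client-server API. In a real deployment this routing would normally live in a reverse proxy rather than in Python.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

# Media API path prefixes from the Matrix client-server r0 API.
# The worker names passed to MediaRouter are hypothetical.
DOWNLOAD_PREFIX = "/_matrix/media/r0/download/"
PREVIEW_PREFIX = "/_matrix/media/r0/preview_url"


@dataclass
class MediaRouter:
    """Hypothetical router: round-robin reads over several media workers."""

    download_workers: List[str]
    preview_workers: List[str]
    # The single worker assumed to own writes and background looping calls.
    background_worker: str
    _cycles: Dict[str, Iterator[str]] = field(init=False)

    def __post_init__(self) -> None:
        # One independent round-robin iterator per endpoint type.
        self._cycles = {
            "download": itertools.cycle(self.download_workers),
            "preview": itertools.cycle(self.preview_workers),
        }

    def route(self, path: str) -> str:
        """Pick a worker for an incoming request path."""
        if path.startswith(DOWNLOAD_PREFIX):
            return next(self._cycles["download"])
        if path.startswith(PREVIEW_PREFIX):
            return next(self._cycles["preview"])
        # Everything else (e.g. uploads) goes to the designated worker, so
        # DB writes that assume a single writer stay on one instance.
        return self.background_worker

    def runs_background_jobs(self, worker: str) -> bool:
        """Only the designated worker should start looping calls."""
        return worker == self.background_worker


router = MediaRouter(["media1", "media2"], ["preview1"], "media1")
print(router.route(DOWNLOAD_PREFIX + "example.org/abc"))
print(router.route(DOWNLOAD_PREFIX + "example.org/def"))
print(router.route("/_matrix/media/r0/upload"))
```

Keeping every write and every looping call on one instance is one way to avoid having to audit points 1 and 2 up front; the read-only paths are then the ones that can be scaled out, provided any cached lookups they depend on are invalidated correctly (point 4).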


4 participants