Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Media files in url_cache/ and url_cache_thumbnails/ hang around for too long #10863

Closed
squahtx opened this issue Sep 21, 2021 · 7 comments
Closed
Labels
S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@squahtx
Copy link
Contributor

squahtx commented Sep 21, 2021

Description

Spotted by @richvdh:

Media files in url_cache/ and url_cache_thumbnails/ are hanging around for a few days, which is far longer than expected

Version information

  • Homeserver: matrix.org
@DMRobertson DMRobertson added S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. labels Sep 21, 2021
@squahtx
Copy link
Contributor Author

squahtx commented Sep 21, 2021

Some observations:

  1. url_cache/ retains some random empty yyyy-mm-dd/ directories
  2. url_cache/yyyy-mm-dd/ occasionally contains random files older than 2 days
    • A good fraction of these are 0 bytes
    • The non-empty ones can be HTML or non-HTML. It's hard to tell whether they are truncated.
  3. url_cache/yyyy-mm-dd/ contains lots of files from up to 2 days ago
  4. url_cache_thumbnails/ retains all empty yyyy-mm-dd/ directories
  5. url_cache_thumbnails/yyyy-mm-dd/ contains lots of files from up to 2 days ago

@squahtx
Copy link
Contributor Author

squahtx commented Sep 21, 2021

3 and 5 seem to be the expected behaviour:
The code to delete old url_cache files has a 2 day threshold for both url_cache and url_cache_thumbnails:

# Now we delete old images associated with the url cache.
# These may be cached for a bit on the client (i.e., they
# may have a room open with a preview url thing open).
# So we wait a couple of days before deleting, just in case.
expire_before = now - 2 * 24 * ONE_HOUR
media_ids = await self.store.get_url_cache_media_before(expire_before)

url_cache files will be deleted before the 2 day threshold if their expiry time is short enough, but it comes from an oEmbed response:

expires = oembed_result.cache_age or ONE_HOUR

and when inspecting the local_media_repository_url_cache database table on a live instance, about 80% of records expire more than 2 days into the future.

@richvdh
Copy link
Member

richvdh commented Sep 21, 2021

oh, interesting. @clokep is capping the expiry of oembed results in #10814.

@clokep
Copy link
Member

clokep commented Sep 21, 2021

oh, interesting. @clokep is capping the expiry of oembed results in #10814.

The current code in #10814 applies it to all URL previews, not just oEmbed, but this is kind of a lie since (currently) non-oEmbed ones always get an expiry time of 1 hour.

From the above it looks like we don't expire things when that time passes, but we actually expire things that expired 2 days ago. I think what this actually means is that a URL preview result will last at most 3 days (after #10814 is merged).

Hopefully that sounds reasonable?

@squahtx
Copy link
Contributor Author

squahtx commented Sep 27, 2021

#10924 ought to fix point 4 in the list above (url_cache_thumbnails/ retains all empty yyyy-mm-dd/ directories)

@squahtx
Copy link
Contributor Author

squahtx commented Sep 27, 2021

Which leaves 1 and 2:

  1. url_cache/ retains some random empty yyyy-mm-dd/ directories
  2. url_cache/yyyy-mm-dd/ occasionally contains random files older than 2 days

The best explanation I can come up with for these is that sometimes URL preview requests get stuck downloading the URL and outlive the 2 day cache expiry. Then:

  • when the URL download fails, its cache file is removed, but there is no logic to clean up the yyyy-mm-dd/ directory
  • when Synapse is shut down forcefully, cache files from the stuck previews are left around with no database entries

We saw an example of this when restarting matrix.org on 2021-09-22:

2021-09-22 12:45:55,858 - synapse.app._base - 495 - INFO - sentinel - Shutting down...
2021-09-22 12:45:55,945 - synapse.http.server - 88 - INFO - GET-949210 - <XForwardedForRequest at 0x7f698908ecd8 method='GET' uri='/_matrix/media/r0/preview_url?url=<url redacted>' clientproto='HTTP/1.1' site=8124> SynapseError: 502 - Failed to download remote body: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>, <twisted.python.failure.Failure twisted.web.http._DataLoss: >]
2021-09-22 12:45:55,945 - synapse.http.server - 657 - WARNING - GET-949210 - Not sending response to request <XForwardedForRequest at 0x7f698908ecd8 method='GET' uri='/_matrix/media/r0/preview_url?url=<url redacted>' clientproto='HTTP/1.1' site=8124>, already disconnected.
2021-09-22 12:45:55,945 - synapse.access.http.8124 - 410 - INFO - GET-949210 - <redacted> - 8124 - {<redacted>} Processed request: 409262.668sec/-409082.668sec (0.033sec, 0.000sec) (0.002sec/0.007sec/3) 0B 200! "GET /_matrix/media/r0/preview_url?url=<url redacted> HTTP/1.1" "Element/1.1.5 (<redacted>)" [0 dbevts]
2021-09-22 12:45:55,953 - synapse.http.site - 325 - INFO - GET-4202780 - Connection from client lost before response was sent

The 2nd line is twisted aborting the fetch of the URL to be previewed.
The request duration puts its start on 2021-09-17, whose corresponding url_cache/2021-09-17 directory was left undeleted.

However, URL preview requests shouldn't take this long, since there is supposed to be a 60 second timeout when fetching URLs.

@squahtx
Copy link
Contributor Author

squahtx commented Oct 21, 2021

The best explanation I can come up with for these is that sometimes URL preview requests get stuck downloading the URL and outlive the 2 day cache expiry.
...
However, URL preview requests shouldn't take this long, since there is supposed to be a 60 second timeout when fetching URLs.

This has been reported previously in #8302, closing this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

4 participants