…d, experimental, etc. just putting it here for safekeeping for now
…ngcache; loads of other fixes
def get_url_cache_txn(txn):
    # get the most recently cached result (relative to the given ts)
    sql = (
        "SELECT response_code, etag, expires, og, media_id, max(download_ts)"
You probably want to be doing ORDER BY download_ts DESC LIMIT 1 rather than max(download_ts).
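A minimal sketch of the reviewer's suggestion, using Python's sqlite3 module. The table name comes from the PR description and the column names from the snippet above; the column types, the url/ts parameters and the WHERE clause are assumptions. Sorting newest-first and keeping one row always returns one consistent row. With max(download_ts), the other selected columns are bare columns: SQLite happens to fill them from the row holding the maximum, but PostgreSQL rejects such a query without a GROUP BY.

```python
import sqlite3

# Assumed schema: column names follow the reviewed snippet, types are guesses.
SCHEMA = """
CREATE TABLE local_media_repository_url_cache (
    url TEXT, response_code INTEGER, etag TEXT, expires INTEGER,
    og TEXT, media_id TEXT, download_ts INTEGER
)
"""


def get_url_cache_txn(txn, url, ts):
    # Most recent cached row at or before ts: sort newest first, keep one row.
    txn.execute(
        "SELECT response_code, etag, expires, og, media_id, download_ts"
        " FROM local_media_repository_url_cache"
        " WHERE url = ? AND download_ts <= ?"
        " ORDER BY download_ts DESC LIMIT 1",
        (url, ts),
    )
    return txn.fetchone()  # None if nothing was cached before ts


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(SCHEMA)
cur.executemany(
    "INSERT INTO local_media_repository_url_cache VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("https://example.com", 200, "e1", 100, "{}", "m1", 1000),
        ("https://example.com", 200, "e2", 100, "{}", "m2", 2000),
    ],
)
print(get_url_cache_txn(cur, "https://example.com", 1500))
```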
I think you need to run
Can we make the entire thing optional somehow? We probably can't run it by default anyway given that it needs an IP blacklist.
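One way to make the feature optional, sketched under assumptions: previews default to off and enabling them without an IP blacklist is refused at startup. The on/off key name url_preview_enabled and the helper parse_url_preview_config are hypothetical here (the commit message naming the real switch is truncated); url_preview_ip_range_blacklist is the option name used in this PR.

```python
from typing import Any, Dict, List


def parse_url_preview_config(config: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical key name for the switch; previews stay off unless enabled.
    enabled = bool(config.get("url_preview_enabled", False))
    blacklist: List[str] = config.get("url_preview_ip_range_blacklist") or []
    if enabled and not blacklist:
        # Refuse to spider without a blacklist of internal ranges.
        raise RuntimeError(
            "url_preview_ip_range_blacklist must be set when URL previews are enabled"
        )
    return {"enabled": enabled, "ip_range_blacklist": blacklist}
```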
# first check the memory cache - good to handle all the clients on this
# HS thundering away to preview the same URL at the same time.
try:
    og = self.cache[url]
Use cache.get() rather than try: / except:.
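A minimal sketch of the suggested pattern, using a plain dict as the memory cache; the real self.cache may be a different cache class, and get_preview and its fetch callback are hypothetical names. A plain value cache like this only helps once the first fetch has finished; requests that arrive while it is still in flight would all miss.

```python
from typing import Callable, Dict


def get_preview(cache: Dict[str, dict], url: str, fetch: Callable[[str], dict]) -> dict:
    # Look up the memory cache without try/except: get() returns None on a miss.
    og = cache.get(url)
    if og is None:
        # Miss: do the expensive fetch once and remember the result.
        og = fetch(url)
        cache[url] = og
    return og
```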
…oint. Defaults to off. Add url_preview_ip_range_blacklist to let admins specify internal IP ranges that must not be spidered. Add url_preview_url_blacklist to let admins specify URL patterns that must not be spidered. Implement a custom SpiderEndpoint and associated support classes to implement url_preview_ip_range_blacklist. Add commentary and generally address PR feedback.
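The core of an IP range blacklist check can be sketched with Python's standard ipaddress module. The ranges below are example values only, and is_ip_blacklisted is a hypothetical helper, not the PR's SpiderEndpoint. A check like this has to run on the address the hostname actually resolves to at connect time, which is presumably why the commit hooks it into a custom endpoint rather than inspecting the URL's hostname.

```python
import ipaddress

# Example value for url_preview_ip_range_blacklist; admins choose their own.
IP_RANGE_BLACKLIST = [
    "127.0.0.0/8",
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.168.0.0/16",
    "::1/128",
]

_NETWORKS = [ipaddress.ip_network(r) for r in IP_RANGE_BLACKLIST]


def is_ip_blacklisted(addr: str) -> bool:
    # Parse the resolved address; an IPv4 address is never "in" an IPv6 network.
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in _NETWORKS)
```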
Incorporate all the PR feedback - @NegativeMjark PTAL
isLeaf = True

def __init__(self, hs, filepaths):
if not html:
The not html probably throws if lxml isn't installed.
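The reviewer's point is that if the optional lxml import simply swallows ImportError, the name html is never bound and not html raises NameError. A minimal sketch of one fix, assuming the module does from lxml import html: bind the name to None on failure and raise a clear error where previews are set up. check_lxml_available and its message are hypothetical.

```python
try:
    from lxml import html
except ImportError:
    # lxml is optional; bind the name so later checks don't raise NameError.
    html = None


def check_lxml_available() -> None:
    # Fail with an explicit message instead of an obscure NameError.
    if html is None:
        raise RuntimeError("URL previews require lxml, which is not installed")
```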
@NegativeMjark addressed these too, and it now throws sensible exceptions. PTAL
        "blacklist in url_preview_ip_range_blacklist for url previewing "
        "to work"
    )
raise RunTimeError(
It's RuntimeError, not RunTimeError. This sort of typo can be picked up by running flake8 synapse, fwiw.
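Why a static check helps here, as a small illustration: Python only resolves the misspelled name when the raise line actually runs, so the bug hides on the error path and surfaces as a NameError instead of the intended RuntimeError. pyflakes, which flake8 runs, reports it up front as F821 (undefined name). Both functions are hypothetical; the noqa comment exists only so this deliberately broken demo passes lint itself.

```python
def require_blacklist_fixed(blacklist):
    if not blacklist:
        raise RuntimeError("url_preview_ip_range_blacklist must be set")
    return blacklist


def require_blacklist_typo(blacklist):
    if not blacklist:
        # Misspelled builtin: only noticed when this line executes.
        raise RunTimeError("url_preview_ip_range_blacklist must be set")  # noqa: F821
    return blacklist
```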
Other than fixing the typos and style warnings, it LGTM. I'm slightly concerned by the lack of tests for it, though.
- Adds SpiderHttpClient, derived from SimpleHttpClient, which follows redirects and handles gzip CTE correctly
- Adds get_file support to SimpleHttpClient, knowingly duplicated for now from matrixfederationclient
- Adds preview_url_resource to implement the new media/r0/preview_url API. This:
  - … lxml, returning the metadata as a JSON blob
- Adds the local_media_repository_url_cache table to the DB for the on-disk URL cache
- Adds get_url_cache and store_url_cache to media_repository.py to wrap the new table

N.B.: following redirects will not work correctly until https://twistedmatrix.com/trac/ticket/8265 is merged. Unsure whether it's worth maintaining our own Twisted fork until that happens.
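The preview_url_resource item says page metadata is extracted with lxml and returned as a JSON blob. As a dependency-free illustration, this sketch swaps in the standard library's html.parser instead of lxml and only collects og: meta properties; OpenGraphParser and extract_og are hypothetical names, not the PR's implementation.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class OpenGraphParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.og: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Also called for self-closing <meta ... /> via the default handle_startendtag.
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each og: property.
            self.og.setdefault(prop, content)


def extract_og(html_text: str) -> str:
    parser = OpenGraphParser()
    parser.feed(html_text)
    parser.close()
    return json.dumps(parser.og)
```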
Given I'm hardly a Python/Twisted expert, review would be particularly appreciated on:
This is part of a set of PRs spanning vector-web, matrix-react-sdk, matrix-js-sdk and synapse.
See also element-hq/element-web#1343 and matrix-org/matrix-react-sdk#260 and matrix-org/matrix-js-sdk#122