Youtube captions (link previews) are useless #9733
Comments
(FTR: This is about link previews.) This is not necessarily a problem with Synapse; Synapse is doing its job perfectly by previewing the URL as fetched, because |
I agree that it's not strictly a bug in Synapse; however, the only parties able to resolve this issue are Google and Synapse (or the third-party component it uses), and I have my doubts about Google doing anything about it :). IIRC Slack, for example, doesn't have this issue, so it's resolvable, even if only with special handling. |
For one plausible solution, consider the following session:
|
Oh bother. We had this with Twitter (#7643). It looks like we should do the same trick as we did there (hardcode a mapping to the oEmbed API):
|
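A minimal sketch of the hardcoded URL-to-oEmbed mapping suggested above, in Python. The regexes, the table name `OEMBED_PATTERNS`, and the helper `oembed_request_url` are hypothetical, not Synapse's actual code; the two endpoints are the public YouTube and Twitter oEmbed endpoints.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical hardcoded table of (page URL pattern, oEmbed endpoint).
OEMBED_PATTERNS = [
    (re.compile(r"^https?://(www\.|m\.)?youtube\.com/watch\?"), "https://www.youtube.com/oembed"),
    (re.compile(r"^https?://youtu\.be/"), "https://www.youtube.com/oembed"),
    (re.compile(r"^https?://(www\.)?twitter\.com/[^/]+/status/\d+"), "https://publish.twitter.com/oembed"),
]


def oembed_request_url(url: str) -> Optional[str]:
    """Return the oEmbed API URL to query for `url`, or None if no pattern matches."""
    for pattern, endpoint in OEMBED_PATTERNS:
        if pattern.match(url):
            # oEmbed passes the original page URL in the `url` query parameter.
            return endpoint + "?" + urlencode({"url": url, "format": "json"})
    return None
```

URLs that match nothing fall through to the normal HTML scraping path, so the table only changes behaviour for the listed sites.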
I guess this will affect an increasing number of (less high-profile) sites as well, such as https://www.golem.de (a German news portal). Hardcoding exceptions for YouTube is certainly warranted, but in the long run it might be nice to be able to specify custom hooks in Synapse's configuration, although I'm not sure that's really worth the effort. |
This shouldn't be too hard. It would also be nice to default to using the documented providers (https://oembed.com/providers.json). |
Oooo, thanks for mentioning that! Shouldn't that just be preloaded and used directly when URL previews are enabled? |
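A sketch of how the documented provider list could be turned into URL matchers, in Python. It assumes the providers.json shape (providers with `endpoints`, each having `schemes` that use `*` as a wildcard and a `url` that may contain `{format}`); the sample entries below are abbreviated, the "Example" provider is made up, and the helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Abbreviated entries shaped like https://oembed.com/providers.json.
# The YouTube schemes are an excerpt; "Example" is a made-up provider.
SAMPLE_PROVIDERS = [
    {
        "provider_name": "YouTube",
        "provider_url": "https://www.youtube.com/",
        "endpoints": [
            {
                "schemes": ["https://*.youtube.com/watch*", "https://youtu.be/*"],
                "url": "https://www.youtube.com/oembed",
            }
        ],
    },
    {
        "provider_name": "Example",
        "provider_url": "https://example.org/",
        "endpoints": [
            {"schemes": ["https://example.org/media/*"], "url": "https://example.org/oembed.{format}"}
        ],
    },
]


def scheme_to_regex(scheme: str) -> "re.Pattern":
    """Turn a scheme such as "https://*.youtube.com/watch*" into an anchored regex."""
    # "*" is the only wildcard; every other character is matched literally.
    parts = (re.escape(part) for part in scheme.split("*"))
    return re.compile("^" + ".*".join(parts) + "$")


def build_matchers(providers: List[dict]) -> List[Tuple["re.Pattern", str]]:
    """Flatten the provider list into (scheme regex, JSON endpoint URL) pairs."""
    matchers = []
    for provider in providers:
        for endpoint in provider.get("endpoints", []):
            # Some endpoints template the response format into the URL.
            api_url = endpoint["url"].replace("{format}", "json")
            for scheme in endpoint.get("schemes", []):
                matchers.append((scheme_to_regex(scheme), api_url))
    return matchers


def find_endpoint(matchers: List[Tuple["re.Pattern", str]], url: str) -> Optional[str]:
    """Return the endpoint of the first matching scheme, or None."""
    for regex, api_url in matchers:
        if regex.match(url):
            return api_url
    return None
```

Endpoints listed without `schemes` produce no matchers here; those would need oEmbed discovery from the page's HTML instead.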
It should probably be tried. I don't know if it will regress other previews. 🤷 |
Also on Hetzner, experiencing the same issue. |
If anyone wants a temporary user-side fix for themselves, I made this Tampermonkey script (https://gist.github.com/ItsCinnabar/ebcfe4f6b3ea7d224a8e1ef0783edeb2). Just edit the match URL to your site and load it into Tampermonkey/Greasemonkey/etc. |
I found a way to get it working again: you need to change your user agent to curl. In synapse/synapse/http/client.py (line 321 in 5a15377), replace it with something like this:
self.user_agent = "curl/7.59.0"
Now YouTube previews are working again. |
This works for YouTube (which is great, thanks!), but it's not a silver bullet, as it depends on how each site handles different user agents, so a more versatile approach might still be warranted. |
Yeah, you are right, but for now it suits me personally very well and I haven't encountered any URL preview problems so far. I guess to make it youtube.com-specific you would need to add an if check for YouTube, so that everything else still makes requests with the Matrix user agent. |
This also fixes previews for sites like anilist.co, which only displayed a "please use a modern browser" error message before this change. |
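A sketch of that host-specific check, using only Python's standard library. The default agent string, the override table, and the helper names are placeholders for illustration; this is not how Synapse's HTTP client is actually configured.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Placeholder default agent; not necessarily the string Synapse sends.
DEFAULT_USER_AGENT = "Synapse (bot; +https://github.com/matrix-org/synapse)"

# Hypothetical per-host overrides; hosts not listed keep the default agent.
USER_AGENT_OVERRIDES = {
    "youtube.com": "curl/7.59.0",
    "youtu.be": "curl/7.59.0",
}


def user_agent_for(url: str) -> str:
    """Pick the User-Agent for a preview request based on the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, agent in USER_AGENT_OVERRIDES.items():
        # Match the domain itself and any subdomain (e.g. www.youtube.com),
        # but not look-alikes such as notyoutube.com.
        if host == domain or host.endswith("." + domain):
            return agent
    return DEFAULT_USER_AGENT


def build_request(url: str) -> Request:
    """Build (but do not send) a request carrying the chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent_for(url)})
```

Keeping the overrides in a table rather than a single if check makes it easy to add sites such as anilist.co without touching the request code.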
Setting the user agent to Unfortunately, having worked on a framework like embed.ly in the past, I can say it is easy to get to 90%, but the last 10% can be really difficult. What we ended up doing was using our own user agent on the first try, and if the returned content was blocked, trying again with the Googlebot user agent and other crawler user agents (Facebook, Twitter...). But some websites can get really clever: I remember some validating the user agent against the TCP TTL (IIRC Windows uses 128 and Linux 64). I don't know what the best fix would be for Synapse. Maybe the user agent could be configurable? Maybe Synapse could also be configured to use some external API or an external command-line tool on the homeserver. In the end, nice inline previews are crucial to a good user experience, but they are really hard to get right. |
I still think the best fix is to use the oEmbed API. Changing the user agent is a hack and is always going to be brittle. |
Well, this was labeled as S-Minor; it seems the devs don't give a damn since their instances aren't in the EU. If nobody gives a damn about implementing this oEmbed API for YouTube, there are two solutions: the user agent hack, or hosting Synapse somewhere the "please sign in to YouTube" preview does not happen. Also, I haven't had any trouble with curl as my user agent in Synapse; everything works perfectly fine so far. |
Well, I don't think this tone is helpful. We are all trying to make things better. Anyway, I agree that the user agent hack is brittle; in my experience it is not really a solution. But I also know it takes a lot of work to generate good previews. oEmbed is part of the solution and should be supported at some point, but a configurable user agent could be a quick fix that shouldn't harm anything. The work involved in supporting oEmbed shouldn't be that big either: looking at https://github.com/webrecorder/oembed.link, it is not that huge. |
Maybe it wasn't explicit enough above, but oEmbed is already supported (see #7920). It currently hard-codes Twitter as the only supported service (see synapse/synapse/rest/media/v1/preview_url_resource.py, lines 72 to 86 in 4b965c8).
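A sketch of the retry strategy described above: try your own agent first, then fall back to crawler agents when the response looks blocked. The fetch function is injected so the logic can be tested without network access; the "blocked" heuristic (an HTTP error or YouTube's "Before you continue" consent text) and the first agent string are assumptions, not anything Synapse does.

```python
from typing import Callable, Optional, Sequence, Tuple

# A fetch function takes (url, user_agent) and returns (status_code, body).
Fetch = Callable[[str, str], Tuple[int, str]]

# Our own agent first, then well-known crawler agents as fallbacks.
# The first entry is a placeholder, not necessarily Synapse's real agent string.
FALLBACK_AGENTS = (
    "Synapse (bot; +https://github.com/matrix-org/synapse)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
    "Twitterbot/1.0",
)


def looks_blocked(status: int, body: str) -> bool:
    """Hypothetical heuristic: HTTP errors or a known consent/interstitial page."""
    return status >= 400 or "Before you continue" in body


def fetch_with_fallback(
    url: str, fetch: Fetch, agents: Sequence[str] = FALLBACK_AGENTS
) -> Optional[Tuple[str, str]]:
    """Try each agent in turn; return (agent, body) for the first unblocked response."""
    for agent in agents:
        status, body = fetch(url, agent)
        if not looks_blocked(status, body):
            return agent, body
    return None  # every agent was blocked
```

As noted above, sites that check more than the header (for example the TCP TTL) will still defeat this, which is part of why it stays brittle.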
Options to solve this would be:
If someone is interested in working on this, I'll gladly help work through any of the above with them, but that is likely a discussion for #synapse-dev:matrix.org. |
I think using the list mentioned in #9733 (comment) is the way to go, maybe with the list URL made configurable. So:
Seems like a good approach. |
I just wanted to note that adding @tulir's "UrlPreviewBot" UA workaround fixed both Twitter image previews and YouTube previews for me. 🎉 https://mau.dev/maunium/synapse/-/commit/55d926999cffee893cb4951890a33985beaf70ba |
I'm taking a quick stab at this by putting the oembed_globs in config, and later possibly deriving the sample config's defaults from https://oembed.com/providers.json.
Edit: Unfortunately, this is not quite as trivial. YouTube's oEmbed response is an iframe, which we can't send over the e.g.
{
"title": "The Giant Comes to Life...(POWER LOADER: PART 14)",
"author_name": "Hacksmith Industries",
"author_url": "https://www.youtube.com/c/theHacksmith",
"type": "video",
"height": 113,
"width": 200,
"version": "1.0",
"provider_name": "YouTube",
"provider_url": "https://www.youtube.com/",
"thumbnail_height": 360,
"thumbnail_width": 480,
"thumbnail_url": "https://i.ytimg.com/vi/62tPTgpmT1U/hqdefault.jpg",
"html": "\u003ciframe width=\u0022200\u0022 height=\u0022113\u0022 src=\u0022https://www.youtube.com/embed/62tPTgpmT1U?feature=oembed\u0022 frameborder=\u00220\u0022 allow=\u0022accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\u0022 allowfullscreen\u003e\u003c/iframe\u003e"
}
vs. Twitter, which has no title but sends a blockquote that we send over to the client:
{
"url": "https:\/\/twitter.com\/CroydonCyclists\/status\/1147416388874768389",
"author_name": "Croydon Cycling Campaign",
"author_url": "https:\/\/twitter.com\/CroydonCyclists",
"html": "\u003Cblockquote class=\"twitter-tweet\"\u003E\u003Cp lang=\"en\" dir=\"ltr\"\u003ETurns out that Lime bike will fine you for parking their bikes in parts of central Croydon where cycling is legal and there are parking racks. Beyond stupid. \u003Ca href=\"https:\/\/t.co\/EtDlbUSfog\"\u003Epic.twitter.com\/EtDlbUSfog\u003C\/a\u003E\u003C\/p\u003E— Croydon Cycling Campaign (@CroydonCyclists) \u003Ca href=\"https:\/\/twitter.com\/CroydonCyclists\/status\/1147416388874768389?ref_src=twsrc%5Etfw\"\u003EJuly 6, 2019\u003C\/a\u003E\u003C\/blockquote\u003E\n\u003Cscript async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"\u003E\u003C\/script\u003E\n",
"width": 550,
"height": null,
"type": "rich",
"cache_age": "3153600000",
"provider_name": "Twitter",
"provider_url": "https:\/\/twitter.com",
"version": "1.0"
}
Edit 2: With some tweaking I can get better results out of it, but the code needs a bit of refactoring: all the oEmbed results go through a media/file interface, and that's not appropriate. |
Discord has some custom behaviour and design for YouTube specifically, FYI. It's intended to be invisible, but that kind of special treatment is a bit problematic for Element. |
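Rather than forwarding YouTube's iframe markup, a server could extract just the player URL from the `html` field and treat it as preview metadata. A minimal sketch using Python's standard-library `html.parser`; the class and the helper name `iframe_src` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def iframe_src(oembed: dict) -> Optional[str]:
    """Return the first iframe src in an oEmbed `html` field, or None."""
    parser = _IframeSrcParser()
    parser.feed(oembed.get("html") or "")
    parser.close()
    return parser.srcs[0] if parser.srcs else None
```

For Twitter's `rich` response, which contains a blockquote and a script but no iframe, this returns None, so the two providers would still need different handling.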
I've removed the conspiracy theories, suggestions of workarounds that have already been discussed 5 times, and "me too!" comments. None of these are helpful; please stay on topic. Yes it's annoying, no it's not a conspiracy by the evil Synapse maintainers to make your life worse. We know it's possible to work around the problem by changing the User-agent. Per #9733 (comment): I'd rather not do that as I think it will be brittle. Props to @t3chguy who, rather than complaining about the problem, has started work on a PR to fix it. |
As a maintainer, it is draining to see users spewing such garbage about something you put so much time into. |
I'm going to take further discussion of the oembed implementation to #2752. |
#10714 has made good progress on this by changing the preview API to use a configurable list of oEmbed providers; however, YouTube previews are still somewhat useless, as the default provider list doesn't include an entry for YouTube. @clokep, are you aware of any reason we shouldn't include an entry for YouTube in that file by default? |
oEmbed for YouTube doesn't really give a good response right now. In the image below, the first preview is made without using oEmbed (but I'm in the US, so I get a "real" description), while the second one is made with oEmbed. I think the tweaks in #10392 were meant to make this preview better. |
Oh, I see. So really we need to land the remaining tweaks in #10392 before we can make more progress here? |
Yeah, pretty much. I'm not super thrilled with the current flow for previews when using oEmbed, but that's rather tough to crack apart. It could really use some documentation on where the caches are and such. I think the gist is that we need to pull more info out of the oEmbed response, though, e.g. the
Here's what we get from oEmbed:
{
"author_name" : "Rick Astley",
"author_url" : "https://www.youtube.com/c/RickastleyCoUkOfficial",
"height" : 113,
"html" : "<iframe width=\"200\" height=\"113\" src=\"https://www.youtube.com/embed/dQw4w9WgXcQ?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>",
"provider_name" : "YouTube",
"provider_url" : "https://www.youtube.com/",
"thumbnail_height" : 360,
"thumbnail_url" : "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg",
"thumbnail_width" : 480,
"title" : "Rick Astley - Never Gonna Give You Up (Official Music Video)",
"type" : "video",
"version" : "1.0",
"width" : 200
}
What we get from Synapse (when configured to use oEmbed for YouTube):
{
"matrix:image:size" : 18498,
"og:description" : null,
"og:image" : "mxc://localhost:8480/2021-09-01_AfteoaZUTZOUJfoa",
"og:image:height" : 360,
"og:image:type" : "image/jpeg",
"og:image:width" : 480
}
This is really only pulling the
For reference, this compares to what we get without using oEmbed:
{
"matrix:image:size" : 65665,
"og:description" : "Rick Astley's official music video for “Never Gonna Give You Up” Subscribe to the official Rick Astley YouTube channel: https://RickAstley.lnk.to/YTSubIDFoll...",
"og:image" : "mxc://localhost:8480/2021-09-01_QwaVetzmVlEviNmK",
"og:image:height" : 720,
"og:image:type" : "image/jpeg",
"og:image:width" : 1280,
"og:site_name" : "YouTube",
"og:title" : "Rick Astley - Never Gonna Give You Up (Official Music Video)",
"og:type" : "video.other",
"og:url" : "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"og:video:height" : "720",
"og:video:secure_url" : "https://www.youtube.com/embed/dQw4w9WgXcQ",
"og:video:tag" : "rick astley never gonna give you up lyrics",
"og:video:type" : "text/html",
"og:video:url" : "https://www.youtube.com/embed/dQw4w9WgXcQ",
"og:video:width" : "1280"
} |
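A sketch of pulling more fields out of the oEmbed response so that an oEmbed-based preview carries at least the title and site name seen in the HTML-based one. The function name and the key mapping (including `og:type` = `video.other` for `video` responses, mirroring the HTML preview above) are assumptions, not Synapse's implementation. oEmbed has no description field, so `og:description` stays unset; and unlike Synapse, which replaces `og:image` with an `mxc://` URI after downloading it, this sketch leaves the remote thumbnail URL as-is.

```python
from typing import Any, Dict, Optional


def oembed_to_open_graph(oembed: Dict[str, Any], page_url: Optional[str] = None) -> Dict[str, Any]:
    """Map common oEmbed response fields onto Open Graph-style preview keys."""
    og: Dict[str, Any] = {}
    if oembed.get("title"):
        og["og:title"] = oembed["title"]
    if oembed.get("provider_name"):
        og["og:site_name"] = oembed["provider_name"]
    if oembed.get("thumbnail_url"):
        og["og:image"] = oembed["thumbnail_url"]
        # Thumbnail size is optional in oEmbed, so copy it only when present.
        for dim in ("width", "height"):
            value = oembed.get(f"thumbnail_{dim}")
            if value is not None:
                og[f"og:image:{dim}"] = value
    if oembed.get("type") == "video":
        og["og:type"] = "video.other"
    if page_url:
        # The oEmbed response does not echo the page URL, so the caller supplies it.
        og["og:url"] = page_url
    return og
```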
I put up #10819, which should help with this, but it doesn't give quite as good a preview as the current HTML parsing. I've been unable to reproduce the blank / no preview for YouTube from US, UK, or France-based servers. Are people still seeing issues with this? |
I get URL previews for YouTube now. I think YouTube rolled out a change so they no longer auto-redirect to consent.youtube.com. I remember that some weeks ago the redirect happened on and off for me, which looked like an A/B test on their part. Maybe it's fully rolled out now? |
Same here; it started working from Germany without updating Synapse. |
Description
At some point YouTube updated the site, and now all (?) captions generated by Synapse for the site are:
Before you continue to YouTube
Sign in a Google company Before you continue to YouTube Google uses cookies and data to: Deliver and maintain services, like tracking outages and protecting against spam, fraud, and abuse Measure audience engagement and site statistics to understand how our services are used
This is basically useless given the primary purpose of the feature, especially for such a popular website.
Steps to reproduce
m.room.message into a room, e.g. https://www.youtube.com/watch?v=RzJf02TIqxk
Expected results:
youtube-dl --get-description:
Authentic recordings from inside Hetzner Online's data center park
Just like birds and insects, each server sings its own unique song.
Version information