Youtube previews no longer available with v1.111.0 #17462

Open
salixor opened this issue Jul 21, 2024 · 33 comments

Comments

@salixor

salixor commented Jul 21, 2024

Description

With v1.111.0 installed, YouTube previews for videos are no longer fetched and simply show:

Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.

Previews were correctly fetched before this version.

Shorts are unaffected and still properly display the preview.

Steps to reproduce

  • Post a Youtube video in any channel
  • Preview isn't displayed and we get "Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube."

Homeserver

baguette.party

Synapse Version

1.111.0

Installation Method

Debian packages from packages.matrix.org

Database

Single PostgreSQL server

Workers

Single process

Platform

Ubuntu 22.04

Configuration

None.

Relevant log output

-

Anything else that would be useful to know?

Seems to work perfectly fine for people on v1.110.0, and it worked for me on v1.110.0 as well.

@devonh
Member

devonh commented Jul 22, 2024

Would you be able to provide more details regarding in what situation this problem occurs?
I've tried to reproduce locally and the url previews for youtube links are always successfully shown.

Specifically:

  • what client are you using? (I've been testing using element web)
  • what links to youtube are failing? (I've tried both the full video link & the "sharing" link)
  • does the problem happen with existing rooms, new rooms, public rooms, etc.
  • is the problem the same whether someone else (on same homeserver or over federation) sends the link, or you send the link?
  • any log lines related to http requests to youtube.com or /_matrix/media/v3/preview_url, etc.

@salixor
Author

salixor commented Jul 22, 2024

It happens with any Youtube videos (other than shorts), be it "youtu.be" or full "youtube.com" URLs.

Tested clients are (always with the latest version) :

  • Element Web
  • Element Desktop
  • Element Android
  • Schildichat Desktop
  • Schildichat Android

The problem happens with any rooms, public, private, existing, or new rooms.

It happens with links sent by me, by other people on my homeserver, or over federation. Other members on my homeserver report the same issue of not getting any previews, in the same situations (i.e. every link).

Here are some logs relating to youtube.com (and the preview_url endpoint) :

2024-07-22 13:57:15,359 - synapse.access.http.8008 - 473 - INFO - GET-342456 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 0.807sec/-0.000sec (0.090sec, 0.010sec) (0.013sec/0.186sec/7) 563B 200 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Flive%2F<YT_LIVE_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) SchildiChat/1.11.36-sc.3 Chrome/114.0.5735.134 Electron/25.2.0 Safari/537.36" [0 dbevts]
2024-07-22 13:57:15,463 - synapse.http.client - 428 - INFO - GET-342459 - Received response to GET https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico: 200
2024-07-22 13:57:15,497 - synapse.media.url_previewer - 689 - WARNING - GET-342459 - Couldn't get dims for https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico
2024-07-22 15:52:08,661 - synapse.http.client - 428 - INFO - GET-360654 - Received response to GET https://www.youtube.com/watch?v=<YT_VIDEO_ID>: 200
2024-07-22 15:52:08,699 - synapse.http.client - 428 - INFO - GET-360657 - Received response to GET https://www.youtube.com/watch?v=<YT_VIDEO_ID>: 200
2024-07-22 15:52:09,492 - synapse.http.client - 428 - INFO - GET-360657 - Received response to GET https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico: 200
2024-07-22 15:52:09,569 - synapse.media.url_previewer - 689 - WARNING - GET-360657 - Couldn't get dims for https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico
2024-07-22 15:52:09,579 - synapse.http.client - 428 - INFO - GET-360654 - Received response to GET https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico: 200
2024-07-22 15:52:09,638 - synapse.access.http.8008 - 473 - INFO - GET-360657 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 1.572sec/-0.000sec (0.093sec, 0.001sec) (0.007sec/0.142sec/4) 288B 200 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) SchildiChat/1.11.36-sc.3 Chrome/114.0.5735.134 Electron/25.2.0 Safari/537.36" [0 dbevts]
2024-07-22 15:52:09,669 - synapse.media.url_previewer - 689 - WARNING - GET-360654 - Couldn't get dims for https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico
2024-07-22 15:52:09,733 - synapse.access.http.8008 - 473 - INFO - GET-360654 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 1.747sec/-0.000sec (0.091sec, 0.010sec) (0.022sec/0.200sec/4) 288B 200 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) SchildiChat/1.11.36-sc.3 Chrome/114.0.5735.134 Electron/25.2.0 Safari/537.36" [0 dbevts]
2024-07-22 16:33:01,443 - synapse.http.client - 428 - INFO - GET-366864 - Received response to GET https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico: 200
2024-07-22 16:33:01,552 - synapse.media.url_previewer - 689 - WARNING - GET-366864 - Couldn't get dims for https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico
2024-07-22 16:33:01,663 - synapse.http.client - 428 - INFO - GET-366866 - Received response to GET https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico: 200
2024-07-22 16:33:01,709 - synapse.media.url_previewer - 689 - WARNING - GET-366866 - Couldn't get dims for https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico
2024-07-22 16:33:02,315 - synapse.http.client - 428 - INFO - GET-366870 - Received response to GET https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico: 200
2024-07-22 16:33:02,365 - synapse.media.url_previewer - 689 - WARNING - GET-366870 - Couldn't get dims for https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico
2024-07-22 17:38:19,872 - synapse.http.client - 428 - INFO - GET-380556 - Received response to GET https://www.youtube.com/watch?v=<YT_VIDEO_ID>: 200
2024-07-22 17:38:20,824 - synapse.http.client - 428 - INFO - GET-380556 - Received response to GET https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico: 200
2024-07-22 17:38:20,903 - synapse.media.url_previewer - 689 - WARNING - GET-380556 - Couldn't get dims for https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico
2024-07-22 17:38:21,037 - synapse.access.http.8008 - 473 - INFO - GET-380556 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 2.854sec/-0.000sec (0.090sec, 0.004sec) (0.022sec/0.228sec/4) 288B 200 "GET /_matrix/media/v3/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.11.69 Chrome/124.0.6367.243 Electron/30.3.0 Safari/537.36" [0 dbevts]
2024-07-22 17:38:21,294 - synapse.http.client - 428 - INFO - GET-380554 - Received response to GET https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico: 200
2024-07-22 17:38:21,378 - synapse.media.url_previewer - 689 - WARNING - GET-380554 - Couldn't get dims for https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico
2024-07-22 17:44:21,815 - synapse.http.client - 428 - INFO - GET-381639 - Received response to GET https://www.youtube.com/watch?v=<YT_VIDEO_ID>: 200
2024-07-22 17:44:22,386 - synapse.http.client - 428 - INFO - GET-381639 - Received response to GET https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico: 200
2024-07-22 17:44:22,415 - synapse.media.url_previewer - 689 - WARNING - GET-381639 - Couldn't get dims for https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico
2024-07-22 17:44:22,466 - synapse.access.http.8008 - 473 - INFO - GET-381639 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 0.891sec/-0.000sec (0.073sec, 0.003sec) (0.006sec/0.104sec/4) 288B 200 "GET /_matrix/media/v3/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.11.69 Chrome/124.0.6367.243 Electron/30.3.0 Safari/537.36" [0 dbevts]
2024-07-22 17:44:46,902 - synapse.http.client - 428 - INFO - GET-381708 - Received response to GET https://www.youtube.com/watch?v=-<YT_VIDEO_ID>: 200
2024-07-22 17:44:47,537 - synapse.http.client - 428 - INFO - GET-381708 - Received response to GET https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico: 200
2024-07-22 17:44:47,557 - synapse.media.url_previewer - 689 - WARNING - GET-381708 - Couldn't get dims for https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico
2024-07-22 17:44:47,577 - synapse.access.http.8008 - 473 - INFO - GET-381708 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 0.756sec/-0.000sec (0.081sec, 0.001sec) (0.005sec/0.060sec/4) 288B 200 "GET /_matrix/media/v3/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.11.69 Chrome/124.0.6367.243 Electron/30.3.0 Safari/537.36" [0 dbevts]
2024-07-22 17:47:55,892 - synapse.http.client - 428 - INFO - GET-382146 - Received response to GET https://www.youtube.com/watch?v=<YT_VIDEO_ID>: 200
2024-07-22 17:47:56,631 - synapse.http.client - 428 - INFO - GET-382146 - Received response to GET https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico: 200
2024-07-22 17:47:56,740 - synapse.media.url_previewer - 689 - WARNING - GET-382146 - Couldn't get dims for https://www.youtube.com/s/desktop/10afb17a/img/favicon.ico
2024-07-22 17:47:56,808 - synapse.access.http.8008 - 473 - INFO - GET-382146 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 1.238sec/-0.000sec (0.094sec, 0.013sec) (0.019sec/0.163sec/4) 288B 200 "GET /_matrix/media/v3/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.11.69 Chrome/124.0.6367.243 Electron/30.3.0 Safari/537.36" [0 dbevts]
2024-07-22 17:56:43,565 - synapse.access.http.8008 - 473 - INFO - GET-383368 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 0.005sec/-0.000sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 288B 200 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) SchildiChat/1.11.36-sc.3 Chrome/114.0.5735.134 Electron/25.2.0 Safari/537.36" [0 dbevts]
2024-07-22 17:56:43,575 - synapse.access.http.8008 - 473 - INFO - GET-383369 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 0.004sec/-0.000sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 288B 200 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) SchildiChat/1.11.36-sc.3 Chrome/114.0.5735.134 Electron/25.2.0 Safari/537.36" [0 dbevts]
2024-07-22 17:56:43,585 - synapse.access.http.8008 - 473 - INFO - GET-383370 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 0.004sec/-0.000sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 288B 200 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) SchildiChat/1.11.36-sc.3 Chrome/114.0.5735.134 Electron/25.2.0 Safari/537.36" [0 dbevts]
2024-07-22 17:56:43,593 - synapse.access.http.8008 - 473 - INFO - GET-383371 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 0.004sec/-0.000sec (0.004sec, 0.000sec) (0.000sec/0.000sec/0) 288B 200 "GET /_matrix/media/r0/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%<YT_VIDEO_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) SchildiChat/1.11.36-sc.3 Chrome/114.0.5735.134 Electron/25.2.0 Safari/537.36" [0 dbevts]
2024-07-22 17:58:23,130 - synapse.http.client - 428 - INFO - GET-383716 - Received response to GET https://www.youtube.com/live/<LIVE_ID>?si=tlVOXnR5q-YnsOmD: 200
2024-07-22 17:58:24,782 - synapse.access.http.8008 - 473 - INFO - GET-383716 - <IP> - 8008 - {<HOMESERVER_USER>} Processed request: 5.330sec/-0.000sec (0.124sec, 0.003sec) (0.020sec/0.839sec/7) 563B 200 "GET /_matrix/media/v3/preview_url?url=https%3A%2F%2Fwww.youtube.com%2Flive%2F<YT_LIVE_ID>&ts=<TIMESTAMP> HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Element/1.11.69 Chrome/124.0.6367.243 Electron/30.3.0 Safari/537.36" [0 dbevts]

@clokep
Contributor

clokep commented Jul 22, 2024

The location of the homeserver can affect what Google will show to non-logged in users. At some point we saw different responses between Europe/UK/USA.

@salixor
Author

salixor commented Jul 22, 2024

The server is located in the EU, however we did get previews before. We did not get a non-logged in preview.

@devonh
Member

devonh commented Jul 22, 2024

Hmm, very strange.
The url_previewer code hasn't changed between 1.110.0 & 1.111.0. In fact it hasn't really changed over the past year.
The only related changes here are around stabilizing the url_preview endpoints in the client api. Since you are testing with a wide range of clients, all with latest versions, I doubt the endpoint stabilizations would be causing the issue. Especially since other setups don't see this problem.

When I use app.element.io on my matrix.org account, I am able to successfully see URL previews to youtube links.

@daemontron

@devonh For the past month YouTube previews have stopped working for me as well, except I'm on synapse:v1.103.0, which is several versions before 1.111.0. For years prior, YouTube previews worked fine across multiple versions/upgrades. No changes at all had been made; it just suddenly stopped working one day.

I don't think this is version-specific. Perhaps some homeserver- or config-specific condition is being hit, or there was some kind of change on the Google/YouTube side?

I tried upgrading to synapse:v1.112.0 and Youtube URL Previews still do not work, all other URL Previews do work.

I am unsure where or how to look at the logging @salixor is showing above, or what relevant logging I could provide to help track down the issue or compare what I am running into. If someone could point me to where best to look, please let me know!

Thanks

@salixor
Author

salixor commented Aug 6, 2024

Logs are usually found at /var/log/matrix-synapse/homeserver.log.

"Glad" to see I'm not the only one! The breakage happened around the time we upgraded to v1.111.0, but it could also have been coincidental. I've had no issues with any other previews so I'm thinking something broke with Youtube, but logs don't show anything out of the usual...

@daemontron

daemontron commented Aug 6, 2024

Thanks @salixor I am using https://github.com/spantaleev/matrix-docker-ansible-deploy, so instead of looking for /var/log/matrix-synapse/homeserver.log, I instead had to perform a journalctl -u matrix-synapse -e

I saw some similar things to you:

Aug 06 17:05:19 syn matrix-synapse[21224]: 2024-08-06 17:05:19,647 - synapse.media.url_previewer - 672 - WARNING - GET-76369 - Pre-caching image failed during URL preview: https://statcounter.com/gs_snapshots/os_combined-07-2024-desktop-00.png errored with 502: Requested file's content type not allowed for this operatio>
Aug 06 17:50:37 syn matrix-synapse[21224]: 2024-08-06 17:50:37,224 - synapse.media.url_previewer - 689 - WARNING - GET-82165 - Couldn't get dims for https://www.youtube.com/s/desktop/bf8c00d7/img/favicon.ico
Aug 06 17:52:02 syn matrix-synapse[21224]: 2024-08-06 17:52:02,891 - synapse.media.url_previewer - 689 - WARNING - GET-82279 - Couldn't get dims for https://www.youtube.com/s/desktop/bf8c00d7/img/favicon.ico

@daemontron

When did you upgrade to v1.111.0 @salixor? Looking at the logs, it seems my first instances of "Couldn't get dims for" started on May 23 2024, with no prior log lines mentioning synapse.media.url_previewer before that.
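
For anyone who wants to repeat that check on their own logs, here is a minimal sketch that finds the first "Couldn't get dims for" warning. It assumes the log line layout shown in this thread; the names `DIMS_WARNING` and `first_dims_warning` are made up for illustration and are not part of Synapse.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches the url_previewer warning in plain homeserver.log lines as well as
# journalctl output, which only adds a prefix before the timestamp.
DIMS_WARNING = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} - synapse\.media\.url_previewer"
    r" - \d+ - WARNING - \S+ - Couldn't get dims for (\S+)"
)


def first_dims_warning(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (timestamp, url) of the first matching warning, or None."""
    for line in lines:
        match = DIMS_WARNING.search(line)
        if match:
            return match.group(1), match.group(2)
    return None


# Two lines copied from the logs posted in this issue.
sample = [
    "2024-07-22 13:57:15,463 - synapse.http.client - 428 - INFO - GET-342459 - "
    "Received response to GET https://www.youtube.com/s/desktop/060ac52e/img/favicon.ico: 200",
    "Aug 06 17:50:37 syn matrix-synapse[21224]: 2024-08-06 17:50:37,224 - "
    "synapse.media.url_previewer - 689 - WARNING - GET-82165 - "
    "Couldn't get dims for https://www.youtube.com/s/desktop/bf8c00d7/img/favicon.ico",
]
print(first_dims_warning(sample))
```

Since the function accepts any iterable of lines, an open log file can be passed in directly. Rotated or compressed logs would need to be fed in oldest first for "first" to mean anything.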

@daemontron

Also, just like @salixor, YouTube Shorts work for me, just not regular videos.

@devonh
Member

devonh commented Aug 6, 2024

@daemontron It looks like this log line is cut off, what is the rest of the line? (it should print out the content type it read)
Aug 06 17:05:19 syn matrix-synapse[21224]: 2024-08-06 17:05:19,647 - synapse.media.url_previewer - 672 - WARNING - GET-76369 - Pre-caching image failed during URL preview: https://statcounter.com/gs_snapshots/os_combined-07-2024-desktop-00.png errored with 502: Requested file's content type not allowed for this operatio>

(Looked at it a little more, not sure that line is relevant to youtube links - maybe it's something about viewing youtube from different regions leading to different redirects or somesuch thing?)

@devonh
Member

devonh commented Aug 6, 2024

@daemontron Is your server also based in the EU?

@salixor
Author

salixor commented Aug 6, 2024

> When did you upgrade to v1.111.0 @salixor? Looking at the logs, it seems my first instances of "Couldn't get dims for" started on May 23 2024, with no prior log lines mentioning synapse.media.url_previewer before that.

The upgrade was a few days after its release, on July 18th.

@clokep
Contributor

clokep commented Aug 6, 2024

Note that previews should mostly work even if downloading the image fails. (This used to break but was fixed at some point...)

@daemontron

daemontron commented Aug 7, 2024

> @daemontron Is your server also based in the EU?

@devonh US Based

> It looks like this log line is cut off, what is the rest of the line? (it should print out the content type it read)

Aug 06 17:05:19 syn matrix-synapse[21224]: 2024-08-06 17:05:19,647 - synapse.media.url_previewer - 672 - WARNING - GET-76369 - Pre-caching image failed during URL preview: https://statcounter.com/gs_snapshots/os_combined-07-2024-desktop-00.png errored with 502: Requested file's content type not allowed for this operation: application/download

@daemontron

daemontron commented Aug 7, 2024

@devonh @salixor I don't think this is a Matrix / Synapse issue. It seems to be regional or IP-range based, as said previously; maybe some kind of slowly rolling out change? Both my server and my home connection are US based. Maybe it's targeted at popular hosting providers?

The server hits the issue below, my own connection at home does not:

From the server matrix/synapse is on:

YouTube</title><meta name="title" content=""><meta name="description" content="Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.">

"reelTitleText":{"runs":[{"text":"Northern Lights Seen From the International Space Station"

From a linux box on my home connection:

  • Regular YouTube Video:
    curl -s 'https://www.youtube.com/watch?v=VuIcT-u2Dmo' | grep '<meta name="description"'
    YouTube</title><meta name="title" content="Pure Moods Hit Collection Commercial"><meta name="description" content="June 17, 1999Thursday, June 17, 1999 6/17/1999Pure Moods Hit Collection Commercial">
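
The same comparison can be scripted. The sketch below is only an illustration, not anything Synapse does: it assumes the generic description quoted in this issue is a reliable marker of the "blocked" page, and the names `MetaDescriptionParser` and `is_generic_page` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

# The generic description quoted in this issue.
GENERIC_DESCRIPTION = (
    "Enjoy the videos and music you love, upload original content, "
    "and share it all with friends, family, and the world on YouTube."
)


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.description: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "description":
            if self.description is None:
                self.description = attributes.get("content")


def is_generic_page(html: str) -> bool:
    """True if the page has no description or only the generic one."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description is None or parser.description == GENERIC_DESCRIPTION
```

Feeding it the HTML fetched from the server and from a home connection should show which side is getting the generic page.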

@salixor
Author

salixor commented Aug 7, 2024

I am indeed getting the same results. I'll admit since it happened exactly when we upgraded versions, I didn't even consider it could just be something coincidental not related to the upgrade, as nothing else I used got those broken previews.

Well, I guess thanks Youtube for nothing, as usual.
We probably can close this issue, as I don't believe anything can be done on Synapse's side!

@jonas-w

jonas-w commented Aug 22, 2024

Seems like YouTube does not ship OpenGraph data when you are greeted with the "Login to verify that you are not a Bot" page.

But this can be fixed by using oEmbed. Synapse does not ship a default oEmbed provider config, contrary to what it states in the documentation.

So I just added the following to my homeserver config:

oembed:
  disable_default_providers: true
  additional_providers:
    - providers.json

I used this file: https://oembed.com/providers.json, but it needed to be "fixed" because the oEmbed provider parser in Synapse does not parse this file correctly, even though it is the official oEmbed provider file. You'll need to remove all entries that don't have "schemes" set in the endpoints, and remove the Spotify entry, because Synapse doesn't like the "spotify:*" glob.

Or if you just care about youtube you'll only need the following in your providers.json:

[
  {
    "provider_name": "YouTube",
    "provider_url": "https://www.youtube.com/",
    "endpoints": [
      {
        "schemes": [
          "https://*.youtube.com/watch*",
          "https://*.youtube.com/v/*",
          "https://youtu.be/*",
          "https://*.youtube.com/playlist?list=*",
          "https://youtube.com/playlist?list=*",
          "https://*.youtube.com/shorts*",
          "https://youtube.com/shorts*",
          "https://*.youtube.com/embed/*"
        ],
        "url": "https://www.youtube.com/oembed",
        "discovery": true
      }
    ]
  }
]
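
To get a rough idea of which links a scheme list like this covers, here is a small sketch using Python's `fnmatch`. This is an approximation only: it is not Synapse's own glob matcher, which may treat `*` and `?` differently, and `matches_any_scheme` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# The schemes from the providers.json above.
SCHEMES = [
    "https://*.youtube.com/watch*",
    "https://*.youtube.com/v/*",
    "https://youtu.be/*",
    "https://*.youtube.com/playlist?list=*",
    "https://youtube.com/playlist?list=*",
    "https://*.youtube.com/shorts*",
    "https://youtube.com/shorts*",
    "https://*.youtube.com/embed/*",
]


def matches_any_scheme(url: str, schemes: Iterable[str]) -> bool:
    """Approximate check: shell-style wildcard match of url against each scheme.

    Note that fnmatch treats '?' as "any single character", so a literal '?'
    in a scheme also matches other characters in that position.
    """
    return any(fnmatchcase(url, scheme) for scheme in schemes)


for candidate in (
    "https://www.youtube.com/watch?v=VuIcT-u2Dmo",
    "https://m.youtube.com/watch?v=VuIcT-u2Dmo",
    "https://youtube.com/watch?v=VuIcT-u2Dmo",
):
    print(candidate, matches_any_scheme(candidate, SCHEMES))
```

Under this approximation, `www.` and `m.` watch links match, while a bare `https://youtube.com/watch...` link does not, because none of the schemes above covers watch URLs without a subdomain.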

Related issue: #9877

@clokep
Contributor

clokep commented Aug 22, 2024

> But this can be fixed by using OEmbed, Synapse does not ship a default OEmbed provider config contrary to what it states in the Documentation.

It does, but it includes limited data: https://github.com/element-hq/synapse/blob/develop/synapse/res/providers.json

> because the oembed provider parser that exists in Synapse does not parse this file correctly, even though that is the official oembed provider file, you'll need to remove all entries that don't have "schemes" set in the entrypoints and remove the spotify entry, because Synapse doesn't like the "spotify:*" glob.

Please file a separate bug for this. (I would also ask that you use a less dismissive tone -- it isn't like the parser was designed purposefully to not parse this file. It was tested against it when it was originally developed, but the data might have changed or the code might have changed.)

@heftig

heftig commented Aug 22, 2024

Before this was fixed (for a while) I used to work around this by hacking the HTTP agent for preview requests to be curl/7.84.0 (or whatever was current). That got the YouTube previews working.

@jonas-w

jonas-w commented Aug 22, 2024

@clokep ah okay, I tried to find it on the docker image but only searched for oembed and not providers.json!

Will do!

@jonas-w

jonas-w commented Aug 22, 2024

@heftig I tried to debug this via curl and it seems to be IP based. Using curl youtube_link on my machine at home did contain the opengraph meta tags, but using the same command on my server where Synapse is hosted it returned nearly the same page, but without the opengraph data and no oembed discovery link in the meta tags.
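
For reference, checking a fetched page for the HTML oEmbed discovery link can be sketched as below. The `<link rel="alternate" type="application/json+oembed">` form comes from the oEmbed spec; the parser class and `discover_oembed` are hypothetical names, and the sample HTML is invented for the demo, not captured from YouTube.

```python
from html.parser import HTMLParser
from typing import List

# Link types defined by the oEmbed spec for discovery.
OEMBED_TYPES = {"application/json+oembed", "text/xml+oembed"}


class OEmbedDiscoveryParser(HTMLParser):
    """Collects href values of oEmbed discovery <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.endpoints: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if (
            tag == "link"
            and "alternate" in rel
            and attributes.get("type") in OEMBED_TYPES
            and attributes.get("href")
        ):
            self.endpoints.append(attributes["href"])


def discover_oembed(html: str) -> List[str]:
    """Return all oEmbed discovery URLs found in the HTML (possibly none)."""
    parser = OEmbedDiscoveryParser()
    parser.feed(html)
    return parser.endpoints


# Invented sample page, for illustration only.
sample_page = (
    '<head><link rel="alternate" type="application/json+oembed" '
    'href="https://example.org/oembed?url=https%3A%2F%2Fexample.org%2Fvideo"></head>'
)
print(discover_oembed(sample_page))
```

An empty list for a page fetched from the server, next to a non-empty one from a home connection, would match the behaviour described above.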

@clokep
Contributor

clokep commented Aug 23, 2024

I'm not suggesting we shouldn't add YouTube to it, but I believe we didn't initially because it gave worse results if you can properly load the URL. Ideally we could combine the information from both, if available. I don't think that ever got implemented.

@jonas-w

jonas-w commented Aug 24, 2024

@clokep it might be worth mentioning that the https://oembed.com site also states this:

There are currently 319 providers in the registry. Providers and consumers are strongly encouraged to use the discovery mechanism, rather than the registry.

So the registry isn't really recommended, but the problem is if YouTube blocks you as a "bot", you can't even discover the oembed URL.

@clokep
Contributor

clokep commented Aug 26, 2024

#3440 is also a bit related.

Just to double check -- does YouTube send the oEmbed discovery via header? I don't think we support that yet.

@jonas-w

jonas-w commented Aug 26, 2024

@clokep yes it does, but as I explained in my last comment, if you get blocked as a bot you don't see the discovery URL, so it wouldn't be useful in this case.

@clokep
Contributor

clokep commented Aug 26, 2024

Ah I thought you were specifically referring to the HTML discovery URL. My misunderstanding.

@daemontron

@clokep @devonh
I just found something hopeful. If I run the following curl to manually test the oEmbed endpoint, it actually returns proper data instead of the generic "Youtube: Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube":
curl "https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=VuIcT-u2Dmo&format=json"

{"title":"Pure Moods Hit Collection Commercial","author_name":"jj vowers","author_url":"https://www.youtube.com/@jjvowers","type":"video","height":150,"width":200,"version":"1.0","provider_name":"YouTube","provider_url":"https://www.youtube.com/","thumbnail_height":360,"thumbnail_width":480,"thumbnail_url":"https://i.ytimg.com/vi/VuIcT-u2Dmo/hqdefault.jpg","html":"\u003ciframe width=\u0022200\u0022 height=\u0022150\u0022 src=\u0022https://www.youtube.com/embed/VuIcT-u2Dmo?feature=oembed\u0022 frameborder=\u00220\u0022 allow=\u0022accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\u0022 referrerpolicy=\u0022strict-origin-when-cross-origin\u0022 allowfullscreen title=\u0022Pure Moods Hit Collection Commercial\u0022\u003e\u003c/iframe\u003e"}
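
The same request can be put together in Python. This is a minimal sketch that only builds the URL and picks fields out of a response like the one above; it makes no network call. The helper names are hypothetical, and the choice of fields is an assumption, since which fields Synapse actually uses is not shown in this thread.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
OEMBED_ENDPOINT = "https://www.youtube.com/oembed"


def build_oembed_url(video_url: str) -> str:
    """Percent-encode the video URL so its own '?' and '=' survive as a parameter value."""
    return f"{OEMBED_ENDPOINT}?{urlencode({'url': video_url, 'format': 'json'})}"


def preview_fields(oembed_json: str) -> dict:
    """Pick a few fields a link preview would plausibly need (an assumption)."""
    data = json.loads(oembed_json)
    return {key: data.get(key) for key in ("title", "author_name", "thumbnail_url")}


# Trimmed copy of the response shown above.
sample_response = (
    '{"title":"Pure Moods Hit Collection Commercial","author_name":"jj vowers",'
    '"type":"video","thumbnail_url":"https://i.ytimg.com/vi/VuIcT-u2Dmo/hqdefault.jpg"}'
)
print(build_oembed_url("https://www.youtube.com/watch?v=VuIcT-u2Dmo"))
print(preview_fields(sample_response))
```

Encoding the inner URL is not strictly needed for this particular curl test, but it avoids ambiguity once the video link carries extra query parameters of its own.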

Is it possible/safe to configure providers.json on a currently deployed homeserver? Perhaps to add Youtube itself as Twitter and Youtube Shorts are already there?

Thanks

@clokep
Contributor

clokep commented Sep 15, 2024

> Is it possible/safe to configure providers.json on a currently deployed homeserver? Perhaps to add Youtube itself as Twitter and Youtube Shorts are already there?

You should be able to configure via https://element-hq.github.io/synapse/latest/usage/configuration/config_documentation.html#oembed to add additional providers if you'd like.

@heftig

heftig commented Sep 15, 2024

Here's a script that should output a Synapse-compatible providers.json:

#!/usr/bin/env python3

import json
from collections import OrderedDict
from sys import stdout
from urllib.parse import urlparse

import jsonschema
import requests
from jsonschema.exceptions import ValidationError

output = requests.get("https://oembed.com/providers.json")
output.raise_for_status()
providers = output.json(object_pairs_hook=OrderedDict)

# From synapse/config/oembed.py
_OEMBED_PROVIDER_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "provider_name": {"type": "string"},
            "provider_url": {"type": "string"},
            "endpoints": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "schemes": {
                            "type": "array",
                            "items": {"type": "string"},
                        },
                        "url": {"type": "string"},
                        "formats": {"type": "array", "items": {"type": "string"}},
                        "discovery": {"type": "boolean"},
                    },
                    "required": ["schemes", "url"],
                },
            },
        },
        "required": ["provider_name", "provider_url", "endpoints"],
    },
}

while True:
    try:
        jsonschema.validate(providers, _OEMBED_PROVIDER_SCHEMA)
    except ValidationError as e:
        del providers[e.absolute_path[0]]
    else:
        break


def valid_url(url):
    return urlparse(url).scheme in ["http", "https"]


def valid_provider(provider):
    for endpoint in provider["endpoints"]:
        if not valid_url(endpoint["url"]):
            return False
        for glob in endpoint["schemes"]:
            if not valid_url(glob):
                return False
    return True


providers = [p for p in providers if valid_provider(p)]
json.dump(providers, stdout, indent=4)
print()

@daemontron

daemontron commented Sep 15, 2024

Thanks @clokep and @heftig. Do you think this might warrant a PR to update https://github.com/element-hq/synapse/blob/develop/synapse/res/providers.json?

I actually added the following to my own custom_providers.json and most YouTube Preview URLs now successfully parse!
I could probably do a better job on the schemes; it looks like jonas-w had more above. It should probably include things like m.youtube.com as well?

[
  {
    "provider_name": "YouTube",
    "provider_url": "https://www.youtube.com/",
    "endpoints": [
      {
        "schemes": [
          "https://www.youtube.com/watch*",
          "https://youtube.com/watch*",
          "https://youtu.be/*",
          "https://youtu.be/*?*"
        ],
        "url": "https://www.youtube.com/oembed"
      }
    ]
  }
]

This is what I added to my homeserver.yaml:

oembed:
  additional_providers:
  - /data/custom_providers.json
  disable_default_providers: false

*It's /data/custom_providers.json and not /matrix/synapse/config/custom_providers.json because I use https://github.com/spantaleev/matrix-docker-ansible-deploy/

@salixor FYI in case you want to try it.

I don't know whether it's best to set disable_default_providers to false or true, so I kept it false and assumed what I added would be used in addition to the default providers.

@salixor
Author

salixor commented Sep 15, 2024

Hey, thanks a lot for the ping.
I hadn't followed this thread much as I thought there wasn't really a way around it.
I just learned about the oembed key in the config!

I just tested it, and it works flawlessly!
disable_default_providers is kept at false in my configuration with no issues.

Thanks a lot for everyone's work ❤️

@clokep
Contributor

clokep commented Sep 16, 2024

> Do you think this might warrant a PR to update

If this is a widespread issue then it is probably worthwhile. If it is only affecting some people then some comparison of what the scraping gets vs. oEmbed might be warranted? Depends on what the current team wants.
