Support image previews for Twitter oEmbed URL Previews #8022

anoadragon453 · 2020-08-03T17:29:49Z

To help fix twitter embedding issues, we've just added support for oEmbed in capturing URL previews. We now use this for previewing twitter links (by default), and we can now receive tweet text without problems. However, twitter does not return any image data for a tweet in its oEmbed response:

{
  "url": "https:\\/\\/twitter.com\\/arnaudmez7\\/status\\/1284848614062338053",
  "author_name": "The Uncle Mez",
  "author_url": "https:\\/\\/twitter.com\\/arnaudmez7",
  "html": "\\u003Cblockquote class=\"twitter-tweet\"\\u003E\\u003Cp lang=\"en\" dir=\"ltr\"\\u003EI Absolutely like the new \\u003Ca href=\"https:\\/\\/twitter.com\\/element_hq?ref_src=twsrc%5Etfw\"\\u003E@element_hq\\u003C\\/a\\u003E \\u003Cbr\\u003EBeautiful work !\\u003Cbr\\u003ERun very well on \\u003Ca href=\"https:\\/\\/twitter.com\\/SolusProject?ref_src=twsrc%5Etfw\"\\u003E@SolusProject\\u003C\\/a\\u003E \\u003Ca href=\"https:\\/\\/t.co\\/bLzhmuoFdy\"\\u003Epic.twitter.com\\/bLzhmuoFdy\\u003C\\/a\\u003E\\u003C\\/p\\u003E&mdash; The Uncle Mez (@arnaudmez7) \\u003Ca href=\"https:\\/\\/twitter.com\\/arnaudmez7\\/status\\/1284848614062338053?ref_src=twsrc%5Etfw\"\\u003EJuly 19, 2020\\u003C\\/a\\u003E\\u003C\\/blockquote\\u003E\n\\u003Cscript async src=\"https:\\/\\/platform.twitter.com\\/widgets.js\" charset=\"utf-8\"\\u003E\\u003C\\/script\\u003E\n",
  "width": 550,
  "height": null,
  "type": "rich",
  "cache_age": "3153600000",
  "provider_name": "Twitter",
  "provider_url": "https:\\/\\/twitter.com",
  "version": "1.0"
}

You'll notice that the html key has a pic.twitter.com URL in it. However, this just leads us to the tweet HTML, and extracting the image from that HTML is too twitter-specific anyways.
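To make that concrete, here is a minimal sketch (standard library only) that decodes an oEmbed html value and lists the links inside it. The HrefCollector class, the extract_links function, and the shortened sample payload are illustrative assumptions, not Synapse code. The only image-related thing it can find is the t.co link behind the pic.twitter.com text, not an actual image URL.

```python
import json
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(oembed_json: str) -> list:
    """Decode an oEmbed JSON body and return the links found in its html key."""
    response = json.loads(oembed_json)  # also undoes the \u003C and \/ escapes
    collector = HrefCollector()
    collector.feed(response.get("html") or "")
    return collector.hrefs


# Shortened stand-in for the response above (assumption: trimmed by hand).
SAMPLE = (
    r'{"html": "\u003Cblockquote\u003E'
    r'\u003Ca href=\"https:\/\/t.co\/bLzhmuoFdy\"\u003E'
    r'pic.twitter.com\/bLzhmuoFdy\u003C\/a\u003E'
    r'\u003C\/blockquote\u003E"}'
)

if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

Following that link is what leads back to the tweet page rather than to an image.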

However, the HTML returned here is exactly the same (minus the encoding) as what's shown on publish.twitter.com for this tweet. You can see that this HTML renders into a nice little standardised preview of the tweet. Part of this HTML is a JS script (platform.twitter.com/widgets.js) that gets loaded and actually does most of the magic of rendering the tweet.

Theoretically, after rendering this HTML output locally, we can just run our standard URL preview code over it and extract an image!

Thus my proposal for supporting Twitter image embeds with oEmbed, in a way that is still generic, is to:

Check if a response has image information (either the photo or video response type is used, or thumbnail* keys are provided).
If an image isn't easily provided, check for an html key.
If html key exists, render securely and run URL preview code over it.
Attempt to extract an image.

At the moment this is all theory, I haven't tested it in code yet.
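The first two steps of the proposal could be sketched roughly as below. This is an untested-in-Synapse illustration: classify_oembed is a hypothetical name, and the key names (type, url, thumbnail_url, html) are taken from the oEmbed spec rather than from any existing Synapse code. The rendering step is deliberately left out.

```python
from typing import Optional, Tuple


def classify_oembed(response: dict) -> Optional[Tuple[str, str]]:
    """Return ("image", url), ("html", markup), or None for an oEmbed response.

    Hypothetical helper: it only decides which path the preview code
    would take; it does not fetch or render anything.
    """
    # Step 1a: a "photo" response carries the image itself in `url`.
    if response.get("type") == "photo" and response.get("url"):
        return ("image", response["url"])

    # Step 1b: any response type may carry an optional thumbnail.
    if response.get("thumbnail_url"):
        return ("image", response["thumbnail_url"])

    # Step 2: no image was easily provided, so fall back to the html key.
    html = response.get("html")
    if html:
        return ("html", html)

    # Nothing usable for an image preview.
    return None


if __name__ == "__main__":
    # The Twitter response above is type "rich" with no thumbnail keys,
    # so it would take the html path.
    print(classify_oembed({"type": "rich", "html": "<blockquote>...</blockquote>"}))
```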


erikjohnston · 2020-08-04T13:26:27Z

I'm not really in favour of this for two reasons:

This requires running a JS engine in synapse (or forking out to one), and that scares me.
This is working around what Twitter has intentionally provided, where they clearly intend for this to render client side.

The Twitter API suggests that clients include https://platform.twitter.com/widgets.js and run twttr.widgets.load() on new URL previews, but that is obviously Twitter-specific.

erikjohnston · 2020-08-04T14:13:37Z

I'm going to close this because I think we've agreed that this isn't the right approach 🙂

aaronraimist · 2020-08-15T15:23:24Z

It's actually easy to do and doesn't require all of that.

https://matrix.to/#/!XaqDhxuTIlvldquJaV:matrix.org/$bvBYxFl1vc1_FbDz-VxSb2Lqh1V0kFIPrgHD_KHMhog?via=sw1v.org&via=raim.ist&via=matrix.org

https://mau.dev/maunium/synapse/-/commit/fe01ce7cf786378f72f741c80b6183674aeada50

It seems that has been decided against for some reason but I'm just adding a comment here so at least it is mentioned somewhere on the repo.

anoadragon453 · 2020-08-18T14:20:06Z

For those coming here in the future, Synapse already sends a User-Agent string of Synapse/x.xx.x during its URL preview fetching: #1859

It seems that the solution @aaronraimist linked works because twitter allows previews by programs with "bot" in their user-agent string. We're not sure whether we want to add this to the user-agent string, especially if it's not standard practice and is twitter-specific.

One may suggest allowing the URL preview UA to be configurable, but having to tell users to change this setting to get services like twitter working isn't a great situation to be in.
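For illustration, a configurable preview UA could look something like the sketch below. Everything here is an assumption: the function names, the idea of an optional configured value, and the example "(bot)" string are made up for this comment and are not an existing Synapse option. Only the Synapse/x.xx.x default comes from the current behaviour described above. The request is built but not sent.

```python
import urllib.request
from typing import Optional


def preview_user_agent(version: str, configured: Optional[str] = None) -> str:
    """Pick the User-Agent for URL preview fetches.

    `configured` stands in for a hypothetical admin-provided setting;
    without it we keep the current Synapse/<version> form.
    """
    if configured:
        return configured
    return f"Synapse/{version}"


def build_preview_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the chosen User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # An admin who wants twitter previews could opt in to a UA containing "bot".
    ua = preview_user_agent("1.19.0", configured="Synapse/1.19.0 (bot)")
    req = build_preview_request("https://twitter.com/", ua)
    # urllib stores header names capitalised as "User-agent".
    print(req.get_header("User-agent"))
```

The downside is the one mentioned above: nothing changes for twitter unless the admin knows to set the option.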

Given the above there's not an easy path forward here.

aaronraimist · 2020-08-18T15:06:59Z

Bot in the user agent doesn't seem like that much of a hack to me. For example #1859 was asking to put bot in the UA string back in 2017 just to show that it was in fact a bot making the requests.

The current situation will never work, so even if it only worked temporarily after making this change, that's still an improvement. You don't have to guarantee that Twitter previews are going to continue to work after making this change. It can just happily work until, maybe in the future, they change something and it stops working.

anoadragon453 · 2020-08-18T15:38:35Z

We don't want to modify the UA header for a twitter-specific reason. However, if putting "bot" in the UA is something industry-wide, or, as you say, a way to indicate that a request originates from a bot, then that'd be a good reason to do so. What do other link-fetching services do?

After some discussion in #synapse-dev, I'm more favourable towards the configurable UA option, although I do realise that it wouldn't solve the problem for twitter by default.

aaronraimist · 2020-08-18T16:41:55Z

I don't know if it is a standard but it doesn't seem uncommon. For example, most of Google's crawlers have the word bot in the user agent (https://support.google.com/webmasters/answer/1061943?hl=en), and as the Wikipedia article for user agents says:

Automated web crawling tools can use a simplified form, where an important field is contact information in case of problems. By convention the word "bot" is included in the name of the agent.

The reference for that is just a link to a blog post, but it does seem like something that some people recommend.

https://en.wikipedia.org/wiki/User_agent#Format_for_automated_agents_(bots)

anoadragon453 mentioned this issue Aug 3, 2020

Support oEmbed for media previews. #7920

Merged

erikjohnston closed this as completed Aug 4, 2020

jryans mentioned this issue Jan 25, 2021

Twitter URL previews element-hq/element-web#16257

Closed

AndrewRyanChama mentioned this issue Feb 13, 2022

Fetch images when previewing Twitter #11985

Merged
