Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data URI's get sanitized #101

Closed
Qqwy opened this issue Feb 18, 2016 · 8 comments
Closed

Data URI's get sanitized #101

Qqwy opened this issue Feb 18, 2016 · 8 comments

Comments

@Qqwy
Copy link

Qqwy commented Feb 18, 2016

I just posted this issue at the Rails HTML Sanitizer repo, but after testing I realized that it goes wrong inside Loofah itself.


This is the problem:
When I sanitize a HTML string with an image whose src points to a data URI, its src attribute is removed (even when src is whitelisted):

unclean_html = "A test
<img src=\"http://placehold.it/400x300\">
<img src='data:image/gif;base64,R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw=='/>"

clean_html = Loofah.fragment(unclean_html).scrub!(:strip).to_s
clean_html
# => "A test\n    <img src=\"http://placehold.it/400x300\">\n    <img>"

I presume that this happens because of too strict JS-prevention measures (or the data-URI is just discarded because the sanitizer does not understand the data: protocol?).

I hope this can be fixed.

@Qqwy
Copy link
Author

Qqwy commented Feb 18, 2016

I think I've found the perpetraitor:

Loofah::HTML5::WhiteList::ACCEPTABLE_PROTOCOLS does not include "data".

@flavorjones
Copy link
Owner

@Qqwy can you help me understand why data URLs are safe and shouldn't be sanitized?

There's a workaround available, which is to wrap the URL in url(...):

#! /usr/bin/env ruby

require 'nokogiri'
require 'loofah'

unclean_html = <<-EOHTML
A test
<img src=\"http://placehold.it/400x300\">
<img src='data:image/gif;base64,R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw=='/>"
<img src='url(data:image/gif;base64,R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==)'/>"
EOHTML

clean_html = Loofah.fragment(unclean_html).scrub!(:strip).to_s
puts clean_html

emits

A test
<img src="http://placehold.it/400x300">
<img>"
<img src="url(data:image/gif;base64,R0lGODlhAQABAPAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==)">"

@Qqwy
Copy link
Author

Qqwy commented Mar 6, 2016

@flavorjones I had presumed that data: URI's were only ever able to link to 'downloadable' resources, but you made me doubt my knowledge, so I did some research. And it turns out that I am wrong, to some extent:

  • IE only allows downloadable resources (images, stylesheets) anywhere.
  • Webkit allows text/script and text/html sources, but these are sandboxed.
  • If I understand it correctly, Gecko explicitly does not sandbox data URI's, although only iframes and script tags seem to be able to execute text/html and application/script resources.

(Sources: this question on Stackoverflow, RFC2397, Can I Use Data URIs ).

So, it seems that it indeed is unsafe to allow all data URIs. What seems like a better option to me, is to allow data URIs that specify one of the following media types, as script execution is impossible in them:

  • text/plain (This is also the default, when specifying '' as content type)
  • text/css
  • image/png
  • image/gif
  • image/jpeg
  • image/svg+xml (This is safe as in-SVG JS is only executed when the SVG is included inline as part of the document.)

(Did I miss any important ones in above list?)

What do you think of this proposal?

Using an <img src="url(...)"/> tag does not seem to work here: Webkit and Gecko both show a broken image.

Also see my example CodePen where I tested out various variations of above stuff.

@flavorjones
Copy link
Owner

@Qqwy thanks for the thorough analysis. I'd be fine with the above set of media types. Would you be interested in updating your PR at #102 to conditionally allow this?

@Qqwy
Copy link
Author

Qqwy commented Jan 15, 2017

Yes, I would.
Expect the updated PR tomorrow or the day after that :-).

@aergonaut
Copy link

@Qqwy @flavorjones Just wondering if this is still being worked on? I see #102 is still closed and doesn't look like it's been updated with the content type restrictions outlined above. I'd really love to see data URIs supported in Loofah!

@mrpasquini
Copy link
Contributor

I opened #120 since I hadnt seen any code on this yet.

flavorjones added a commit that referenced this issue Sep 24, 2017
and update CHANGELOG with this feature and a thank-you.

Related to #101, #120.
@flavorjones
Copy link
Owner

This was addressed in v2.1.0 and should have been closed then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants