Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sanitizer making block level elements unreadable #10836

Merged
merged 1 commit into from
Jun 16, 2019

Conversation

Gargron
Copy link
Member

@Gargron Gargron commented May 26, 2019

Fix #10834

@Gargron Gargron added the activitypub Protocol-related changes, federation label May 26, 2019
@Gargron Gargron force-pushed the fix-sanitize-tags-removal branch from 4f1ef47 to e3267af Compare May 26, 2019 00:52
@ClearlyClaire
Copy link
Contributor

I'd still prefer actual support for those tags, but this is an improvement on the current behavior.

@ClearlyClaire
Copy link
Contributor

ClearlyClaire commented May 26, 2019

Thinking about it, it's a good opportunity to translate inline img tags to a link to the original image + use the alt attribute.

expect(Sanitize.fragment('<h1>Foo</h1>', subject)).to eq '<p>Foo</p>'
end

it 'converts li to p' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did i misunderstand the job of UNSUPPORTED_ELEMENTS_TRANSFORMER? to me it read like (h1 h2 h3 h4 h5 h6 blockquote pre li) would be preserved, and anything else would be converted to p

the reason i ask is because converting li to p would insert margins below it, and that causes excessive spacing. essentially, an li element has only 1 visual line break, whereas a p element has 2 visual line breaks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you probably misunderstood. They all get converted to <p>.

I feel like single line breaks can be very hard to notice within longer text, so while for single-line list items it would be best to use <br />, <p> is more fit for anything that has its own wrapping.

Copy link
Member

@trwnh trwnh May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, so it's like this then?

  1. keep (p br span a)
  2. convert (h1 h2 h3 h4 h5 h6 blockquote pre li) to p
  3. strip all other elements

the "unless" was throwing me off.

i still think semantically if you were trying to map ol/ul and li, then li would be most like a br element. most formatting on the rest of the internet (and indeed most browsers' default css) treats lists as single-break, with double-break around the list due to margin css.

i'm still not sure what the point is in explicitly stripping out the list elements, since they render unobtrusively anyway. my idea of converting to p was more for unsupported elements like marquee. as far as i can tell this current code just removes marquee entirely, right? that would cause multiple marquees to become a run-on sentence.

from a purely visual standpoint, everything in (h1 h2 h3 h4 h5 h6 blockquote pre li) except for pre is actually safe due to the basics.scss reset, without any other changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think converting li to p elements might indeed be a problem. For a start, li elements can contain a p already. Also, regarding how it would be displayed:

With my patch

image

Converted as p tags

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise, this is what the following markup looks like without any patches:

<ul><li>item</li><li>item</li><li>item</li><li>item</li><li>item</li></ul>

image

@@ -19,6 +19,11 @@ module Config
node['class'] = class_list.join(' ')
end

UNSUPPORTED_ELEMENTS_TRANSFORMER = lambda do |env|
return unless %w(h1 h2 h3 h4 h5 h6 blockquote pre li).include?(env[:node_name])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw if this line does what i think it does, then pre should be converted instead of maintained (unless you add css along the lines of overflow-x: auto or white-space: pre-wrap.

headers, blockquote, and li are reset by basics.scss regardless. for that matter, ol and ul are also safely reset by the current css.

@Gargron Gargron marked this pull request as ready for review June 4, 2019 15:25
@twilight-sparkle-irl
Copy link

For the record, I don't see why this is necessary when the user opinion has been massively in favor of full support.

@Gargron Gargron force-pushed the fix-sanitize-tags-removal branch from e3267af to 4a20bd4 Compare June 16, 2019 17:48
Copy link
Contributor

@ClearlyClaire ClearlyClaire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer if we accepted those tags directly, but I think this is the best fallback we can have, even though it still loses information (nested lists for instance).

Also, this PR might be a good place for replacing incoming inlined images with links (as in glitch-soc#1071)

@Gargron Gargron merged commit 103a9f4 into master Jun 16, 2019
@Gargron Gargron deleted the fix-sanitize-tags-removal branch June 16, 2019 19:46
hiyuki2578 pushed a commit to ProjectMyosotis/mastodon that referenced this pull request Oct 2, 2019
messenjahofchrist pushed a commit to Origin-Creative/mastodon that referenced this pull request Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
activitypub Protocol-related changes, federation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep lists/headings in sanitizer, or convert to <p>
4 participants