
fix: escape foreign style tag content when serializing HTML5 #3348

Merged
merged 2 commits into from
Dec 2, 2024


flavorjones
Member

What problem is this PR intended to solve?

Normally, a `style` tag is considered to be a raw text element. This means that a `<` in its content is only examined as the possible start of a tag token (its `</style>` end tag), and that the content is serialized literally: `<` is written as-is rather than as the escaped character reference `&lt;`.

However, when a `style` tag appears in SVG or MathML foreign content, it should *not* be considered a raw text element, and its content should be escaped when serialized. libgumbo parses this case correctly, but our HTML5 serialization code does not escape the content.

This commit updates the static `is_one_of()` C function to consider the namespace of the parent node (the element containing the text being serialized), and not just the tag's local name, when deciding whether the tag matches the list of HTML elements. As a result, a `style` tag in foreign content will *not* match, but a `style` tag in HTML content will.

Have you included adequate test coverage?

Yes.

Does this change affect the behavior of either the C or the Java implementations?

HTML5 is only available in the CRuby implementation, so the Java implementation is unaffected.

for v1.15.7 and v1.16.8
@flavorjones flavorjones merged commit 733ae93 into main Dec 2, 2024
133 of 134 checks passed
@flavorjones flavorjones deleted the flavorjones-svg-style-serialization branch December 2, 2024 21:44
flavorjones added five commits that referenced this pull request Dec 6, 2024
rgrove added a commit to rgrove/sanitize that referenced this pull request Dec 26, 2024
This version of Nokogiri fixes a foreign content escaping issue that
Sanitize previously had to work around manually. To avoid double
escaping, Sanitize's workaround has been removed, which means it's
important to prevent the use of an older Nokogiri that doesn't have the
fix.

See sparklemotion/nokogiri#3348
flavorjones added a commit that referenced this pull request Jan 2, 2025
**What problem is this PR intended to solve?**

In #3348, downstream CI tests for Sanitize were temporarily disabled
because the changes in that PR (intentionally) caused some of Sanitize's
tests to fail.

As of Sanitize 7.0.0, tests are passing again with the latest Nokogiri
and it's safe to re-enable the downstream tests. 🎉