HTML entities in links are converted to symbols even if they do not end with a semi-colon #5717

Closed
jpurpleman opened this issue Mar 20, 2018 · 7 comments
Labels
[Type] Bug An existing feature does not function as intended

Comments

@jpurpleman

Issue Overview

We have content in the classic editor that contains a web link whose query string includes "&reg". In the classic editor the link is fine, but when the page is opened in the Gutenberg editor, the "&reg" is converted to a registered symbol (®).

Steps to Reproduce (for bugs)

Have a link whose query string contains "&reg".

Our simple example is:

https://www.example.com/?foo=bar&reg=baz

  1. Create a page in the classic editor.
  2. Paste the following content into the Text view of the editor:
     https://www.example.com/?foo=bar&reg=baz
  3. Save the page.
  4. Edit the page in the Gutenberg editor.
  5. The "&reg" is now a registered symbol.

Using Google Chrome Version 60.0.3112.90 (Official Build) (64-bit) on Linux
Using Gutenberg 2.4
Using WordPress 4.9.4

Expected Behavior

I don't expect the "&reg" argument to change to a registered symbol.

Screenshots / Video

Please see the link below for before and after screenshots.

https://imgur.com/a/bNcb3

@jeffpaul jeffpaul added the [Type] Bug An existing feature does not function as intended label Mar 21, 2018
@ZebulanStanphill
Member

ZebulanStanphill commented Mar 23, 2018

It looks like this happens with any string of characters that would be an HTML entity if you added a semicolon. (Not just in links.)

If you put &amp, &lt, &gt, or &#34 into a Gutenberg post using the Code Editor mode and then switch to the Visual Editor mode, the strings get converted to &, <, >, and " respectively.

I think the issue here is something along the lines of strings being parsed as HTML entities regardless of whether they end with a semicolon or not. Perhaps the strings are being considered as having typos and the semicolons are being added by automatic typo-correction?
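For comparison, the behavior described above can be reproduced outside Gutenberg. The HTML parsing rules allow a fixed set of legacy named character references to be recognized without a trailing semicolon, and Python's standard `html.unescape` follows that lenient rule. The sketch below is only a stand-in to illustrate the rule; it is not Gutenberg's parser, and whether Gutenberg hits the same code path in the browser is an assumption.

```python
import html

# Strings from the comment above, each missing its trailing semicolon.
samples = ["&amp", "&lt", "&gt", "&#34", "&reg"]

# html.unescape applies the lenient HTML5 rule for legacy references.
decoded = {s: html.unescape(s) for s in samples}
for source, result in decoded.items():
    print(f"{source!r} -> {result!r}")

# A name that is not on the legacy list is left alone unless it
# ends with a semicolon.
print(html.unescape("&hellip"))
print(html.unescape("&hellip;"))
```

If this is the same rule at work, no typo correction is involved: the semicolon-less forms are simply accepted by a lenient decoder.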

@mtias mtias added the Needs Testing Needs further testing to be confirmed. label Jul 17, 2018
@designsimply designsimply changed the title from "&reg in weblink renders registered symbol" to "HTML entities in links are converted to symbols even if they do not end with a semi-colon" Jul 17, 2018
@designsimply designsimply added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f and removed Needs Testing Needs further testing to be confirmed. labels Jul 17, 2018
@designsimply
Member

Tested and confirmed: when a post is opened in the Gutenberg editor, HTML character entities that do not end with a ; are converted directly into symbols in inline URLs. Inside the href attribute of an anchor tag they are not converted to symbols; instead the ampersand is escaped, so something like &cent becomes &amp;cent.

this:

https://www.example.com/?foo=bar&reg=registered&amp=ampersand&quot=quote&cent=cent

<a href="https://www.example.com/?foo=bar&reg=registered&amp=ampersand&quot=quote&cent=cent">link</a>

&reg

&amp

&quot

&cent

becomes:

<p>https://www.example.com/?foo=bar®=registered&amp;=ampersand"=quote¢=cent</p>
<p><a href="https://www.example.com/?foo=bar&amp;reg=registered&amp;amp=ampersand&amp;quot=quote&amp;cent=cent">link</a></p>
<p>®</p>
<p>&amp;</p>
<p>"</p>
<p>¢</p>


Seen at http://alittletestblog.com/wp-admin/post.php?post=14002&action=edit running WordPress 4.9.7 and Gutenberg 3.2.0 using Firefox 61.0.1 on macOS 10.13.5.
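The before/after output in this comment is consistent with a "decode leniently, then re-escape" round trip on text content, and with plain ampersand escaping on the href. A minimal sketch of both, using Python's `html` module as a stand-in (an assumption for illustration, not the Gutenberg code path):

```python
import html

url = ("https://www.example.com/?foo=bar&reg=registered"
       "&amp=ampersand&quot=quote&cent=cent")

# Text content: decode lenient references, then re-escape for storage.
as_text = html.escape(html.unescape(url), quote=False)

# Escaping every ampersand first keeps the URL intact through a decode.
escaped = html.escape(url, quote=False)
round_tripped = html.unescape(escaped)

print(as_text)
print(escaped)
print(round_tripped == url)
```

Here `as_text` matches the paragraph output above and `escaped` matches the href value, which suggests that writing `&amp;` for each ampersand in the source is the form that survives decoding unchanged.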

@davewarfel

I noticed a similar issue today. Just wanted to document my experience and provide a screencast for further debugging.

Screencast:
https://www.dropbox.com/s/anom2rpsdtqfdvg/gutenberg-code-tags-html-visually.mp4?dl=0

Reproduce issue:

  • Edit Visually: Enter some HTML code in a paragraph block (ex: <span class="">)
  • Switch to Edit HTML: The < gets converted to &lt; but the > stays a >
  • Edit HTML: Wrap <code> tags around the HTML code, and change the > to a &gt;
  • Switch to Edit Visually: Looks fine
  • Switch back to Edit HTML: The &gt; is converted back to >

@mcsf
Contributor

mcsf commented Oct 31, 2018

@davewarfel: that is a different matter from the one reported in this issue. That discrepancy between < and > is actually the result of following the W3C specification. Gutenberg conforms to the spec in its serialization stack. See https://github.com/WordPress/gutenberg/blob/master/packages/escape-html/src/index.js#L70-L84.
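As a rough model of the asymmetry described here, the sketch below escapes only `&` and `<` when serializing text content and leaves `>` untouched. The helper names `serialize_text` and `round_trip` are hypothetical, and this is not the escape-html source; it only mirrors the behavior reported in the steps above.

```python
import html


def serialize_text(value: str) -> str:
    """Escape only the characters treated as unsafe in text content."""
    # Ampersand first, so the "&" in "&lt;" is not escaped a second time.
    return value.replace("&", "&amp;").replace("<", "&lt;")


def round_trip(markup: str) -> str:
    """Decode character references, then serialize again."""
    return serialize_text(html.unescape(markup))


print(serialize_text('<span class="">'))
print(round_trip('&lt;span class=""&gt;'))
```

Both calls yield the same string, with `&lt;` at the start and a literal `>` at the end, which is why a hand-written `&gt;` does not survive switching between the two editing modes.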

@jpurpleman, I can't repro this issue. Most likely it was solved once we switched to our own serializer.

@mcsf mcsf closed this as completed Oct 31, 2018
@mcsf mcsf removed the [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f label Oct 31, 2018
@LucCole

LucCole commented Feb 7, 2019

This is still an issue.

When I try and add a double quote in my post with &quot; it will convert it to a ".

If I try and add a &quot (no ;) then it will save fine the first time but when I go back to the editor it will convert it to a ".

This is a problem because, let's say, I have a shortcode in my WP post:

[my_shortcode text="This is some example text, and this is a "quote""]

Like I said this works fine if I can use

[my_shortcode text="This is some example text, and this is a &quotquote&quot"]

But the editor keeps changing (&quot) back to (")
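To illustrate why both spellings are lost, the sketch below runs the shortcode through a single lenient decoding pass, using Python's `html.unescape` as a stand-in for whatever the editor does (an assumption). The double-escaped spelling is included only to show how one pass treats it; whether it is a usable workaround for shortcodes in WordPress is not confirmed here.

```python
import html

shortcode = ('[my_shortcode text="This is some example text, '
             'and this is a &quotquote&quot"]')

# One decoding pass turns the semicolon-less references into quotes.
print(html.unescape(shortcode))

# How a single pass treats three spellings of the same reference.
spellings = ["&quot", "&quot;", "&amp;quot;"]
after_one_pass = [html.unescape(s) for s in spellings]
print(after_one_pass)
```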

@mcsf
Contributor

mcsf commented Feb 8, 2019

See #13609.

@adamjrice

I'm also getting caught on this in version 5.2.3.

Also, interestingly, when I search posts from the admin interface for &amp;, that gets converted to & -- which makes it that much harder to look for those escaped ampersands.
