Preserve &nbsp; entity #102

Closed
nltesown opened this issue Aug 21, 2015 · 8 comments · Fixed by #315

Comments

@nltesown

The non-breaking space entity (&nbsp;) should not be replaced by a normal space character.
More generally, it may be useful to let the user decide how HTML entities should be dealt with (converted or preserved).

@myakura

myakura commented Sep 19, 2015

The non-breaking space entity (&nbsp;) should not be replaced by a normal space character.

is it really a "normal" space character? i mean, does &nbsp; return a character whose code point is U+0020 rather than U+00A0?

@nltesown
Author

I think it doesn't (I may be wrong). Even if it did, the main problem remains: an ordinary space and a non-breaking space look identical, so there's no way to tell them apart by eye. Preserving the &nbsp; entity would at least be useful.
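
The two characters can still be told apart programmatically. A minimal sketch, assuming the converter receives text after entity decoding (as a DOM text node would expose it); `codePoints` is a hypothetical helper written for this illustration, not part of the library:

```javascript
// Assumption: "foo&nbsp;bar" has already been decoded by the HTML parser.
const decoded = 'foo\u00A0bar'; // text containing a non-breaking space
const plain = 'foo bar';        // same text with an ordinary space

// Hypothetical helper: label each character of a string as "U+XXXX".
function codePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(codePoints(decoded)[3]); // U+00A0
console.log(codePoints(plain)[3]);   // U+0020
console.log(decoded === plain);      // false

// JavaScript's \s class matches both characters, which is one way
// the two kinds of space can end up being treated as the same thing.
console.log(/^\s$/.test('\u00A0'));  // true
```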

@myakura

myakura commented Sep 19, 2015

so preserving the &nbsp; entity would be at least useful.

perhaps. I'm not sure if it's possible to "preserve" entities as jsdom converts all named (and numeric) references to their matching character(s).

@wizardforcel

This happens because node.childNodes[i].data (index.js, line 133) returns the text with HTML entities already decoded.

So I think it should be changed to node.childNodes[i].data.replace(/</g, '&lt;').replace(/>/g, '&gt;'), or something better.
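
The suggested replace chain covers < and > but not the non-breaking space itself. A rough sketch of how it could be extended, assuming the input is already-decoded text; `reencodeEntities` is a hypothetical name, and escaping & as well is an addition made here for safety, not something proposed in the thread:

```javascript
// Hypothetical helper, not part of the library: re-encode characters
// that entity decoding turned into literal text.
function reencodeEntities(text) {
  return text
    .replace(/&/g, '&amp;')        // first, so the entities added below are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\u00A0/g, '&nbsp;'); // keep the non-breaking space visible in the output
}

console.log(reencodeEntities('a <b>\u00A0c')); // a &lt;b&gt;&nbsp;c
```

Whether every & should be escaped in Markdown output is a separate design question; the sketch only shows the order in which the replacements need to run.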

@Tyriar

Tyriar commented Jan 10, 2016

Another problem with not preserving &nbsp; is that it can lead to an invalid conversion:

<i>foo&nbsp;</i>bar

Converts to:

_foo _bar

Because the closing _ is now preceded by a space, it no longer wraps the phrase and does not end the italics.
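
One possible way to avoid this, sketched under the assumption that the inline content arrives as a decoded string: keep leading and trailing whitespace outside the delimiters. `wrapInline` is a hypothetical helper for illustration and is not the library's actual implementation:

```javascript
// Hypothetical helper: wrap content in an emphasis delimiter while keeping
// leading and trailing whitespace (including U+00A0) outside the delimiters.
function wrapInline(content, delimiter) {
  // \s matches ASCII whitespace as well as U+00A0 in JavaScript.
  const [, leading, core, trailing] = /^(\s*)([\s\S]*?)(\s*)$/.exec(content);
  if (core === '') return content; // whitespace only: emit no delimiters at all
  return leading + delimiter + core + delimiter + trailing;
}

// <i>foo&nbsp;</i>bar, after entity decoding:
const result = wrapInline('foo\u00A0', '_') + 'bar';
console.log(result === '_foo_\u00A0bar'); // true
```

The non-breaking space survives, and the closing delimiter sits directly after the word, so it can close the emphasis.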

@kadishmal

In my case the following HTML is incorrectly parsed. The output Markdown doesn't have the trailing space.

<p>I have it:&nbsp;</p>

It is converted into "I have it:", without the trailing space.

@rabeesh

rabeesh commented May 12, 2016

I tried to convert

<em>ds &nbsp; &nbsp; </em><strong>fs&nbsp;</strong>

which is converted into Markdown as

_ds    _ **fs **

The HTML space entity (&nbsp;) needs to be preserved.

@martincizek
Collaborator

Just for reference, a thorough analysis of this topic: https://github.com/orchitech/turndown/wiki/Whitespace

A PR will come soon. :)

martincizek added a commit to orchitech/turndown that referenced this issue Mar 31, 2020
Do not merge ASCII and non-ASCII whitespace.
Make sure non-ASCII whitespace is moved out of inline elements to prevent generating broken Markdown.
Fix mixmark-io#102.
Fix mixmark-io#250.
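
As a rough illustration of the first point in the commit message (not the code from the commit itself), collapsing only ASCII whitespace leaves a non-breaking space intact, whereas a `\s+` pattern merges it into a single ordinary space. `collapseAsciiWhitespace` is a hypothetical helper:

```javascript
// Hypothetical helper: collapse runs of ASCII whitespace only,
// leaving non-ASCII whitespace such as U+00A0 untouched.
function collapseAsciiWhitespace(text) {
  return text.replace(/[ \t\r\n\f]+/g, ' ');
}

const input = 'a \n\u00A0 b';
console.log(collapseAsciiWhitespace(input) === 'a \u00A0 b'); // true
console.log(input.replace(/\s+/g, ' ') === 'a b');            // true: \s also swallows U+00A0
```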
martincizek added a commit to orchitech/turndown that referenced this issue Jul 6, 2020
Do not merge ASCII and non-ASCII whitespace.
Make sure non-ASCII whitespace is moved out of inline elements to prevent generating broken Markdown.
Fix mixmark-io#102.
Fix mixmark-io#250.
michbart pushed a commit to orchitech/turndown that referenced this issue Nov 30, 2020
Do not merge ASCII and non-ASCII whitespace.
Make sure non-ASCII whitespace is moved out of inline elements to prevent generating broken Markdown.
Fix mixmark-io#102.
Fix mixmark-io#250.