Remove substitution of non-breakable spaces #897

kephas · 2017-05-22T14:28:47Z

Fixes #640
Fixes #363

joshbruce · 2018-04-13T17:52:19Z

@kephas: Is it possible add tests to this PR??

UziTech

Looks like this was added in c13a7aa to fix #52

Being that there is a work around (using  ) I say we leave it.

davisjam · 2018-06-27T04:17:20Z

@kephas

I don't know much about this part of the code...

Can you comment on the potential impact on the rest of the lexer? I imagine that it will all "just work" and the \u00a0 will be preserved in the generated HTML to be rendered by the browser?
@UziTech If \u00a0 is a distinct Unicode character with special behavior not equivalent to plain whitespace, why would we eliminate it? We could instead replace \u00a0 with &nbsp, I suppose, but that seems worse than the approach @kephas has proposed.

UziTech · 2018-06-27T05:33:54Z

@davisjam
It seems this issue happens when taking input from a content editable div. Typing two or more consecutive spaces turns the extra spaces into \u00a0 which @OscarGodson (#52) believes should be interpreted as whitespace and @kephas (#640) believes should be interpreted as non-breaking space characters.

Martii · 2018-06-27T11:33:37Z

So here's a thought... if someone uses https://github.github.com/gfm/#indented-code-blocks then the "content editable div" will convert those 4 normal typed spaces into 1 normal space and 3 \u00a0 (non-breaking spaces) and then marked currently converts those back to regular spaces (as if they were typed that way and they were from what I understand reading this). Removal of that seems like it would break that part of the spec if they weren't normal spaces. Just something to chew on I hope.

UziTech · 2018-06-27T13:57:54Z

I agree with @Martii, I would think that most people who enter a \u00a0 mean for it to be a regular space and not a non-breaking space character.

If someone is going to enter \u00a0 intentionally they can just as easily enter  

davisjam · 2018-06-27T15:18:05Z

So let's see if I understand the situation: we can't tell with complete certainty whether a \u00a0 was generated automatically from whitespace (so replacing it with whitespace is fine) or whether it was deliberately entered (so replacing it with whitespace is surprising).

If the majority of the \u00a0 we see are auto-generated from whitespace, then I think returning them to whitespace by default (i.e. the current behavior) is fine.

I suspect that most of the \u00a0's we see are auto-generated. Markdown is generally produced by humans typing into text editors, and none of the humans I know inject unusual Unicode characters into their markdown files. And @UziTech is correct that an &nbsp is a viable way to insert non-breaking spaces if someone really wants them.

kephas · 2018-06-29T19:04:43Z

I fear that you are basing your assumptions on english writers. In French, for example, it is a requirement to have a space before double punctuations like [;:?!]. Some text editors actually do this for you and do this, with good reason, not with a normal space but an unbreakable space, because that punctation should not be orphaned, typographically speaking.

UziTech · 2018-06-29T20:09:45Z

Possible solutions that work for both parties:

Leave marked as is and recommend anyone who needs \u00a0 to be a non-breaking space to use
```
marked(markdownString.replace(/\u00a0/g, '&nbsp;'))
```
Commit this PR and recommend anyone who needs \u00a0 to be whitespace to use
```
marked(markdownString.replace(/\u00a0/g, ' '))
```
Create an option to let the user convert \u00a0 to ' ' or not

Martii · 2018-06-29T21:22:52Z

3)) NON-BREAKING Could work but a default of 1)) 👍

1)) NON-BREAKING ❓ Couldn't those few languages that punctuate uniquely write their own Lexer.prototype.lex or is that not possible? If it is possible it would make more sense for them to utilize that with the French language.

2)) BREAKING 👎 Mentioned at #897 (comment) and that's probably not the only one that needs real spaces.

Misc note: Russian, Spanish, etc. doesn't use an extra space and it's definitely not English. So moot point.

A)) NON-BREAKING ❓ Seems like CSS and/or the browser default language via should handle this although I've never seen a rule like this.

B)) NON-BREAKING ❓ Some html metadata... Ex. http://nimbupani.com/declaring-languages-in-html-5.html

davisjam · 2018-06-30T17:55:52Z

@kephas You are correct that I was only considering English-language practices in my summary.

@UziTech Thanks for the proposals.

I like the "offer an option for this" approach (3), and I think @Martii's suggestion of using substitution as the default is the right choice.

I don't want to ask users to supply a mix of options and ad hoc substitutions, as suggested in (1).

mccraveiro · 2018-12-20T16:30:41Z

Option 1 means we would never be 100% spec compliant as stated on #1398
Option 2 makes a lot of sense and goes in the same direction as #252. I mean: the fewer config options we have the better; and everything thats better done outside the marked core shouldn't be included on the core.

Also, we could create a FAQ section on README to include this case and #252.

Feder1co5oave · 2019-01-04T13:27:26Z

tl;dr let's just stop replacing \u00a0 with spaces, maybe on the next development branch

I understand the wish for backward compatibility, even for existing "broken" practices like typing nbspaces instead of regular spaces (even though it seems we wouldn't bother as many people as we might think, as @UziTech pointed out in the linked issue), but I'm more in favor of letting the user decide when to use nbspaces, without having to use any hack like String.replace (which tie him to html). This makes us more typography friendly, binary transparent, Unicode aware, and commonmark compliant. And I definitely wouldn't add an option only for this.

If we'd like to retain some backward compatibility, I'm wondering if we should branch the development process into two parallel lines, whereas we would introduce fixes and breaking changes on the e.g. 1.0-dev branch, and cherry pick only bugfixes and non-breaking improvements into 0.5-dev or 0.6-dev. Obviously this puts some additional burden on the maintainers...

Remove substitution of non-breakable spaces

0c97bd8

Feder1co5oave mentioned this pull request Mar 8, 2018

[commonmark] make html comments, html inlines, html blocks and links compliant #1135

Merged

5 tasks

joshbruce added the Missing Tests label Apr 13, 2018

styfle requested review from UziTech and davisjam April 16, 2018 13:08

UziTech reviewed Apr 16, 2018

View reviewed changes

Add test containing multiple Unicode spaces

02577dc

davisjam removed the Missing Tests label Jun 27, 2018

UziTech mentioned this pull request Dec 5, 2018

Marked removes non-breaking spaces in the original text #363

Closed

UziTech mentioned this pull request Dec 20, 2018

Fix no-break space on lists #1398

Closed

6 tasks

UziTech mentioned this pull request Mar 11, 2019

Non-breakable spaces don't work in marked #640

Closed

UziTech mentioned this pull request Apr 19, 2019

fix commonmark example 353 #1477

Closed

5 tasks

This was referenced Aug 5, 2019

␤ actually becomes a newline when it shouldn't #1531

Closed

remove substitutions #1532

Merged

joshbruce closed this in #1532 Aug 5, 2019

shabbyrobe mentioned this pull request Mar 3, 2022

Remove subsitutions as per markedjs/marked#1532 ubersl0th/markdown#15

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove substitution of non-breakable spaces #897

Remove substitution of non-breakable spaces #897

kephas commented May 22, 2017 •

edited by styfle

Loading

joshbruce commented Apr 13, 2018

UziTech left a comment •

edited

Loading

davisjam commented Jun 27, 2018

UziTech commented Jun 27, 2018

Martii commented Jun 27, 2018

UziTech commented Jun 27, 2018

davisjam commented Jun 27, 2018

kephas commented Jun 29, 2018

UziTech commented Jun 29, 2018 •

edited

Loading

Martii commented Jun 29, 2018 •

edited

Loading

davisjam commented Jun 30, 2018

mccraveiro commented Dec 20, 2018

Feder1co5oave commented Jan 4, 2019

Remove substitution of non-breakable spaces #897

Remove substitution of non-breakable spaces #897

Conversation

kephas commented May 22, 2017 • edited by styfle Loading

joshbruce commented Apr 13, 2018

UziTech left a comment • edited Loading

Choose a reason for hiding this comment

davisjam commented Jun 27, 2018

UziTech commented Jun 27, 2018

Martii commented Jun 27, 2018

UziTech commented Jun 27, 2018

davisjam commented Jun 27, 2018

kephas commented Jun 29, 2018

UziTech commented Jun 29, 2018 • edited Loading

Martii commented Jun 29, 2018 • edited Loading

davisjam commented Jun 30, 2018

mccraveiro commented Dec 20, 2018

Feder1co5oave commented Jan 4, 2019

kephas commented May 22, 2017 •

edited by styfle

Loading

UziTech left a comment •

edited

Loading

UziTech commented Jun 29, 2018 •

edited

Loading

Martii commented Jun 29, 2018 •

edited

Loading