Consistently handle inline elements with spaces
This resolves some odd situations that can occur when there are inline
elements that contain spaces in sentences.

The first situation is when there is an element that includes a space
between words, for example 'Test<span> </span>content'. This would
previously have produced a two space result: 'Test  content' because
this element would have matched both leading and trailing whitespace
tests.

The second situation is when an element contains a space character that
the old tests did not cover, such as a non-breaking space (Unicode
U+00A0). In that case the space was removed entirely. An example of this
is 'Test<span>&nbsp;</span>content', which would result in 'Testcontent'
because it wouldn't match the tests for leading/trailing whitespace.
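
The difference between the two whitespace tests can be checked in isolation. The snippet below is a small sketch, not code from the commit: the names `describe` and `results` are made up for illustration, and only the two regular expressions are taken from the diff.

```javascript
// Old test: an explicit subset of whitespace characters.
var oldLeading = /^[ \r\n\t]/
// New test: \s, which also covers U+00A0 (non-breaking space).
var newLeading = /^\s/

// Hypothetical helper: report how each test treats a given string.
function describe (text) {
  return {
    oldTest: oldLeading.test(text),
    newTest: newLeading.test(text)
  }
}

var results = {
  plainSpace: describe(' '),
  nbsp: describe('\u00A0') // what &nbsp; becomes in textContent
}

console.log(results)
```

A plain space passes both tests, while the non-breaking space passes only the `\s` version, which is why the old code saw no flanking whitespace at all in the second situation.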

This resolves these problems by changing the whitespace tests to use \s
rather than a subset of space characters (which is consistent with the
blank test [1]) and by emitting only a leading space, not a trailing
one, when a blank element passes both the leading and trailing
whitespace tests.

[1]: https://github.com/domchristie/turndown/blob/80297cebeae4b35c8d299b1741b383c74eddc7c1/src/node.js#L14
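
The combined effect of the two changes can be sketched with a simplified model. This is an illustration under stated assumptions, not turndown's actual code: `flankingWhitespaceSketch` and `join` are hypothetical names, `node` is a plain object rather than a DOM node, the `flankedLeft`/`flankedRight` flags stand in for the `isFlankedByWhitespace` calls, and a blank inline element is assumed to contribute nothing besides its flanking whitespace.

```javascript
// Simplified model of the flanking-whitespace decision.
// useNewRules = false mimics the tests before this commit, true mimics them after.
function flankingWhitespaceSketch (node, useNewRules) {
  var leading = ''
  var trailing = ''
  var text = node.textContent

  if (!node.isBlock) {
    var hasLeading = (useNewRules ? /^\s/ : /^[ \r\n\t]/).test(text)
    var hasTrailing = (useNewRules ? /\s$/ : /[ \r\n\t]$/).test(text)
    // Only the new rules treat a blank element with spaces specially.
    var blankWithSpaces = useNewRules && node.isBlank && hasLeading && hasTrailing

    if (hasLeading && !node.flankedLeft) leading = ' '
    if (!blankWithSpaces && hasTrailing && !node.flankedRight) trailing = ' '
  }

  return { leading: leading, trailing: trailing }
}

// Assumption: the blank element itself renders as an empty string.
function join (before, node, after, useNewRules) {
  var ws = flankingWhitespaceSketch(node, useNewRules)
  return before + ws.leading + ws.trailing + after
}

var spaceSpan = { textContent: ' ', isBlock: false, isBlank: true, flankedLeft: false, flankedRight: false }
var nbspSpan = { textContent: '\u00A0', isBlock: false, isBlank: true, flankedLeft: false, flankedRight: false }

console.log(JSON.stringify(join('Test', spaceSpan, 'content', false))) // "Test  content"
console.log(JSON.stringify(join('Test', nbspSpan, 'content', false))) // "Testcontent"
console.log(JSON.stringify(join('Test', spaceSpan, 'content', true))) // "Test content"
console.log(JSON.stringify(join('Test', nbspSpan, 'content', true))) // "Test content"
```

Under the old tests the model reproduces both problems described above; under the new tests both inputs collapse to a single space.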
kevindew committed Apr 3, 2019
1 parent 80297ce commit b73068c
Showing 2 changed files with 19 additions and 3 deletions.
8 changes: 5 additions & 3 deletions src/node.js
@@ -22,13 +22,15 @@ function flankingWhitespace (node) {
   var trailing = ''

   if (!node.isBlock) {
-    var hasLeading = /^[ \r\n\t]/.test(node.textContent)
-    var hasTrailing = /[ \r\n\t]$/.test(node.textContent)
+    var hasLeading = /^\s/.test(node.textContent)
+    var hasTrailing = /\s$/.test(node.textContent)
+    var blankWithSpaces = node.isBlank && hasLeading && hasTrailing

     if (hasLeading && !isFlankedByWhitespace('left', node)) {
       leading = ' '
     }
-    if (hasTrailing && !isFlankedByWhitespace('right', node)) {
+
+    if (!blankWithSpaces && hasTrailing && !isFlankedByWhitespace('right', node)) {
       trailing = ' '
     }
   }
14 changes: 14 additions & 0 deletions test/index.html
@@ -888,6 +888,20 @@ <h2>This is a header.</h2>
   <pre class="expected">![](http://example.com/logo.png)</pre>
 </div>

+<div class="case" data-name="text separated by a space in an element">
+  <div class="input">
+    <p>Foo<span> </span>Bar</p>
+  </div>
+  <pre class="expected">Foo Bar</pre>
+</div>
+
+<div class="case" data-name="text separated by a non-breaking space in an element">
+  <div class="input">
+    <p>Foo<span>&nbsp;</span>Bar</p>
+  </div>
+  <pre class="expected">Foo Bar</pre>
+</div>
+
 <!-- /TEST CASES -->

 <script src="turndown-test.browser.js"></script>