Commit
Fix the mysterious case of spaces being removed
This was a rather hard problem to solve, so brace yourself as this message might be complex.

We were having cases of HTML pasted from browsers adding span elements in for spaces. So we could end up with HTML like: `Some text<span> </span><a href="">Link</a>` where, depending on the environment, the contents of the span could be `<span>&nbsp;</span>` or `<span> </span>`. In browsers it was the former, whereas we experienced the latter on JSDOM.

This causes some poor behaviour in Turndown, where the presence of a `&nbsp;` character (character code 160, U+00A0) causes the whole space to be stripped out, resulting in `Some text[Link]()` as output. On the other hand, the presence of a normal space (ASCII code 32) causes a different problem of two spaces, resulting in output such as `Some text  [Link]()`.

This problem is due to Turndown's whitespace rules, which only match a normal space character: https://github.com/domchristie/turndown/blob/80297cebeae4b35c8d299b1741b383c74eddc7c1/src/node.js#L25-L33

With our change to blankReplacement we can fix the case for `&nbsp;` (which is the more common one we're witnessing), however we can't easily fix the double space issue. We have left tests for both cases to help any future developers that may venture into this hole.

We've raised a PR with Turndown to fix both of these issues, mixmark-io/turndown#281, so hopefully in the not-too-distant future we can remove this code.
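For future readers, here is a rough sketch of the kind of blankReplacement override described above. It is not the code from this commit: the regular expression and the decision to return a single space are assumptions made for illustration. It relies only on Turndown's documented `blankReplacement(content, node)` option and the `isBlock` flag Turndown sets on nodes.

```javascript
// Hypothetical sketch of a blankReplacement override, not the actual change.
// Turndown calls blankReplacement(content, node) for nodes it considers blank.
function blankReplacement(content, node) {
  // Blank block elements keep Turndown's default paragraph separation.
  if (node.isBlock) return "\n\n";

  // Assumption: a blank inline node holding only non-breaking spaces
  // (U+00A0, what &nbsp; decodes to) should become one normal space
  // rather than being dropped.
  if (/^\u00A0+$/.test(node.textContent)) return " ";

  // Everything else, including a span holding a normal space, keeps the
  // default empty replacement, so the double space case is not addressed.
  return "";
}

// Usage, assuming the turndown package is installed:
// const TurndownService = require("turndown");
// const service = new TurndownService({ blankReplacement });
```

The function is written against plain `isBlock` and `textContent` properties so it can be exercised without a DOM.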