escaping double asterisks and underscores #233

Closed
yareeba opened this issue Jun 8, 2018 · 8 comments

@yareeba
Contributor

yareeba commented Jun 8, 2018

Hello,

I'm using your library to convert HTML to markdown. I am running into problems whenever there are double asterisks (**) next to/within <strong> tags or when text within <em> tags includes an underscore (_). For example, when converting:

<p>This is <strong>**bold</strong> text and this is <em>italic_text</em></p> 

the markdown returned is:

This is ****bold** text and this is _italic_text_

Is there any way to escape these characters during the conversion?

So the markdown returned is more like:

This is **\*\*bold** text and this is _italic\_text_ 
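To make the requested behaviour concrete, here is a minimal sketch of the idea: escape the text content first, then wrap it in the emphasis delimiters. The helpers `escapeInline`, `strong`, and `em` are hypothetical names made up for this illustration; they are not part of the library and only cover the two characters from the example above.

```javascript
// Hypothetical helpers for illustration only; not the library's API.
// Prefix every literal * or _ in the text with a backslash.
const escapeInline = text => text.replace(/([*_])/g, '\\$1');

// Escape the text content first, then add the Markdown delimiters around it.
const strong = text => `**${escapeInline(text)}**`;
const em = text => `_${escapeInline(text)}_`;

const markdown = `This is ${strong('**bold')} text and this is ${em('italic_text')}`;
console.log(markdown);
// This is **\*\*bold** text and this is _italic\_text_
```

The point of the ordering is that only characters that came from the HTML text get a backslash, while the delimiters added by the conversion stay unescaped.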

Thanks

@domchristie
Collaborator

Thanks for this. I think it's related to #220 (escaping is hard!). I will mark this as a bug. Thanks again.

@olih

olih commented Jun 21, 2018

Hi,

I work on the same team as Areeba. We eventually decided to escape quite aggressively, since the Markdown we generate does not have to be particularly user-friendly.

We realised that we could override the escape method on the instance:

const turndown = new Turndown({ headingStyle: 'atx' });
turndown.escape = escapeMarkdown;

In this case, we have written a custom escaping method that takes care of most special characters in markdown:

const markdownReplacements = [
  [/\*/g, '\\*'],
  [/_/g, '\\_'],
  [/-/g, '\\-'],
  [/\+/g, '\\+'],
  [/=/g, '\\='],
  [/#/g, '\\#'],
  [/`/g, '\\`'],
  [/~/g, '\\~'],
  [/&/g, '&amp;'],
  [/\|/g, '\\|'],
  [/\(/g, '\\('],
  [/\)/g, '\\)'],
  [/\[/g, '\\['],
  [/\]/g, '\\]'],
  [/</g, '&lt;'],
  [/>/g, '&gt;'],
  [/(\d+)\./g, '$1\\.'],
];

// Apply each [pattern, replacement] pair in order to the accumulated text.
const escapeMarkdown = text =>
  markdownReplacements.reduce(
    (escaped, [pattern, replacement]) => escaped.replace(pattern, replacement),
    text,
  );
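As a quick check of the function above, the following self-contained sketch repeats the replacement table and runs it on the strings from the original report. It also illustrates one possible gap: the table does not escape literal backslashes, so a backslash already present in the input can cancel the escape that follows it. `escapeMarkdownStrict` is a hypothetical variant added here for illustration, not something proposed in this thread.

```javascript
// Same replacement table as in the comment above.
const markdownReplacements = [
  [/\*/g, '\\*'], [/_/g, '\\_'], [/-/g, '\\-'], [/\+/g, '\\+'],
  [/=/g, '\\='], [/#/g, '\\#'], [/`/g, '\\`'], [/~/g, '\\~'],
  [/&/g, '&amp;'], [/\|/g, '\\|'], [/\(/g, '\\('], [/\)/g, '\\)'],
  [/\[/g, '\\['], [/\]/g, '\\]'], [/</g, '&lt;'], [/>/g, '&gt;'],
  [/(\d+)\./g, '$1\\.'],
];

const escapeMarkdown = text =>
  markdownReplacements.reduce(
    (escaped, [pattern, replacement]) => escaped.replace(pattern, replacement),
    text,
  );

// Strings from the original report.
console.log(escapeMarkdown('**bold'));      // \*\*bold
console.log(escapeMarkdown('italic_text')); // italic\_text
console.log(escapeMarkdown('1. First'));    // 1\. First
console.log(escapeMarkdown('a < b & c'));   // a &lt; b &amp; c

// Assumption: input may contain literal backslashes. Doubling them before the
// other replacements keeps them from combining with the escapes added later.
const escapeMarkdownStrict = text => escapeMarkdown(text.replace(/\\/g, '\\\\'));

console.log(escapeMarkdown('\\*'));       // \\*  (escaped backslash, then a bare *)
console.log(escapeMarkdownStrict('\\*')); // \\\* (escaped backslash, escaped *)
```

Note that `&` is replaced before `<` and `>`, so the entities produced for angle brackets are not escaped a second time.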

I hope this helps.
Hopefully, overriding the escape method will still be possible in the future ...

Best,
Olivier, @Paul-Ladyman

@domchristie
Collaborator

Thanks @olih

I think the "aggressive" approach to escaping by default is probably the right way to go, with the option of overriding the behaviour using the public escape method. If you fancy putting a pull request together that'd be great, otherwise I will get something up as soon as I can.
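The design described above (aggressive escaping by default, with a public method that callers can replace) could look roughly like the sketch below. `MiniConverter` is a hypothetical stand-in class written for this illustration; it is not Turndown's actual implementation, and its character list is deliberately short.

```javascript
// Hypothetical stand-in class; NOT the library's real implementation.
class MiniConverter {
  // Default behaviour: escape a few Markdown-significant characters.
  escape(text) {
    return text.replace(/([\\*_#])/g, '\\$1');
  }

  // Text content is escaped before the delimiters are added.
  strong(text) {
    return `**${this.escape(text)}**`;
  }
}

const standard = new MiniConverter();

// Assigning to the instance shadows the prototype method for this object only.
const lenient = new MiniConverter();
lenient.escape = text => text;

console.log(standard.strong('**bold')); // **\*\*bold**
console.log(lenient.strong('**bold'));  // ****bold**
```

Because the override lives on one instance, other converters keep the default, which is the same mechanism the `turndown.escape = escapeMarkdown` assignment earlier in the thread relies on.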

Thanks again!

@olih

olih commented Jun 26, 2018

Hi @domchristie

We are happy to give the pull request a try next week if that works for you.

Best

@yareeba
Contributor Author

yareeba commented Jun 28, 2018

Hi @domchristie,

We have been working on putting a pull request together. Could you grant me write access to the repository?

Thanks

@domchristie
Collaborator

Are you able to fork the repo (to ayusaf1992/turndown)?

@yareeba
Contributor Author

yareeba commented Jun 28, 2018

done, thanks @domchristie

@domchristie
Collaborator

Closed by #242
