Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add support for localhost urls in markdown #535

Merged
merged 3 commits into from
May 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 10 additions & 4 deletions __tests__/ExpensiMark-HTML-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -159,8 +159,12 @@ test('Test markdown replacement for emails wrapped in bold/strikethrough/italic

// Markdown style links replaced successfully
test('Test markdown style links', () => {
const testString = 'Go to [Expensify](https://www.expensify.com) to learn more. [Expensify](www.expensify.com) [Expensify](expensify.com) [It\'s really the coolest](expensify.com) [`Some` Special cases - + . = , \'](expensify.com/some?query=par|am)';
const resultString = 'Go to <a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> to learn more. <a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> <a href="https://expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> <a href="https://expensify.com" target="_blank" rel="noreferrer noopener">It&#x27;s really the coolest</a> <a href="https://expensify.com/some?query=par|am" target="_blank" rel="noreferrer noopener"><code>Some</code> Special cases - + . = , &#x27;</a>';
let testString = 'Go to [Expensify](https://www.expensify.com) to learn more. [Expensify](www.expensify.com) [Expensify](expensify.com) [It\'s really the coolest](expensify.com) [`Some` Special cases - + . = , \'](expensify.com/some?query=par|am)';
let resultString = 'Go to <a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> to learn more. <a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> <a href="https://expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> <a href="https://expensify.com" target="_blank" rel="noreferrer noopener">It&#x27;s really the coolest</a> <a href="https://expensify.com/some?query=par|am" target="_blank" rel="noreferrer noopener"><code>Some</code> Special cases - + . = , &#x27;</a>';
expect(parser.replace(testString)).toBe(resultString);

testString = 'Go to [Localhost](http://localhost:3000) to learn more. [Expensify](www.expensify.com) [Expensify](expensify.com) [It\'s really the coolest](expensify.com) [`Some` Special cases - + . = , \'](expensify.com/some?query=par|am)';
resultString = 'Go to <a href="http://localhost:3000" target="_blank" rel="noreferrer noopener">Localhost</a> to learn more. <a href="https://www.expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> <a href="https://expensify.com" target="_blank" rel="noreferrer noopener">Expensify</a> <a href="https://expensify.com" target="_blank" rel="noreferrer noopener">It&#x27;s really the coolest</a> <a href="https://expensify.com/some?query=par|am" target="_blank" rel="noreferrer noopener"><code>Some</code> Special cases - + . = , &#x27;</a>';
expect(parser.replace(testString)).toBe(resultString);
});

Expand All @@ -181,7 +185,8 @@ test('Test critical markdown style links', () => {
+ '[link with an @ in it](https://google.com) '
+ '[link with [brackets] inside of it](https://google.com) '
+ '[link with smart quotes ‘’“”](https://google.com) '
+ '[link with someone@expensify.com email in it](https://google.com)';
+ '[link with someone@expensify.com email in it](https://google.com)'
+ '[Localhost](http://a:3030)';
const resultString = 'Testing '
+ '<a href="https://expensify.com" target="_blank" rel="noreferrer noopener"><del>strikethrough</del> <strong>bold</strong> <em>italic</em></a> '
+ '<a href="https://expensify.com" target="_blank" rel="noreferrer noopener"><del>strikethrough</del> <strong>bold</strong> <em>italic</em> test.com</a> '
Expand All @@ -197,7 +202,8 @@ test('Test critical markdown style links', () => {
+ '<a href="https://google.com" target="_blank" rel="noreferrer noopener">link with an @ in it</a> '
+ '<a href="https://google.com" target="_blank" rel="noreferrer noopener">link with [brackets] inside of it</a> '
+ '<a href="https://google.com" target="_blank" rel="noreferrer noopener">link with smart quotes ‘’“”</a> '
+ '<a href="https://google.com" target="_blank" rel="noreferrer noopener">link with someone@expensify.com email in it</a>';
+ '<a href="https://google.com" target="_blank" rel="noreferrer noopener">link with someone@expensify.com email in it</a>'
+ '<a href="http://a:3030" target="_blank" rel="noreferrer noopener">Localhost</a>';
expect(parser.replace(testString)).toBe(resultString);
});

Expand Down
15 changes: 11 additions & 4 deletions __tests__/ExpensiMark-Markdown-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -590,8 +590,11 @@ test('Test html to heading1 markdown when there are adjacent h1 tags in the line
});

test('Test link with multiline text do not loses markdown', () => {
const testString = '<a href="https://google.com/">multiline\ntext</a>';
const resultString = '[multiline\ntext](https://google.com/)';
let testString = '<a href="https://google.com/">multiline\ntext</a>';
let resultString = '[multiline\ntext](https://google.com/)';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
testString = '<a href="http://localhost:3000/">multiline\ntext</a>';
resultString = '[multiline\ntext](http://localhost:3000/)';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});

Expand All @@ -614,8 +617,12 @@ test('Map html list item to newline with two prefix spaces', () => {
});

test('Test list item replacement when there is an anchor tag inside <li> tag', () => {
const testString = '<ul><li><a href="https://example.com">Coffee</a> -- Coffee</li><li><a href="https://example.com">Tea</a> -- Tea</li><li><a href="https://example.com">Milk</a> -- Milk</li></ul>';
const resultString = ' [Coffee](https://example.com) -- Coffee\n [Tea](https://example.com) -- Tea\n [Milk](https://example.com) -- Milk';
let testString = '<ul><li><a href="https://example.com">Coffee</a> -- Coffee</li><li><a href="https://example.com">Tea</a> -- Tea</li><li><a href="https://example.com">Milk</a> -- Milk</li></ul>';
let resultString = ' [Coffee](https://example.com) -- Coffee\n [Tea](https://example.com) -- Tea\n [Milk](https://example.com) -- Milk';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);

testString = '<ul><li><a href="http://localhost">Coffee</a> -- Coffee</li><li><a href="http://localhost/">Tea</a> -- Tea</li><li><a href="http://localhost/">Milk</a> -- Milk</li></ul>';
resultString = ' [Coffee](http://localhost) -- Coffee\n [Tea](http://localhost/) -- Tea\n [Milk](http://localhost/) -- Milk';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});

Expand Down
80 changes: 50 additions & 30 deletions __tests__/URL-test.js
Original file line number Diff line number Diff line change
@@ -1,39 +1,59 @@
import {URL_REGEX_WITH_REQUIRED_PROTOCOL, URL_REGEX} from '../lib/Url';
import {URL_REGEX_WITH_REQUIRED_PROTOCOL, URL_REGEX, LOOSE_URL_REGEX} from '../lib/Url';

describe('Mandatory protocol for URL', () => {
it('correctly tests valid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX_WITH_REQUIRED_PROTOCOL}$`, 'i');
expect(regexToTest.test('https://google.com/')).toBeTruthy();
expect(regexToTest.test('http://google.com/')).toBeTruthy();
expect(regexToTest.test('ftp://google.com/')).toBeTruthy();
expect(regexToTest.test('https://we.are.expensify.com/how-we-got-here')).toBeTruthy();
expect(regexToTest.test('https://google.com:12')).toBeTruthy();
expect(regexToTest.test('https://google.com:65535')).toBeTruthy();
expect(regexToTest.test('https://google.com:65535/path/my')).toBeTruthy();
describe('Strict URL validation', () => {
describe('Mandatory protocol for URL', () => {
it('correctly tests valid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX_WITH_REQUIRED_PROTOCOL}$`, 'i');
expect(regexToTest.test('https://google.com/')).toBeTruthy();
expect(regexToTest.test('http://google.com/')).toBeTruthy();
expect(regexToTest.test('ftp://google.com/')).toBeTruthy();
expect(regexToTest.test('https://we.are.expensify.com/how-we-got-here')).toBeTruthy();
expect(regexToTest.test('https://google.com:12')).toBeTruthy();
expect(regexToTest.test('https://google.com:65535')).toBeTruthy();
expect(regexToTest.test('https://google.com:65535/path/my')).toBeTruthy();
});
it('correctly tests invalid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX_WITH_REQUIRED_PROTOCOL}$`, 'i');
expect(regexToTest.test('google.com')).toBeFalsy();
expect(regexToTest.test('https://google.com:02')).toBeFalsy();
expect(regexToTest.test('https://google.com:65536')).toBeFalsy();
expect(regexToTest.test('smtp://google.com')).toBeFalsy();
});
});
it('correctly tests invalid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX_WITH_REQUIRED_PROTOCOL}$`, 'i');
expect(regexToTest.test('google.com')).toBeFalsy();
expect(regexToTest.test('https://google.com:02')).toBeFalsy();
expect(regexToTest.test('https://google.com:65536')).toBeFalsy();
expect(regexToTest.test('smtp://google.com')).toBeFalsy();

describe('Optional protocol for URL', () => {
it('correctly tests valid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX}$`, 'i');
expect(regexToTest.test('google.com/')).toBeTruthy();
expect(regexToTest.test('https://google.com/')).toBeTruthy();
expect(regexToTest.test('ftp://google.com/')).toBeTruthy();
expect(regexToTest.test('we.are.expensify.com/how-we-got-here')).toBeTruthy();
expect(regexToTest.test('google.com:12')).toBeTruthy();
expect(regexToTest.test('google.com:65535')).toBeTruthy();
expect(regexToTest.test('google.com:65535/path/my')).toBeTruthy();
});
it('correctly tests invalid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX}$`, 'i');
expect(regexToTest.test('google.com:02')).toBeFalsy();
expect(regexToTest.test('google.com:65536')).toBeFalsy();
});
});
});

describe('Optional protocol for URL', () => {
it('correctly tests valid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX}$`, 'i');
expect(regexToTest.test('google.com/')).toBeTruthy();
expect(regexToTest.test('https://google.com/')).toBeTruthy();
expect(regexToTest.test('ftp://google.com/')).toBeTruthy();
expect(regexToTest.test('we.are.expensify.com/how-we-got-here')).toBeTruthy();
expect(regexToTest.test('google.com:12')).toBeTruthy();
expect(regexToTest.test('google.com:65535')).toBeTruthy();
expect(regexToTest.test('google.com:65535/path/my')).toBeTruthy();
describe('Loose URL validation', () => {
it('correctly tests urls that can be valid on local machine', () => {
const regexToTest = new RegExp(`^${LOOSE_URL_REGEX}$`, 'i');
expect(regexToTest.test('http://localhost:3000')).toBeTruthy();
expect(regexToTest.test('https://local.url')).toBeTruthy();
expect(regexToTest.test('http://a.b')).toBeTruthy();
expect(regexToTest.test('http://expensify')).toBeTruthy();
expect(regexToTest.test('http://google.com/abcd')).toBeTruthy();
});

it('correctly tests invalid urls', () => {
const regexToTest = new RegExp(`^${URL_REGEX}$`, 'i');
expect(regexToTest.test('google.com:02')).toBeFalsy();
expect(regexToTest.test('google.com:65536')).toBeFalsy();
const regexToTest = new RegExp(`^${LOOSE_URL_REGEX}$`, 'i');
expect(regexToTest.test('localhost:3000')).toBeFalsy();
expect(regexToTest.test('local.url')).toBeFalsy();
expect(regexToTest.test('https://otherexample.com links get rendered first')).toBeFalsy();
});
});
8 changes: 4 additions & 4 deletions lib/ExpensiMark.js
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import _ from 'underscore';
import Str from './str';
import {URL_REGEX} from './Url';
import {MARKDOWN_URL_REGEX, LOOSE_URL_REGEX, URL_REGEX} from './Url';
import {CONST} from './CONST';

const SLACK_SPAN_NEW_LINE_TAG = '<span class="c-mrkdwn__br" data-stringify-type="paragraph-break" style="box-sizing: inherit; display: block; height: unset;"></span>';
Expand Down Expand Up @@ -107,7 +107,7 @@ export default class ExpensiMark {
name: 'link',
process: (textToProcess, replacement) => {
const regex = new RegExp(
`\\[([^\\][]*(?:\\[[^\\][]*][^\\][]*)*)]\\(${URL_REGEX}\\)(?![^<]*(<\\/pre>|<\\/code>))`,
`\\[([^\\][]*(?:\\[[^\\][]*][^\\][]*)*)]\\(${MARKDOWN_URL_REGEX}\\)(?![^<]*(<\\/pre>|<\\/code>))`,
'gi',
);
return this.modifyTextForUrlLinks(regex, textToProcess, replacement);
Expand Down Expand Up @@ -144,7 +144,7 @@ export default class ExpensiMark {

process: (textToProcess, replacement) => {
const regex = new RegExp(
`(?![^<]*>|[^<>]*<\\/)([_*~]*?)${URL_REGEX}\\1(?!((?:(?!<a).)+)?<\\/a>|[^<]*(<\\/pre>|<\\/code>))`,
`(?![^<]*>|[^<>]*<\\/)([_*~]*?)${MARKDOWN_URL_REGEX}\\1(?!((?:(?!<a).)+)?<\\/a>|[^<]*(<\\/pre>|<\\/code>))`,
'gi',
);
return this.modifyTextForUrlLinks(regex, textToProcess, replacement);
Expand Down Expand Up @@ -448,7 +448,7 @@ export default class ExpensiMark {
if (abort) {
replacedText = replacedText.concat(textToCheck.substr(match.index, (match[0].length)));
} else {
const urlRegex = new RegExp(`^${URL_REGEX}$`, 'i');
const urlRegex = new RegExp(`^${LOOSE_URL_REGEX}$|^${URL_REGEX}$`, 'i');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use MARKDOWN_URL_REGEX here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Santhosh-Sellavel We can't use it because ^${LOOSE_URL_REGEX}$|^${URL_REGEX}$ is different from ^${LOOSE_URL_REGEX}|${URL_REGEX}$. We want it to EITHER start and end with loose url regex OR start and end with URL_REGEX.


// `match[1]` contains the text inside the [] of the markdown e.g. [example](https://example.com)
// At the entry of function this.replace, text is already escaped due to the rules that precede the link
Expand Down
9 changes: 9 additions & 0 deletions lib/Url.js
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,12 @@ const URL_REGEX = `((${URL_WEBSITE_REGEX})${URL_PATH_REGEX}(?:${URL_PARAM_REGEX}

const URL_REGEX_WITH_REQUIRED_PROTOCOL = URL_REGEX.replace(`${URL_PROTOCOL_REGEX}?`, URL_PROTOCOL_REGEX);

const LOOSE_URL_WEBSITE_REGEX = `${URL_PROTOCOL_REGEX}([-\\w]+(\\.[-\\w]+)*)(?:\\:${ALLOWED_PORTS}|\\b|(?=_))`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this chunk of the regex please? How did you come up with it and what is it supposed to do?

([-\\w]+(\\.[-\\w]+)*)

My understanding is that it should match the domain name, however it isn't explained that way on regex101. Maybe I'm missing something...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neil-marcellini You need to replace the \\ with \ when testing it on regex101. The mentioned regex matches any word character along with a hyphen followed by an option . and any word character (hyphens included) zero or more times. This is added to cover domain names on local machines. These could be: localhost, a.b.c etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it thanks!

You need to replace the \\ with \ when testing it on regex101

Can you explain why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure but I think it might be due to the fact that we're having the regex in string form in our code and then we're converting it using new Regex.

const LOOSE_URL_REGEX = `((${LOOSE_URL_WEBSITE_REGEX})${URL_PATH_REGEX}(?:${URL_PARAM_REGEX}|${URL_FRAGMENT_REGEX})*)`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We allowed the URL to start by hyphen which is not allowed according to the RFC1034. This caused Expensify/App#27080



const MARKDOWN_URL_REGEX = `(${LOOSE_URL_REGEX}|${URL_REGEX})`;

export {
URL_WEBSITE_REGEX,
URL_PATH_REGEX,
Expand All @@ -19,4 +25,7 @@ export {
URL_REGEX,
URL_REGEX_WITH_REQUIRED_PROTOCOL,
URL_PROTOCOL_REGEX,
LOOSE_URL_REGEX,
LOOSE_URL_WEBSITE_REGEX,
MARKDOWN_URL_REGEX,
};