Incorrect parsing behaviour of inline script content: strings containing tag opening (/ closing?) characters #45

dannya · 2020-05-24T19:05:12Z

Take the following inline script block:

<script>
    var str = 'hey <form';

    if (!str.match(new RegExp('<(form|iframe)', 'g'))) {
        // ...
    }
</script>

The parser splits this script content into an array of 3 content strings. I would expect a single content string, since every tag-opening character (`<`) sits inside a JavaScript string literal within the inline script block.

Here is a complete minimal example:
(I'm using htmlnano, but I traced the behaviour to the posthtml-parser dependency of htmlnano)

const htmlnano = require('htmlnano');

return htmlnano
  .process(
    `<!DOCTYPE html>
      <html>
        <head>
          <title>Test</title>
        </head>

        <body>
          <script>
            var str = 'hey <form';

            if (!str.match(new RegExp('<(form|iframe)', 'g'))) {
              // ...
            }
          </script>
        </body>
      </html>`,
    {
      custom: [
        (tree, options) => {
          tree.match({ tag: 'script' }, (node) => {
            // node is passed in via the tree parsed by posthtml-parser

            console.log(node.content);

            // console.log output:
            // [ '\n            var str = \'hey ',
            //   '<form\';\n\n            if (!str.match(new RegExp(\'',
            //   '<(form|iframe)\', \'g\'))) {\n              // ...\n            }\n          ' ]

            // an array of 3 content strings is parsed, but I would 
            // expect this to be a single parsed content string, 
            // since any tag opening characters are within strings 
            // inside the inline script block

            return node;
          });

          return tree;
        },
      ]
    },
  )
  .then((result) => {
    // ...
  });

(A similar kind of issue as seen in #18)
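
Until the parser behaviour changes, one possible workaround inside the custom plugin is to re-join the fragments before working with them. This is only a minimal sketch, assuming `node.content` holds nothing but strings for a `<script>` node; the helper name `joinScriptContent` is made up for illustration and is not part of posthtml-parser or htmlnano.

```javascript
// Hypothetical helper (not part of posthtml-parser or htmlnano):
// collapses the text fragments of a script node back into one string.
function joinScriptContent(node) {
  if (!Array.isArray(node.content)) {
    return node; // nothing to join (no content, or not an array)
  }
  // Only join when every fragment is a string; leave anything else untouched.
  if (node.content.every((part) => typeof part === 'string')) {
    node.content = [node.content.join('')];
  }
  return node;
}

// Fragments modelled on the console.log output above (whitespace shortened).
const scriptNode = {
  tag: 'script',
  content: [
    "\nvar str = 'hey ",
    "<form';\nif (!str.match(new RegExp('",
    "<(form|iframe)', 'g'))) {\n}\n",
  ],
};

const joined = joinScriptContent(scriptNode);
console.log(joined.content.length);
console.log(joined.content[0]);
```

Inside the plugin this could be called as `tree.match({ tag: 'script' }, joinScriptContent)`, since the helper returns the node it was given.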


anikethsaha · 2020-05-24T19:53:15Z

It does look like a bug with posthtml-parser, as htmlparser2 seems to be working as expected here.

SukkaW · 2020-10-14T10:30:20Z

> It does look like a bug with posthtml-parser, as htmlparser2 seems to be working as expected here.

No, htmlparser2 itself behaves this way:

https://runkit.com/sukkaw/5f86d1ee0f32d6001a5d75c0

        const script = `<script>
            var str = 'hey <form';

            if (!str.match(new RegExp('<(form|iframe)', 'g'))) {
                // ...
            }
        </script>`;
        
const htmlparser2 = require("htmlparser2@3.9.2");
const parser = new htmlparser2.Parser({
    onopentag(name, attribs) {
        if (name === "script" && attribs.type === "text/javascript") {
            console.log("JS! Hooray!");
        }
    },
    ontext(text) {
        console.log("-->", text);
    },
    onclosetag(tagname) {
        if (tagname === "script") {
            console.log("That's it?!");
        }
    },
});
parser.write(
    script
);
parser.end();

"-->"
"\n            var str = 'hey "
"-->"
"<form';\n\n            if (!str.match(new RegExp('"
"-->"
"<(form|iframe)', 'g'))) {\n                // ...\n            }\n        "
"That's it?!"
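
Since the script body arrives as several `ontext` events, a consumer could buffer them between `onopentag` and `onclosetag` to get the source back in one piece. The following is a sketch under assumptions: `createScriptCollector` is a hypothetical helper, it reuses only the callback names from the snippet above, and the three text events are replayed by hand here instead of running htmlparser2.

```javascript
// Hypothetical helper: builds htmlparser2-style handlers that buffer every
// text chunk seen inside <script> and report them as one string.
function createScriptCollector(onScript) {
  let buffer = null; // null means "not currently inside a script element"
  return {
    onopentag(name) {
      if (name === 'script') buffer = '';
    },
    ontext(text) {
      if (buffer !== null) buffer += text;
    },
    onclosetag(name) {
      if (name === 'script' && buffer !== null) {
        onScript(buffer);
        buffer = null;
      }
    },
  };
}

// Replay the three text events from the output above by hand.
const collected = [];
const handlers = createScriptCollector((source) => collected.push(source));
handlers.onopentag('script', {});
handlers.ontext("\n            var str = 'hey ");
handlers.ontext("<form';\n\n            if (!str.match(new RegExp('");
handlers.ontext("<(form|iframe)', 'g'))) {\n                // ...\n            }\n        ");
handlers.onclosetag('script');

console.log(collected.length);
console.log(collected[0]);
```

The returned object has the same shape as the handler object passed to `new htmlparser2.Parser(...)` in the snippet above, so it could presumably be used in its place.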

anikethsaha added the type: bug label May 24, 2020

Scrum added scope: htmlparser2 status: blocked labels May 25, 2020

Scrum added this to the 0.5.1 milestone Oct 27, 2020

Scrum added a commit that referenced this issue Oct 27, 2020

test: contents are split with '<' in comment, issue #45

74169cd

Scrum closed this as completed in 8e64082 Oct 27, 2020

This issue was closed.
