Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callback Support (alternative implementation) #908

Open
wants to merge 68 commits into
base: gh-pages
Choose a base branch
from
Open

Callback Support (alternative implementation) #908

wants to merge 68 commits into from

Conversation

RebeccaStevens
Copy link

Closes: #907
Fixes: #906

Adds callback support to minify, minifyCSS and minifyJS functions.
Does not break any existing functionality.

Example Use:

var minify = require('html-minifier').minify;
var html = '<html><head><style>body { margin: 0; display: flex; }</style></head></html>';

var config = {
  minifyCSS: function(css, cb) {
    doSomethingAsync(css, function(updatedCss) {
      cb(updatedCss);
    });
  },
  minifyJS: function(js, inline, cb) {
    doSomethingElseAsync(js, function(updatedJs) {
      cb(updatedJs);
    });
  }
};

minify(html, config, function(error, result) {
  if (error) {
    // Handle error.
  } else {
    // Do something with the results.
  }
});

@alexlamsl
Copy link
Collaborator

We may be able to simplify the logic if we restrict use of cb usage of minify{CS,J}S if and only if minify(..., cb) is used. That way we don't need to rely on undefined as return value to switch to asynchronous paths, and we don't need to make extraneous calls to getAsyncTaskCallback() during synchronous mode.

* @param {AsyncTextPlaceholder} placeholder
* @returns {Function}
*/
function getAsyncTaskCallback(placeholder) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move var runningAsyncTask = true; into here and remove the two references above to avoid allocation in the synchronous case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

if (isExecutableScript(currentTag, currentAttrs)) {
var minifyJSPlaceholder = new AsyncTextPlaceholder();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever use of buffer - unlike custom fragments, scripts and style sheets are guaranteed to take up the whole text 👍

May I suggest storing the index, i.e. buffer.length then push null or whatever to it, instead of introducing a new class?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@alexlamsl
Copy link
Collaborator

Have we handled these cases of minify{CS,J}S() for the asynchronous mode?

function cleanAttributeValue(tag, attrName, attrValue, options, attrs) {
if (attrValue && isEventAttribute(attrName, options)) {
attrValue = trimWhitespace(attrValue).replace(/^javascript:\s*/i, '');
return options.minifyJS(attrValue, true);
}

else if (attrName === 'style') {
attrValue = trimWhitespace(attrValue);
if (attrValue) {
if (/;$/.test(attrValue) && !/&#?[0-9a-zA-Z]+;$/.test(attrValue)) {
attrValue = attrValue.replace(/\s*;$/, ';');
}
attrValue = unwrapInlineCSS(options.minifyCSS(wrapInlineCSS(attrValue)));
}
return attrValue;
}

else if (isMediaQuery(tag, attrs, attrName)) {
attrValue = trimWhitespace(attrValue);
return unwrapMediaQuery(options.minifyCSS(wrapMediaQuery(attrValue)));
}

They deal with the miinification of attribute values, e.g. <div style="display: none;">

@alexlamsl
Copy link
Collaborator

Thanks for addressing #908 (comment) & #908 (comment) - here are the test cases for #908 (comment):

// test.js
var minify = require('.').minify;
[
  // covered by this PR
  '<script>alert(1 + 2);</script>',
  '<style>p{display: none;}</style>',
  // TODOs
  '<p onclick="alert(1 + 2);">',
  '<p style="display: none;">',
  '<style media="{max-width: 100px}"></style>'
].forEach(function(input) {
  var expected = minify(input, {
    customEventAttributes: [/^on*/],
    minifyCSS: function() {
      return 'CSS';
    },
    minifyJS: function() {
      return 'JS';
    }
  });
  minify(input, {
    customEventAttributes: [/^on*/],
    minifyCSS: function(text, cb) {
      cb('CSS');
    },
    minifyJS: function(text, inline, cb) {
      cb('JS');
    }
  }, function(err, actual) {
    if (err) {
      console.error(input);
      console.error(err);
      console.error();
    }
    else if (expected !== actual) {
      console.error('Mismatch:', input);
      console.error('Sync: ', expected);
      console.error('Async:', actual);
      console.error();
    }
  });
});

With the current PR:

$ cat test.js | node
<p onclick="alert(1 + 2);">
TypeError: cb is not a function
    at Object.minifyJS ([stdin]:26:7)
    at cleanAttributeValue

<p style="display: none;">
TypeError: cb is not a function
    at Object.minifyCSS ([stdin]:23:7)
    at cleanAttributeValue

<style media="{max-width: 100px}"></style>
TypeError: cb is not a function
    at Object.minifyCSS ([stdin]:23:7)
    at cleanAttributeValue

@alexlamsl
Copy link
Collaborator

alexlamsl commented Apr 13, 2018

Here's a quick hack (based on current gh-pages) that passes all the tests in this PR:

`src/htmlminifier.js`
--- a/src/htmlminifier.js
+++ b/src/htmlminifier.js
@@ -285,7 +285,7 @@ function cleanAttributeValue(tag, attrName, attrValue, options, attrs) {
       if (/;$/.test(attrValue) && !/&#?[0-9a-zA-Z]+;$/.test(attrValue)) {
         attrValue = attrValue.replace(/\s*;$/, ';');
       }
-      attrValue = unwrapInlineCSS(options.minifyCSS(wrapInlineCSS(attrValue)));
+      attrValue = options.minifyCSS(attrValue, 'inline');
     }
     return attrValue;
   }
@@ -322,7 +322,7 @@ function cleanAttributeValue(tag, attrName, attrValue, options, attrs) {
   }
   else if (isMediaQuery(tag, attrs, attrName)) {
     attrValue = trimWhitespace(attrValue);
-    return unwrapMediaQuery(options.minifyCSS(wrapMediaQuery(attrValue)));
+    return options.minifyCSS(attrValue, 'media');
   }
   return attrValue;
 }
@@ -613,93 +613,100 @@ function identity(value) {
   return value;
 }

-function processOptions(options) {
-  ['html5', 'includeAutoGeneratedTags'].forEach(function(key) {
-    if (!(key in options)) {
-      options[key] = true;
+function processOptions(values, async) {
+  var asyncWrap = async ? function(fn) {
+    return function(text, type, callback) {
+      callback(fn(text, type));
+    };
+  } : identity;
+  var options = {
+    canCollapseWhitespace: canCollapseWhitespace,
+    canTrimWhitespace: canTrimWhitespace,
+    html5: true,
+    ignoreCustomComments: [/^!/],
+    ignoreCustomFragments: [
+      /<%[\s\S]*?%>/,
+      /<\?[\s\S]*?\?>/
+    ],
+    includeAutoGeneratedTags: true,
+    log: identity,
+    minifyCSS: asyncWrap(identity),
+    minifyJS: asyncWrap(identity),
+    minifyURLs: identity
+  };
+  Object.keys(values).forEach(function(key) {
+    var value = values[key];
+    if (key === 'log') {
+      if (typeof value === 'function') {
+        options.log = value;
       }
-  });
-
-  if (typeof options.log !== 'function') {
-    options.log = identity;
     }
-
-  if (!options.canCollapseWhitespace) {
-    options.canCollapseWhitespace = canCollapseWhitespace;
+    else if (key === 'minifyCSS' && typeof value !== 'function') {
+      if (!value) {
+        return;
       }
-  if (!options.canTrimWhitespace) {
-    options.canTrimWhitespace = canTrimWhitespace;
+      if (typeof value !== 'object') {
+        value = {};
       }
-
-  if (!('ignoreCustomComments' in options)) {
-    options.ignoreCustomComments = [/^!/];
+      options.minifyCSS = asyncWrap(function(text, type) {
+        text = text.replace(/(url\s*\(\s*)("|'|)(.*?)\2(\s*\))/ig, function(match, prefix, quote, url, suffix) {
+          return prefix + quote + options.minifyURLs(url) + quote + suffix;
+        });
+        try {
+          if (type === 'inline') {
+            text = wrapInlineCSS(text);
           }
-
-  if (!('ignoreCustomFragments' in options)) {
-    options.ignoreCustomFragments = [
-      /<%[\s\S]*?%>/,
-      /<\?[\s\S]*?\?>/
-    ];
+          else if (type === 'media') {
+            text = wrapMediaQuery(text);
           }
-
-  if (!options.minifyURLs) {
-    options.minifyURLs = identity;
+          text = new CleanCSS(value).minify(text).styles;
+          if (type === 'inline') {
+            text = unwrapInlineCSS(text);
           }
-  if (typeof options.minifyURLs !== 'function') {
-    var minifyURLs = options.minifyURLs;
-    if (typeof minifyURLs === 'string') {
-      minifyURLs = { site: minifyURLs };
+          else if (type === 'media') {
+            text = unwrapMediaQuery(text);
           }
-    else if (typeof minifyURLs !== 'object') {
-      minifyURLs = {};
-    }
-    options.minifyURLs = function(text) {
-      try {
-        return RelateUrl.relate(text, minifyURLs);
+          return text;
         }
         catch (err) {
           options.log(err);
           return text;
         }
-    };
+      });
     }
-
-  if (!options.minifyJS) {
-    options.minifyJS = identity;
+    else if (key === 'minifyJS' && typeof value !== 'function') {
+      if (!value) {
+        return;
       }
-  if (typeof options.minifyJS !== 'function') {
-    var minifyJS = options.minifyJS;
-    if (typeof minifyJS !== 'object') {
-      minifyJS = {};
+      if (typeof value !== 'object') {
+        value = {};
       }
-    (minifyJS.parse || (minifyJS.parse = {})).bare_returns = false;
-    options.minifyJS = function(text, inline) {
+      (value.parse || (value.parse = {})).bare_returns = false;
+      options.minifyJS = asyncWrap(function(text, inline) {
         var start = text.match(/^\s*<!--.*/);
         var code = start ? text.slice(start[0].length).replace(/\n\s*-->\s*$/, '') : text;
-      minifyJS.parse.bare_returns = inline;
-      var result = UglifyJS.minify(code, minifyJS);
+        value.parse.bare_returns = inline;
+        var result = UglifyJS.minify(code, value);
         if (result.error) {
           options.log(result.error);
           return text;
         }
         return result.code.replace(/;$/, '');
-    };
+      });
     }
-
-  if (!options.minifyCSS) {
-    options.minifyCSS = identity;
+    else if (key === 'minifyURLs' && typeof value !== 'function') {
+      if (!value) {
+        return;
       }
-  if (typeof options.minifyCSS !== 'function') {
-    var minifyCSS = options.minifyCSS;
-    if (typeof minifyCSS !== 'object') {
-      minifyCSS = {};
+      if (typeof value === 'string') {
+        value = { site: value };
       }
-    options.minifyCSS = function(text) {
-      text = text.replace(/(url\s*\(\s*)("|'|)(.*?)\2(\s*\))/ig, function(match, prefix, quote, url, suffix) {
-        return prefix + quote + options.minifyURLs(url) + quote + suffix;
-      });
+      else if (typeof value !== 'object') {
+        value = {};
+      }
+      options.minifyURLs = function(text) {
         try {
-        return new CleanCSS(minifyCSS).minify(text).styles;
+          return RelateUrl.relate(text, value);
         }
         catch (err) {
           options.log(err);
@@ -707,6 +714,11 @@ function processOptions(options) {
         }
       };
     }
+    else {
+      options[key] = value;
+    }
+  });
+  return options;
 }

 function uniqueId(value) {
@@ -803,10 +815,9 @@ function createSortFns(value, options, uidIgnore, uidAttr) {
   }
 }

-function minify(value, options, partialMarkup) {
-  options = options || {};
+function minify(value, options, partialMarkup, callback) {
+  options = processOptions(options || {}, callback);
   var optionsStack = [];
-  processOptions(options);
   if (options.collapseWhitespace) {
     value = collapseWhitespace(value, options, true, true);
   }
@@ -824,6 +835,8 @@ function minify(value, options, partialMarkup) {
       t = Date.now(),
       ignoredMarkupChunks = [],
       ignoredCustomMarkupChunks = [],
+      asyncResults = [],
+      uidAsync,
       uidIgnore,
       uidAttr,
       uidPattern;
@@ -848,11 +861,13 @@ function minify(value, options, partialMarkup) {
     return token;
   });

-  function escapeFragments(text) {
-    return text.replace(uidPattern, function(match, prefix, index) {
+  function escapeFragments(fn) {
+    return function(text, type, callback) {
+      return fn(text.replace(uidPattern, function(match, prefix, index) {
         var chunks = ignoredCustomMarkupChunks[+index];
         return chunks[1] + uidAttr + index + chunks[2];
-    });
+      }), type, callback);
+    };
   }

   var customFragments = options.ignoreCustomFragments.map(function(re) {
@@ -865,17 +880,11 @@ function minify(value, options, partialMarkup) {
       if (!uidAttr) {
         uidAttr = uniqueId(value);
         uidPattern = new RegExp('(\\s*)' + uidAttr + '([0-9]+)(\\s*)', 'g');
-        var minifyCSS = options.minifyCSS;
-        if (minifyCSS) {
-          options.minifyCSS = function(text) {
-            return minifyCSS(escapeFragments(text));
-          };
+        if (options.minifyCSS) {
+          options.minifyCSS = escapeFragments(options.minifyCSS);
         }
-        var minifyJS = options.minifyJS;
-        if (minifyJS) {
-          options.minifyJS = function(text, inline) {
-            return minifyJS(escapeFragments(text), inline);
-          };
+        if (options.minifyJS) {
+          options.minifyJS = escapeFragments(options.minifyJS);
         }
       }
       var token = uidAttr + ignoredCustomMarkupChunks.length;
@@ -884,6 +893,33 @@ function minify(value, options, partialMarkup) {
     });
   }

+  function toSyncSignature(fn) {
+    return function(text, type) {
+      var index = asyncResults.length++;
+      var called = false;
+      fn(text, type, function(value) {
+        if (called) {
+          throw new Error('Async completion has already occurred.');
+        }
+        called = true;
+        asyncResults[index] = value;
+        taskCompleted();
+      });
+      return uidAsync + index + '_' + text + uidAsync;
+    };
+  }
+
+  if (callback) {
+    asyncResults.completed = 0;
+    uidAsync = uniqueId(value);
+    if (options.minifyJS) {
+      options.minifyJS = toSyncSignature(options.minifyJS);
+    }
+    if (options.minifyCSS) {
+      options.minifyCSS = toSyncSignature(options.minifyCSS);
+    }
+  }
+
   if (options.sortAttributes && typeof options.sortAttributes !== 'function' ||
       options.sortClassName && typeof options.sortClassName !== 'function') {
     createSortFns(value, options, uidIgnore, uidAttr);
@@ -1245,6 +1281,7 @@ function minify(value, options, partialMarkup) {

   var str = joinResultSegments(buffer, options);

+  function done() {
     if (uidPattern) {
       str = str.replace(uidPattern, function(match, prefix, index, suffix) {
         var chunk = ignoredCustomMarkupChunks[+index][0];
@@ -1273,6 +1310,23 @@ function minify(value, options, partialMarkup) {
     return str;
   }

+  function taskCompleted() {
+    if (++asyncResults.completed > asyncResults.length) {
+      str = str.replace(new RegExp(uidAsync + '([0-9]+)_[\\s\\S]*?' + uidAsync, 'g'), function(match, index) {
+        return asyncResults[+index];
+      });
+      callback(null, done());
+    }
+  }
+
+  if (callback) {
+    taskCompleted();
+  }
+  else {
+    return done();
+  }
+}
+
 function joinResultSegments(results, options) {
   var str;
   var maxLineLength = options.maxLineLength;
@@ -1300,6 +1354,16 @@ function joinResultSegments(results, options) {
   return options.collapseWhitespace ? collapseWhitespace(str, options, true, true) : str;
 }

-exports.minify = function(value, options) {
+exports.minify = function(value, options, callback) {
+  if (callback) {
+    try {
+      minify(value, options, null, callback);
+    }
+    catch (err) {
+      callback(err);
+    }
+  }
+  else {
     return minify(value, options);
+  }
 };

@alexlamsl
Copy link
Collaborator

... except it doesn't quite handle attribute value wrapping correctly:

Actual:
<p style="*{font: 12pt 'bar'}"></p>
Expected:
<p style="font: 12pt 'bar'"></p>

@alexlamsl
Copy link
Collaborator

Another issue with attribute values - optimisation of quotes relies on the minified content:

Actual:
<div style="font: "monospace"">foo$</div>
Expected:
<div style='font: "monospace"'>foo$</div>

@alexlamsl
Copy link
Collaborator

alexlamsl commented Apr 13, 2018

These cases are discovered by adding the following to tests/minifier.js

function test_minify(assert, input, options, output, description) {
  if (typeof options === 'string') {
    description = output;
    output = options;
    options = null;
  }
  assert.equal(minify(input, options), output, description || input);
  if (options) {
    if (typeof options.minifyJS === 'function') {
      var minifyJS = options.minifyJS;
      options.minifyJS = function(text, inline, callback) {
        callback(minifyJS(text, inline));
      };
    }
    if (typeof options.minifyCSS === 'function') {
      var minifyCSS = options.minifyCSS;
      options.minifyCSS = function(text, type, callback) {
        callback(minifyCSS(text, type));
      };
    }
  }
  var done = assert.async();
  assert.notOk(minify(input, options, function(error, result) {
    assert.notOk(error);
    assert.equal(result, output);
    done();
  }));
}

... followed by changing anything of the form:

assert.equal(minify(input[, options]), output[, description])

into:

test_minify(assert, input[, options], output[, description])

@alexlamsl
Copy link
Collaborator

tests/minifier.js that exercises most of the existing test cases in both synchronous and asynchronous modes.

@RebeccaStevens
Copy link
Author

All the node tests are now passing 😁
I'm not sure why one of the web tests isn't 😕

@RebeccaStevens
Copy link
Author

🎉

* @param {string} output
* @param {string} [description]
*/
function test_minify_sync(assert, input, options, output, description) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to switch the order of options and output so that all optional params are together?

(Same would apply to test_minify_async)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diffstat is too large as it is for this file to ensure correctness as is already - I'll probably make another PR as I said back in #908 (comment)

That aside, I don't understand why there is a distinction between test_minify_sync and test_minify_async:

  • the use of existing API should always be synchronous
  • the use of callbacks should always be asynchronous (whether some of them may complete immediately or not)

So the requirement is that all existing test cases should always go through both synchronous & asynchronous modes, the only difference should only be whether they throw/return an error or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are free to use setTimeout() within minifyCSS & minifyJS to emulate asynchronous behaviour - I have tried and it works on both node and web testing modes.

@RebeccaStevens
Copy link
Author

I believe everything should be working now.

With regard to test_minify_sync and test_minify_async; the distinction was because synchronous tests would fail when running a test that can only compete asynchronously. I have since merged these two functions into one; a parameter asyncOnly states whether or not the test can only complete asynchronously.

With regard to the minifier.js file; I created a NodeJS script that will automatically convert the original file to use the new test functions. Once that is done, the test functions can be added along with the couple of new tests. This script seems to work correctly except for one single line that I had to change manually.

@alexlamsl
Copy link
Collaborator

There shouldn't be any tests that only work in asynchronous mode - wrapper for minifyCSS and minifyJS is fairly straightforward and can be automated to make sure every test covers both modes of operation.

@alexlamsl
Copy link
Collaborator

This doesn't work: Amazon.html

$ node
> require('.').minify(fs.readFileSync('Amazon.html', 'utf8'));
RangeError: Maximum call stack size exceeded
    at isExecutableScript

@RebeccaStevens
Copy link
Author

There shouldn't be any tests that only work in asynchronous mode

That is true for the existing tests but I have added a few extra tests that only apply to the async case.
For example, testing that an error occurs if a callback is called more than once. (These new tests can be removed if they are considered unneeded).

I'll look into what is going on with Amazon.html tomorrow.

@RebeccaStevens
Copy link
Author

So with regard to the Maximum call stack size exceeded error that occurs with Amazon.html; the error occurs because there are too many callbacks being put onto the stack. When I increase the stack size it works fine.

A simple fix to this issue would be to make all the callback asynchronous. However this would break the current synchronous behaviour. Another approach a long the same lines would be to catch the error and recover from it by making the next callback asynchronous; however this has the same issue as before.

In order to keep the current synchronous behaviour, I need to make it so the changes in this PR don't effect the way the synchronous code executes (i.e synchronous code cannot use a large callback chains). In the asynchronous case, one of the two methods proposed above should be implemented.

@RebeccaStevens
Copy link
Author

RebeccaStevens commented Apr 26, 2018

The Maximum call stack size exceeded error should no longer occur.

If a callback is provided to minify all internal Task callbacks will now be made async. This unfortunately has the downside of increasing the execution time when running in async mode.

I initially tried the approach of recovering from the Maximum call stack size exceeded error but JavaScript started acting weird when I tried to do this. (I could catch the error as long as I didn't do anything with it. As soon as I tried to do something with the error, the error would no longer being caught by the try catch 😕) - Turns out it was just my debugger acting up.

Minify execution times for Amazon.html:

Before latest changes:

Run Type Execution Time
Sync ~50ms
Async* ~50ms

*With increased stack size.

After latest changes:

Run Type Execution Time
Sync ~50ms
Async with process.nextTick ~110ms
Async with setTimeout ~7000ms

@RebeccaStevens
Copy link
Author

Both sync and async now run the above test in ~50ms.

@RebeccaStevens
Copy link
Author

Unfortunately with that last change, the sync and async versions of minify are outputting different results. It seems that difference is caused when the Maximum call stack size exceeded error occurs.

I think I know what's going on (something to do with the buffer changes not be recovered) but I'll leave it until tomorrow.

@RebeccaStevens
Copy link
Author

Everything should now be working correctly.

In order to fix the issue with the buffer changes not being rolled back in the event of a Maximum call stack size exceeded error, I had to make it so that the buffer is cloned between each internal task. This adds a lot of overhead resulting in the async version of minify taking ~320ms to complete the minification of Amazon.html (sync version still takes ~50ms).

I'm not sure if it would be best to go with the make every callback async approach or the recover from an error approach. The first options is faster when in NodeJS as process.nextTick can be used, but it is a lot slower when having to rely on setTimeout which browsers will have to do.

Ideally it would be best to detect if a Maximum call stack size exceeded error in advance and only make the next callback async in that case. Unfortunately I don't think it's possible to detect this error until after it already occurred. 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants