Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: support multi-dot file extensions #23416

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions lib/internal/modules/cjs/loader.js
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,22 @@ function tryExtensions(p, exts, isMain) {
return false;
}

// find the longest (possibly multi-dot) extension registered in
// Module._extensions
function findLongestRegisteredExtension(filename) {
const name = path.basename(filename);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're only interested in the extension, is calling path.basename() necessary? You could just scan backwards for dots, checking that there are no path separators in between:

let dot = -1;
let sep = -1;

for (let i = filename.length; --i > 0; /* empty */) {
  const c = filename[i];
  if (c === '.')
    dot = i;
  else if (c === '/') { // FIXME handle backslash on windows
    sep = i;
    break;
  }
}

if (dot > sep + 1) {
  const ext = filename.slice(dot);
  if (Module._extensions[ext]) return ext;
}

return '.js';  // dot file or no extension

Only one slice == less garbage to collect.

Copy link
Member Author

@GeoffreyBooth GeoffreyBooth Oct 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current algorithm scans forwards to short-circuit at the longest matched extension. So say you have a file like mr.robot.coffee.md, and a loader registered for .coffee.md. The current algorithm will iterate like:

  • Is .robot.coffee.md registered? No, continue.
  • Is .coffee.md registered? Yes, break.

This way, even if .md is also registered, the .coffee.md loader takes precedence. This allows multi-dot extensions like .es6.js, for example.

If I’m reading your code correctly, it’s checking only the longest possible multi-dot extension, which for this example would be .robot.coffee.md, a false match. We want multi-dot extensions to work without prohibiting periods elsewhere in the filename.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps naming the function findRegisteredExtension would be more self-descriptive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardlau Yes, or perhaps findLongestRegisteredExtension would be even more descriptive (because we’re returning .coffee.md instead of .md, if both of those happened to be registered).

let currentExtension;
let index;
let startIndex = 0;
while ((index = name.indexOf('.', startIndex)) !== -1) {
startIndex = index + 1;
if (index === 0) continue; // Skip dotfiles like .gitignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are to skip the dotfiles, shouldn't this just break out of here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thefourtheye

If we are to skip the dotfiles, shouldn't this just break out of here?

No, because it's not skipping the whole dotfile, just the first dot of the dot file.

path.extname(".dotfile")
// => ""
path.extname(".dotfile.ext")
// => ".ext"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test was added for this case.

currentExtension = name.slice(index);
if (Module._extensions[currentExtension]) return currentExtension;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we just store and return the function instead to avoid a second, unnecessary lookup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex Would you clarify your question (maybe with pseudo code)?

Copy link
Contributor

@mscdex mscdex Oct 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of Module._extensions[currentExtension] is thrown away here. However after this function returns, Module._extensions[extension] is done a second time. It would be nice if we could avoid this duplicate lookup.

So instead of:

currentExtension = name.slice(index);
if (Module._extensions[currentExtension]) return currentExtension;

we could do:

const loader = Module._extensions[name.slice(index)];
if (loader)
  return loader;

and change the calling code appropriately:

findExtension(filename)(this, filename);

The function name and the default return value would need to be changed of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote some code to do a performance test:

const path = require('path');

// Mock Module for testing
const Module = {
  _extensions: {
    '.js': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    },
    '.json': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    },
    '.node': function () {
      const makeWork = JSON.parse('{"foo": "bar", "baz": 123}');
      return makeWork;
    }
  }
}

// Generate 100000 filenames
const filenames = [];
for (let i = 0; i < 100000; ++i) {
  filenames.push(`${i}.${Module._extensions[i % 3]}`);
}


// Current version
function findExtension(filename) {
  const name = path.basename(filename);
  let currentExtension;
  let index;
  let startIndex = 0;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // Skip dotfiles like .gitignore
    currentExtension = name.slice(index);
    if (Module._extensions[currentExtension]) return currentExtension;
  }
  return '.js';
}

// Altered version that returns the loader itself rather than the extension,
// so that there’s only one lookup
function findLoaderByExtension(filename) {
  const name = path.basename(filename);
  let currentExtension;
  let index;
  let startIndex = 0;
  let loader;
  while ((index = name.indexOf('.', startIndex)) !== -1) {
    startIndex = index + 1;
    if (index === 0) continue; // Skip dotfiles like .gitignore
    currentExtension = name.slice(index);
    if (loader = Module._extensions[currentExtension]) return loader;
  }
  return Module._extensions['.js'];
}


const run = {
  test1: function() {
    console.time('test1');
    filenames.forEach((filename) => {
      var extension = findExtension(filename);
      Module._extensions[extension](this, filename);
    });
    console.timeEnd('test1');
  },
  test2: function() {
    console.time('test2');
    filenames.forEach((filename) => {
      var loader = findLoaderByExtension(filename);
      loader(this, filename);
    });
    console.timeEnd('test2');
  }
}

run[process.argv[2]]();

Here are my results:

 ✦ node -v
v10.12.0
 ✦ node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1 && node test.js test1
test1: 85.108ms
test1: 85.046ms
test1: 87.216ms
test1: 85.767ms
test1: 85.926ms
test1: 85.650ms
test1: 84.461ms
test1: 85.011ms
test1: 83.525ms
test1: 87.337ms
 ✦ node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2 && node test.js test2
test2: 90.877ms
test2: 86.902ms
test2: 86.268ms
test2: 85.143ms
test2: 90.064ms
test2: 86.636ms
test2: 84.882ms
test2: 86.778ms
test2: 89.278ms
test2: 85.963ms
  • Test 1 average time: 85.505ms
  • Test 2 average time: 87.279ms

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth My suggestion was not so much about a huge performance improvement as it is removing duplicated effort.

Also, if you really want to do benchmarks, they should be done via the official benchmark mechanism for the best comparison.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first version has two lookups, but the second version has an extra assignment (loader = Module._extensions[currentExtension]). So the “return the loader” version saves a second lookup at the cost of an extra assignment, which I think explains why the benchmarks are more or less equivalent between the two versions. Personally I think a helper that returns an extension is easier to grasp (and more potentially useful in the future) than a helper that returns a loader, but that’s a subjective call.

}
return '.js';
}

var warned = false;
Module._findPath = function(request, paths, isMain) {
if (path.isAbsolute(request)) {
Expand Down Expand Up @@ -600,8 +616,7 @@ Module.prototype.load = function(filename) {
this.filename = filename;
this.paths = Module._nodeModulePaths(path.dirname(filename));

var extension = path.extname(filename) || '.js';
if (!Module._extensions[extension]) extension = '.js';
var extension = findLongestRegisteredExtension(filename);
Module._extensions[extension](this, filename);
this.loaded = true;

Expand Down
18 changes: 0 additions & 18 deletions test/known_issues/test-module-deleted-extensions.js

This file was deleted.

94 changes: 94 additions & 0 deletions test/parallel/test-module-multi-extensions.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
'use strict';

// Refs: https://github.com/nodejs/node/issues/4778

const common = require('../common');
const assert = require('assert');
const fs = require('fs');
const path = require('path');
const Module = require('module');
const tmpdir = require('../common/tmpdir');
const file = path.join(tmpdir.path, 'test-extensions.foo.bar');
const dotfile = path.join(tmpdir.path, '.bar');
const dotfileWithExtension = path.join(tmpdir.path, '.foo.bar');

tmpdir.refresh();
fs.writeFileSync(file, 'console.log(__filename);', 'utf8');
fs.writeFileSync(dotfile, 'console.log(__filename);', 'utf8');
fs.writeFileSync(dotfileWithExtension, 'console.log(__filename);', 'utf8');

{
require.extensions['.bar'] = common.mustNotCall();
require.extensions['.foo.bar'] = common.mustCall();
const modulePath = path.join(tmpdir.path, 'test-extensions');
require(modulePath);
require(file);
delete require.cache[file];
delete require.extensions['.bar'];
delete require.extensions['.foo.bar'];
Module._pathCache = Object.create(null);
}

{
require.extensions['.foo.bar'] = common.mustCall();
const modulePath = path.join(tmpdir.path, 'test-extensions');
require(modulePath);
assert.throws(
() => require(`${modulePath}.foo`),
new Error(`Cannot find module '${modulePath}.foo'`)
);
require(`${modulePath}.foo.bar`);
delete require.cache[file];
delete require.extensions['.foo.bar'];
Module._pathCache = Object.create(null);
}

{
const modulePath = path.join(tmpdir.path, 'test-extensions');
assert.throws(
() => require(modulePath),
new Error(`Cannot find module '${modulePath}'`)
);
delete require.cache[file];
Module._pathCache = Object.create(null);
}

{
require.extensions['.bar'] = common.mustNotCall();
require.extensions['.foo.bar'] = common.mustCall();
const modulePath = path.join(tmpdir.path, 'test-extensions.foo');
require(modulePath);
delete require.cache[file];
delete require.extensions['.bar'];
delete require.extensions['.foo.bar'];
Module._pathCache = Object.create(null);
}

{
require.extensions['.foo.bar'] = common.mustNotCall();
const modulePath = path.join(tmpdir.path, 'test-extensions.foo');
assert.throws(
() => require(modulePath),
new Error(`Cannot find module '${modulePath}'`)
);
delete require.extensions['.foo.bar'];
Module._pathCache = Object.create(null);
}

{
require.extensions['.bar'] = common.mustNotCall();
require(dotfile);
delete require.cache[dotfile];
delete require.extensions['.bar'];
Module._pathCache = Object.create(null);
}

{
require.extensions['.bar'] = common.mustCall();
require.extensions['.foo.bar'] = common.mustNotCall();
require(dotfileWithExtension);
delete require.cache[dotfileWithExtension];
delete require.extensions['.bar'];
delete require.extensions['.foo.bar'];
Module._pathCache = Object.create(null);
}