Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: validate that strip-types does not insert any code #54141

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 50 additions & 1 deletion lib/internal/modules/helpers.js
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ const internalFS = require('internal/fs/utils');
const path = require('path');
const { pathToFileURL, fileURLToPath } = require('internal/url');
const assert = require('internal/assert');
const { Buffer: { from: BufferFrom } } = require('buffer');

const { getOptionValue } = require('internal/options');
const { setOwnProperty } = require('internal/util');
Expand Down Expand Up @@ -307,10 +308,58 @@ function lazyLoadTSParser() {
return parseTS;
}

function doesTSStripTypesResultMatchSource(code, source) {
const sourceBuf = BufferFrom(source);
const codeBuf = BufferFrom(code);
if (sourceBuf.length !== codeBuf.length) { return false; }
Comment on lines +312 to +314
Copy link
Contributor

@aduh95 aduh95 Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, have you considered doing a comparison of the JS strings? It seems to me it would be much simpler (and maybe more performant) than comparing buffers, especially given the parameters are already strings. wdyt?

for (let i = 0; i < source.length; i++) {
  const char = StringPrototypeCodePointAt(code, i);
  if (char !== 0x3b && char !== 0x20 && char !== StringPrototypeCodePointAt(source, i)) {
     return false;
  }
}
return true;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aduh95 will recheck!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't use that because the original impl which this rechecks is byte-level based
But it can be done, yes

and maybe more performant

perf results are inconclusive on this even after initial optimization of this for fast paths -- this is faster for multi-byte but slower for regular ascii sources

will try another version shortly

Copy link
Member Author

@ChALkeR ChALkeR Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the code I used was:

  function doesTSStripTypesResultMatchSource2(code, source) {
    if (code.length !== source.length) { return false; }
    for (let i = 0; i < code.length; i++) {
      const a = code.codePointAt(i)
      if (a === source.codePointAt(i)) continue
      if (a !== 0x20 && a !== 0x3b && a !== 0xa0 && a !== 0x2002 && a !== 0xfeff) return false
    }
    return true
  }
  1. The set of chars is different
  2. This also misses a check for now that 0xfeff can come only after a space

I think we can land Buffer-based impl and optimize it later if needed (hopefully not)

for (let i = 0; i < codeBuf.length; i++) {
// Should match either the source buffer or spaces (sometimes multi-byte) or semicolon
const val = codeBuf[i];
if (val === sourceBuf[i]) {
// Source match
continue;
}
// https://github.com/nodejs/amaro/blob/e533394f576f946add41dd8816816435e8100c3b/deps/swc/crates/swc_fast_ts_strip/src/lib.rs#L400-L414
if (val === 0x3b) {
// Semicolon 3b
continue;
}
// https://github.com/nodejs/amaro/blob/e533394f576f946add41dd8816816435e8100c3b/deps/swc/crates/swc_fast_ts_strip/src/lib.rs#L200-L226
if (val === 0x20) {
// Space 20
continue;
}
if (val === 0xc2) {
// No-Break Space 00A0
if (i + 1 >= codeBuf.length) { return false; }
if (codeBuf[++i] !== 0xa0) { return false; }
continue;
}
if (val === 0xe2) {
// En Space 2002
if (i + 2 >= codeBuf.length) { return false; }
if (codeBuf[++i] !== 0x80) { return false; }
if (codeBuf[++i] !== 0x82) { return false; }
continue;
}
if (val === 0xef) {
// ZWNBSP FEFF
if (i < 1 || codeBuf[i - 1] !== 0x20) { return false; } // Only can get insterted after a space
if (i + 2 >= codeBuf.length) { return false; }
if (codeBuf[++i] !== 0xbb) { return false; }
if (codeBuf[++i] !== 0xbf) { return false; }
continue;
}
return false;
}
return true;
}

function tsParse(source) {
if (!source || typeof source !== 'string') { return; }
const transformSync = lazyLoadTSParser();
const { code } = transformSync(source);
const { code } = transformSync(source, { __proto__: null, mode: 'strip-only' });
assert(doesTSStripTypesResultMatchSource(code, source), 'Unexpected strip-types result');
return code;
}

Expand Down
Loading