-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
module: validate that strip-types does not insert any code #54141
base: main
Are you sure you want to change the base?
Conversation
Review requested:
|
31545ec
to
46fd9dd
Compare
46fd9dd
to
08ecab7
Compare
8e3566e
to
4ab7ef5
Compare
I'm ok with 346469c, n2 can we do that in amaro with a flag? |
294585f
to
239e6bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the wrong place to solve this.
This adds non-null overhead for loading every single file to defend against code that we control. Let's fix the reproducible build problem.
I'm ok if we run this check in CI either in the form of a test or an optional check.
239e6bb
to
fc9fa66
Compare
fc9fa66
to
ae27328
Compare
@mcollina not "every single file", but every local typescript file under This does not affect existing setups Running this in CI only won't give any sufficient gurantees
We don't No one here reviewed those 500 MiBs of deps in /amaro repo No one even noticed those are not being in fact used from that repo but are coming from elsewhere |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, regardless of reproducable builds it seems useful to have some guarentee that V8 will indeed execute users' code and nothing else. It only slow down files with types to strip, not all files IIUC.
nit: Any chance we can avoid the Buffer -> string -> Buffer
conversion in e.g.
node/lib/internal/modules/esm/translators.js
Line 312 in 35f92d9
const code = tsParse(stringify(source)); |
Maybe we could supply another parameter that contains the original source in addition to the stringified version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this solves? We don't trust the strip code and another layer during runtime to make sure it does? Shouldn't this just be a test that runs in CI prior to release? Can download x number of repositories run the type stripping and test that it does what it should.
@ronag a supply chain attack like the one on ESLint can hide itself and activate itself under specific conditions Testing on CI does not help validate blackboxes against intentional attacks (only against unintentional mistakes) Sandboxing + validating in runtime does |
Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
Instead of doing checks at runtime, maybe we can just have a flag that spits out the type-stripped code to a directory (or just to stdout and let users redirect as they like?) That might also be useful for those who try to debug the type stripping, in case there are any bugs. |
As for the perf concern: this adds 3% overhead to pure I.e. this check is ~33x faster than (and affects only files transformed from typescript under --experimental-strip-types flag by wasm) |
@joyeecheung that seems much more complex (user options/format/output decisions etc) and doesn't solve the problem of a potentally carefully hidden exploit in the supply chain that activates only after specific conditions We can't expect users to check all code generated at runtime Especially given that it's immediately executed either way |
We also have to consider that this might make future progress and iteration on this feature more difficult (possibly turning away contributors). For example moving beyond just type stripping. |
How does bun and deno solve this? |
@ronag landing this would fix the immediate concern, it doesn't mean that it can't be removed once we figure out {build chain / what gets into the bundle / how do we review it} process Which should be done by the time something except typestripping is added |
const sourceBuf = BufferFrom(source); | ||
const codeBuf = BufferFrom(code); | ||
if (sourceBuf.length !== codeBuf.length) { return false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, have you considered doing a comparison of the JS strings? It seems to me it would be much simpler (and maybe more performant) than comparing buffers, especially given the parameters are already strings. wdyt?
for (let i = 0; i < source.length; i++) {
const char = StringPrototypeCodePointAt(code, i);
if (char !== 0x3b && char !== 0x20 && char !== StringPrototypeCodePointAt(source, i)) {
return false;
}
}
return true;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aduh95 will recheck!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't use that because the original impl which this rechecks is byte-level based
But it can be done, yes
and maybe more performant
perf results are inconclusive on this even after initial optimization of this for fast paths -- this is faster for multi-byte but slower for regular ascii sources
will try another version shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, the code I used was:
function doesTSStripTypesResultMatchSource2(code, source) {
if (code.length !== source.length) { return false; }
for (let i = 0; i < code.length; i++) {
const a = code.codePointAt(i)
if (a === source.codePointAt(i)) continue
if (a !== 0x20 && a !== 0x3b && a !== 0xa0 && a !== 0x2002 && a !== 0xfeff) return false
}
return true
}
- The set of chars is different
- This also misses a check for now that
0xfeff
can come only after a space
I think we can land Buffer-based impl and optimize it later if needed (hopefully not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I think we can remove this at a later step.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #54141 +/- ##
=======================================
Coverage 87.06% 87.06%
=======================================
Files 643 643
Lines 181576 181625 +49
Branches 34894 34895 +1
=======================================
+ Hits 158088 158140 +52
- Misses 16759 16777 +18
+ Partials 6729 6708 -21
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in the tsc meeting, I believe this check belongs inside amaro behind a configuration flag that should be enabled by node
Yes, that seems to solve the concerns as well! |
One thing that is degraded from moving into amaro directly is that it' now not possible to use safe methods that were cached prior to user code being loaded Which means that there will be an easy way for the users to circumvent this check when it's moved into ... which seems to be an acceptable risk / out of scope |
Possibly, yes. But, according to our threat model, this is ok. |
Yeah, just mentioning for visibility/transparency |
We can do this while the mode is strip-types only
The wasm binary is not expected to insert any other changes in the sourcecode except for the allowed symbols / patterns
We can use this to increase trust in the code that wasm binary outputs
It is still possible to do unexpected things under this limitation, but it makes anything funny much harder for a supply chain attack
Now does this if internal swc lib is e.g. switched to a codegen mode unexpectedly:
Existing tests should pass, this is just a safeguard
Should also add some multi-byte stripping tests