process: introduce codeGenerationFromString event #35157

vdeturckheim · 2020-09-11T17:34:31Z

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

This is an alternative to #34863 following a suggestion from @addaleax.

As for #34863, the goal is to provide a way to listen to unsafe code generation from strings as it is not possible to monkeypatch eval.

The event will only be emitted if there is at least one listener on it and removing all listeners on this event will result in the handler in V8 to never be called. In other words, if there is no listener on this event, there should be no performance impact on calling eval or the Function constructor.

doc/api/process.md

test/parallel/test-process-codegen-from-string.js

lib/internal/bootstrap/node.js

devsnek · 2020-09-11T23:30:42Z

This still feels like the wrong abstraction level. What if we did --code-generation-handler ./file.js instead?

mmarchini · 2020-09-11T23:39:12Z

A flag won't be usable on many serverless environments, being able to set it at runtime (like this PR proposes) seems more appropriate.

devsnek · 2020-09-11T23:45:20Z

My main concern here is that some random library has no business knowing what i'm passing to eval and new Function. My secondary concern is that the event model is wonkey.

How about this:

const release = v8.setCodeGenerationFromStringHandler((code) => {});
// ...
release();

Once called, it can't be called again until release is called.

mmarchini · 2020-09-12T00:15:47Z

This is an overall problem with the events we have on process, even uncaughtException and unhandledRejections have use cases where the application doesn't want modules to attach handlers to it. Kinda outside the scope of this PR, but could we somehow use policies to limit which modules can attach process handlers?

guybedford · 2020-09-12T00:25:02Z

Agreed it would be great to see this kind of stuff moved out into loaders or a similar API as a higher level execution context. We previously discussed disabling high-level permissions on process, I would still like to see us deprecate more and more features from the process global over time. Even a new builtin module import for these features could allow policy scoping for who has access to it.

But however it is cut, likely does seem to require a new API away from process and deprecating the old on process.

devsnek · 2020-09-12T00:27:28Z

@mmarchini do you have a constraint that this needs to be a process event? If not, I think we should go with the single-callback api.

mmarchini · 2020-09-12T00:36:54Z

No constraints, as long as the handler can be defined during runtime I'm fine with it.

vdeturckheim · 2020-09-14T15:51:24Z

@devsnek

My main concern here is that some random library has no business knowing what i'm passing to eval and new Function.

That's a very fair point. As of today, almost everything in Node.js can be monkeypatched and leak implementation details. For instance. This is true for core modules, but also for built-in such as RegExp.prototype.exec. eval is the only one I know that is un-mokeypatchable. Most of the time (I'd say 90%), using eval is a bad idea. This helps actually monitor how/why it is used and by who.

I am not sure I understand your secondary concern. This current PR does not change the v8 module.

devsnek · 2020-09-14T15:53:21Z

@vdeturckheim are you saying you want random libraries to be able to tune into all evals and whatnot in the main context?

vdeturckheim · 2020-09-14T15:57:38Z

@devsnek well right now, any library can spy on most of the things other pieces of code do. That's pretty much how APMs work. eval being the main exception to that rule and being at the same time, often something that should be avoided. Providing runtime visibility on calls made to eval to multiple moniroting tool is pretty useful.

devsnek · 2020-09-14T15:59:47Z

@vdeturckheim I can't tell if you're saying you need the event model or not. can you be a bit more direct? afaict the api I proposed above should work.

vdeturckheim · 2020-09-14T16:01:46Z

@devsnek gotcha, sorry I missunderstood the question ^^. I only need to place one callback for my use case. But I am afraid there might be multiple libraries needing it (say, one RASP for security, one enterprise compliance tool and an APM), and a single callback would make the use of them exclusive. That's why I used an event emitter here.

devsnek · 2020-09-14T16:04:32Z

I guess in my ideal world I would like to see this:

setCallback((code) => {
  security(code);
  compliance(code);
  apm(code);
})

do people think this is unworkable?

vdeturckheim · 2020-09-14T16:10:07Z

It would require the end user to set this up and assume no library does it without the end user knowing.
Another solution would be to expose the callback so libraries could composes multiple ones but this would end up witht the same result as the event emitter.

Overall, I found out that in the scope of instrumentation, end users don't really want to know about implementation and vendors often do their own things without considering the other ones really.

devsnek · 2020-09-15T01:54:29Z

Well I won't block on this, but it is certainly not ideal.

vdeturckheim · 2020-09-15T07:58:38Z

@cjihrig thanks for the review, it should be good :)

Trott · 2020-09-16T13:39:42Z

@nodejs/process

benjamingr · 2020-09-16T20:44:11Z

I don't understand the use case for this - why would I use this?

vdeturckheim · 2020-09-17T08:41:22Z

@benjamingr there are multiple reasons why one could want to listen to eval. The one I see and would use is to monitor if some pieces of code use eval and dynamically compare it with inputs (say http request) to ensure there is no exploitable/exploited code injection though this method.

lib/internal/bootstrap/node.js

vdeturckheim · 2020-09-21T09:24:07Z

Adding WIP flag as I need to fix a build issue (warning emitted when compiling the callback)

watson · 2021-11-19T13:41:52Z

test/parallel/test-process-codegen-from-string.js

+process.on('codeGenerationFromString', common.mustCall((code) => {
+  assert.strictEqual(code, 'item.foo++');
+}));


As @vdeturckheim pointed out in a comment on this PR, there's something not working as intend in this callback:

If an error is thrown inside of this callback, it doesn't always get picked up. In this example code, the callback will get called, but the error will be silenced:

process.on('codeGenerationFromString', (code) => { console.log('Hello from callback!') throw new Error('boom!!!') // error is silenced }) eval('1+1')

However, this code will not silence the error:

process.on('codeGenerationFromString', (code) => { console.log('Hello from callback!') throw new Error('boom!!!') // error is not silenced }) console.log('result of eval:', eval('1+1'))

I found that queing a microtask after the call to eval is also sufficient:

process.on('codeGenerationFromString', (code) => { console.log('Hello from callback!') throw new Error('boom!!!') // error is not silenced }) eval('1+1') queueMicrotask(() => {})

However, process.nextTick(() => {}) is not.

On top of this, if the listener is removed before the microtask is queued, the error is once again swallowed:

process.on('codeGenerationFromString', (code) => { console.log('Hello from callback!') throw new Error('boom!!!') // error is silenced }) eval('1+1') process.removeAllListeners('codeGenerationFromString') queueMicrotask(() => {})

My initial guess was that the error was handled asynchronously and hence if the program ended or the listener was removed before Node.js had a chance to handle the error then it would be lost. However, if I swap the removeAllListeners and the queueMicrotask calls in the last code-example above, the error is actually retained, which I don't think it should according to that theory. So the very act of queueing the microtask - not actually waiting for it to be executed - seems to be enough for it to retain the error before the listener is removed.

New discovery: If I add a line of invalid code after removing the listeners, I don't get any error complaining about that line either. So it looks like it doesn't get to that line at all:

process.on('codeGenerationFromString', (code) => { console.log('Hello from callback!') throw new Error('boom!!!') // error is silenced }) eval('1+1') process.removeAllListeners('codeGenerationFromString') bad_code

I tried to compile Node.js with debug mode enabled (./configure --debug && make -j8) and ran it via lldb with breakpoints set to _exit and exit. And it looked like the program just ended normally like a regular program (which also explains the zero exit code).

Ok, with a lot of help from @addaleax I finally understand why the error was "silenced":

If V8 doesn't expect a callback given to it to call into JS-land, any JS exception thrown will not be properly handled. So an explicit try-catch has to be added to the C++ code that's calling into JS to ensure that the exception is handled. We have the same problem here:

node/src/node_task_queue.cc

Lines 153 to 159 in fa7cdd6

// V8 does not expect this callback to have a scheduled exceptions once it

// returns, so we print them out in a best effort to do something about it

// without failing silently and without crashing the process.

if (try_catch.HasCaught() && !try_catch.HasTerminated()) {

fprintf(stderr, "Exception in PromiseRejectCallback:\n");

PrintCaughtException(isolate, env->context(), try_catch);

}

I've updated this PR accordingly.

However, this unfortunately means that Node.js will not exit with an uncaught exception and so this isn't easily tested in our tests (except if we check the output to stderr I guess). And it also can be quite confusing for users if they except Node.js to crash.

Would process.on('uncaughtexception') still fire?
Thanks a lot for digging on this bug. this is amazing finding!

@vdeturckheim No it will not. The exception is now technically handled

guybedford · 2021-11-24T17:44:18Z

Should dynamic import of data: URLs fall under this capacity?

nodejs-github-bot · 2021-11-24T17:47:50Z

CI: https://ci.nodejs.org/job/node-test-pull-request/41081/

vdeturckheim · 2021-11-24T17:48:37Z

@guybedford maybe 🤔 tbh, this mostly mimics the hook in V8 right now. In theory, but we could try to align on trusted types sinks. But it depends what V8 exposes. It is already a good first step in it's current form imho.

devsnek · 2021-11-24T18:23:54Z

I think v8 should be updated to handle these exceptions rather than the current hack. Looking at the code I don't think it would be very complex and we should be able to backport it.

watson · 2021-11-24T20:22:54Z

@devsnek that would be preferable sure (though we already use this hack in other parts of core - so I'm not sure how important it is??).

Do you know what the turnaround time for this sort of thing to be implemented, released and backported in V8 would normally be?

devsnek · 2021-11-24T20:32:16Z

not sure how the thanksgiving holiday changes things but for a small change usually a day or two to get it into v8 and then you just have to backport it into node.

watson · 2021-11-25T09:59:14Z

@devsnek haha that's great. I was just hoping we were not talking several months ☺️

The 'codeGenerationFromString' event is emitted when a call is made to `eval` or `new Function`. Co-authored-by: Thomas Watson <w@tson.dk>

watson · 2021-11-25T11:34:32Z

@devsnek what's the process for getting that work started in V8 and do you know if there's anything I can do to help?

watson · 2021-11-25T12:53:12Z

Small discovery which might just be related to crossing the JS/C++ boundary:

If I want to get the filename, line and column number from where eval was called in the JS source, the line/col number is not always 100% correct. Example:

process.on('codeGenerationFromString', (code) => {
  // print the frame in the stack trace where `eval` was called
  console.log('Detected code generation from string', new Error().stack.split('\n')[3].trim())
})

eval('1 + 1') // should be line 6, col 1 - this is also what is printed
someFn(eval('1 + 1')) // should be line 7, col 8 - is actually line 7, col 1
someFn(
   eval('1 + 1') // should be line 9, col 3 - is actually line 8, col 1
)
someFn(42, eval('1 + 1')) // should be line 11, col 12 - is actually line 11, col 1
someFn(
  42,
  eval('1 + 1') // should be line 14, col 3 - is actually line 12, col 1
)

Am I correct in my assumption that this is just an unfortunate side effect of crossing the JS/C++ boundary and that nothing can be done about it?

For reference, this is what the complete stack trace looks like for the first eval call in the code above:

Error
    at process.<anonymous> (my-app.js:3:55)
    at process.emit (node:events:390:28)
    at Object.<anonymous> (my-app.js:6:1)
    at Module._compile (node:internal/modules/cjs/loader:1097:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1149:10)
    at Module.load (node:internal/modules/cjs/loader:975:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47

Update: I found that this is only an issue for eval. If using new Function, Function, Function.apply, Function.call or even indirect eval (where I assign eval to another variable before calling the function), the line/column numbers are correct.

devsnek · 2021-11-25T14:41:06Z

if you want to wait for someone from v8 to do this it probably will be several months. it should be a pretty simple change if you want to do it though, you just basically need to go around adding ASSIGN_RETURN_FAILURE_ON_EXCEPTION to Compiler::ValidateDynamicCompilationSource calls and then maybe a few RETURN_EXCEPTION_IF_SCHEDULED_EXCEPTION elsewhere. i'm available on irc/discord/etc if you want to walk through it.

guybedford · 2021-11-25T17:20:35Z

@vdeturckheim the issue is just in terms of it being a security property ensuring the property is exhaustive is important, and it could be possible to write a custom interception that shares the same hook as they are the same thing from a user perspective.

jasnell · 2021-11-25T17:58:10Z

the goal is to provide a way to listen to unsafe code generation from strings as it is not possible to monkeypatch

Can I step this back just a bit further and ask why we'd want to do this? What are the use cases here outside of logging or preventing eval by throwing in the handler?

doc/api/process.md

watson · 2021-11-25T21:54:20Z

@jasnell in our case we want to audit code generation from strings as simply using --disallow-code-generation-from-strings will block all usage while in reality you might want to allow some occurrences based on calling frame

jasnell · 2021-11-25T22:43:53Z

Ok. Do you actually need access to code string being evaluated for that? Or just a stack to get the calling frame?

vdeturckheim · 2021-11-26T10:49:24Z

@guybedford That what I meant by mentioning trusted types. There are multiple sinks for code generation from strings and the API should/could cover them all, so +1 :D

@jasnell in my use case, I need to know the code to identify its origin, for instance, this is used to detect RCEs by comparing the argument in the evil eval call with an incoming HTTP request to identify wether we are facing injected payloads.

watson · 2021-11-30T12:57:48Z

@jasnell In my case all I need is the calling stack frame

Co-authored-by: James M Snell <jasnell@gmail.com>

nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Sep 11, 2020

cjihrig reviewed Sep 11, 2020

View reviewed changes

devsnek closed this Sep 11, 2020

devsnek reopened this Sep 11, 2020

cjihrig reviewed Sep 17, 2020

View reviewed changes

lib/internal/bootstrap/node.js Outdated Show resolved Hide resolved

cjihrig approved these changes Sep 19, 2020

View reviewed changes

vdeturckheim mentioned this pull request Sep 21, 2020

v8: introduce v8.emitCodeGenFromStringEvent() #34863

Closed

4 tasks

vdeturckheim added the wip Issues and PRs that are still a work in progress. label Sep 21, 2020

vdeturckheim removed the wip Issues and PRs that are still a work in progress. label Sep 22, 2020

vdeturckheim marked this pull request as draft October 2, 2020 08:27

watson reviewed Nov 19, 2021

View reviewed changes

watson marked this pull request as ready for review November 24, 2021 17:54

github-actions bot mentioned this pull request Nov 25, 2021

CI Reliability 2021-11-25 nodejs/reliability#128

Open

38 tasks

process: introduce codeGenerationFromString event

4c5a5c6

The 'codeGenerationFromString' event is emitted when a call is made to `eval` or `new Function`. Co-authored-by: Thomas Watson <w@tson.dk>

jasnell reviewed Nov 25, 2021

View reviewed changes

doc/api/process.md Outdated Show resolved Hide resolved

github-actions bot mentioned this pull request Nov 26, 2021

CI Reliability 2021-11-26 nodejs/reliability#129

Open

33 tasks

This was referenced Nov 27, 2021

CI Reliability 2021-11-27 nodejs/reliability#130

Open

CI Reliability 2021-11-28 nodejs/reliability#131

Open

CI Reliability 2021-11-29 nodejs/reliability#132

Open

Update doc/api/process.md

daeb37b

Co-authored-by: James M Snell <jasnell@gmail.com>

Trott force-pushed the main branch from 2d76238 to ca3ed36 Compare November 12, 2022 01:49

vdeturckheim closed this by deleting the head repository Mar 30, 2023

watson mentioned this pull request May 25, 2023

Audit usage of server-side code generation from strings elastic/kibana#84816

Open

watson mentioned this pull request Jun 2, 2023

process: introduce codeGenerationFromString event #48295

Draft

3 tasks

	// V8 does not expect this callback to have a scheduled exceptions once it
	// returns, so we print them out in a best effort to do something about it
	// without failing silently and without crashing the process.
	if (try_catch.HasCaught() && !try_catch.HasTerminated()) {
	fprintf(stderr, "Exception in PromiseRejectCallback:\n");
	PrintCaughtException(isolate, env->context(), try_catch);
	}

process: introduce codeGenerationFromString event #35157

process: introduce codeGenerationFromString event #35157

Conversation

vdeturckheim commented Sep 11, 2020

Checklist

devsnek commented Sep 11, 2020

mmarchini commented Sep 11, 2020

devsnek commented Sep 11, 2020 • edited Loading

mmarchini commented Sep 12, 2020

guybedford commented Sep 12, 2020

devsnek commented Sep 12, 2020

mmarchini commented Sep 12, 2020

vdeturckheim commented Sep 14, 2020

devsnek commented Sep 14, 2020

vdeturckheim commented Sep 14, 2020

devsnek commented Sep 14, 2020

vdeturckheim commented Sep 14, 2020

devsnek commented Sep 14, 2020

vdeturckheim commented Sep 14, 2020

devsnek commented Sep 15, 2020

vdeturckheim commented Sep 15, 2020

Trott commented Sep 16, 2020

benjamingr commented Sep 16, 2020

vdeturckheim commented Sep 17, 2020

vdeturckheim commented Sep 21, 2020

watson Nov 19, 2021 • edited Loading

Choose a reason for hiding this comment

watson Nov 19, 2021

Choose a reason for hiding this comment

watson Nov 24, 2021

Choose a reason for hiding this comment

watson Nov 24, 2021

Choose a reason for hiding this comment

watson Nov 24, 2021

Choose a reason for hiding this comment

vdeturckheim Nov 24, 2021 • edited Loading

Choose a reason for hiding this comment

watson Nov 24, 2021

Choose a reason for hiding this comment

guybedford commented Nov 24, 2021

nodejs-github-bot commented Nov 24, 2021

vdeturckheim commented Nov 24, 2021 • edited Loading

devsnek commented Nov 24, 2021

watson commented Nov 24, 2021

devsnek commented Nov 24, 2021

watson commented Nov 25, 2021

watson commented Nov 25, 2021

watson commented Nov 25, 2021 • edited Loading

devsnek commented Nov 25, 2021

guybedford commented Nov 25, 2021

jasnell commented Nov 25, 2021

watson commented Nov 25, 2021

jasnell commented Nov 25, 2021

vdeturckheim commented Nov 26, 2021

watson commented Nov 30, 2021

devsnek commented Sep 11, 2020 •

edited

Loading

watson Nov 19, 2021 •

edited

Loading

vdeturckheim Nov 24, 2021 •

edited

Loading

vdeturckheim commented Nov 24, 2021 •

edited

Loading

watson commented Nov 25, 2021 •

edited

Loading