Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: command line option to preload modules #881

Closed
wants to merge 1 commit into from

Conversation

ofrobots
Copy link
Contributor

Here's a proposal, along with a PR, I would like to get feedback from the community on.

It would be nice if there was a standard mechanism to allow vendors, such as a cloud host, to customize startup of node applications.

Basically, I want to be able to inject functionality into a node application without requiring an explicit require. This could be useful to automatically enable, e.g., vendor specific tracing and application performance monitoring APIs when applications are deployed.

There are a few ways this could be done:
0. (Status quo) Ask the developers to modify their applications and add an explicit require of a vendor provided npm module. This adds friction to the deployment experience and possibly prevents people from being able to easily try different cloud vendors.

  1. The vendor can maintain their own fork of node.js/io.js but with extra functionality added. This package would be offered instead of the standard node.js/io.js packages. This is potentially high overhead for the vendor. Effort may be duplicated by different vendors.
  2. Build a standard mechanism into node.js/io.js that allows vendors to inject some functionality into the node startup process.

Here's a proposal on how to do option 2 (EDIT: updated as per suggestions by @piscisaureus):

  • On startup, node checks if process.env.NODE_VENDOR_STARTUP is set. If so, the file node_prefix + '/lib/vendor/index.js' is loaded (if it exists).
  • Vendors could package up their distributions of node/io by adding lib/vendor.
    Another alternative would be for NODE_VENDOR_STARTUP to be a path. It gives more flexibility, but is probably not as secure.

EDIT: 20150313: change to command line option.

iojs -r foo -r /bar/baz.js -r quux

translates to:

require('foo');
require('/bar/baz.js');
require('quux');

Thoughts?

@cjihrig
Copy link
Contributor

cjihrig commented Feb 18, 2015

Isn't that the purpose of _third_party_main?

EDIT: Although that appears to be for compile time.

@sam-github
Copy link
Contributor

I've wanted a feature like this, too. Some node core features (such as cluster) basically do this, but they do it by using fixed env variables and check for them and trigger specific code. So in some sense its a "feature" available to core, but no way has been found to expose it to user-land. In order to factor cluster out of core, something like this would be needed.

I don't think this implementation is great, particularly the hard-coded path.

What would make more sense to me is an env variable like NODE_PRELOAD which can be set to any path, or perhaps a node --preload=FILE path, and then a NODE_OPT variable that could be set to any of the command line options.

@ofrobots
Copy link
Contributor Author

@sam-github: NODE_PRELOAD is a much better name for this, thanks. I can re-work the PR to make it an arbitrary path instead of hard-coded. I still like the idea of the env. var. more, but if there is strong appetite for a command line options, perhaps it can be in-addition-to the env. var. with the semantics that the command-line option overrides the env. var.

@cjihrig: _third_party_main lets you build stand-alone executables of your javascript program, provided as _third_party_main at node compile time. The output isn't node but rather an executable for your javascript program.

@piscisaureus
Copy link
Contributor

On startup, node checks if process.env.NODE_VENDOR_STARTUP is set. If so, the file node_prefix + '/lib/vendor/index.js' is loaded (if it exists).

The node_prefix plus something hard-coded logic seems illogical to me. Something I'd find more appealing:

export NODE_PRELOAD=foo,/bar/baz.js,quux

translates to

require('foo');
require('/bar/baz.js');
require('quux');

@ofrobots
Copy link
Contributor Author

@piscisaureus Updated as per your suggestion.

@meandmycode
Copy link

As a vendor, if you have enough control to change the command line then why not just node invoke to a proxy script that invokes the original script?

@bnoordhuis
Copy link
Member

@meandmycode asks a good question.

As to the implementation, loading arbitrary code based on an environment variable is insecure when the iojs binary is setuid root.

On the C++ side, secure_getenv() in src/node.cc protects against that. There is no JS equivalent at the moment though, due to there being no geteuid() and getegid() bindings.

@ofrobots
Copy link
Contributor Author

@meandmycode I want the preload modules to be loaded uniformly across all uses of node, including node -e 'console.log("hello world");' and uses of the repl. I think the proxy script for node/iojs would end up duplicating a lot of the command line parsing logic from node. IMO, my current patch, ignoring whitespace, is more elegant at 4-5 lines of code.

Also, I think the NODE_PRELOAD mechanism is going to be more generally useful beyond the vendor tracing/apm use-case that I describe. For example, JVM offers JVM Tool Interface (JVMTI) as a generally useful mechanism to people to implement various functionality on top of the JVM. It would be nice to be able to extend node beyond the status quo of explicitly adding code into the user's application.

See also the use-case @sam-github describes above.

@bnoordhuis Yes, you're right that setuid is a concrete problem. I had vague feelings of security issues which is why I had originally proposed a hard-coded path within the installation directory. I can look into secure_getenv and what might be involved in adding geteuid() and getegid() bindings.

@meandmycode
Copy link

Right, but how will node -e 'console.log("hello world");' know to do this if you don't pass a node_preload? are are you expecting the node_preload command to be run before hand to persist configuration permanently?

@ofrobots
Copy link
Contributor Author

NODE_PRELOAD would be an exported environment variable.

@meandmycode
Copy link

Ah sorry, I was confused on that aspect and the preload parameter. I feel like environmental variables are a bad story.

How do you target a specific io.js/node? How would it affect things like atom etc? looking adjacent to the exec path seems like a safer option but I'm obviously not as informed in this space.

@ofrobots
Copy link
Contributor Author

Environment variables are the easiest starting point for this feature, but a well rounded implementation would probably offer both environment variables and a --node-preload command line option as I proposed above.

@sam-github
Copy link
Contributor

I think there are some legitimate concerns, I'm not totally sold on this, but I do think its worth considering hard.

A couple comments:

  • just use a proxy script: currently doing so requires using undocumented node APIs, though I guess you could call that different problem.
  • env vs. command line: modifying environment allows changing the behaviour of node instances that are NOT directly started by you. This is pretty much the point of configuration via environment variables! Many of the things that can be done with a env variable can be done via command line, but we still have env vars.
  • security, setuid bits, and env: I'd be interested in hearing more about what makes node unique in this regard, and hope not to hear that its because people are regularly making it setuid! In particular, NODE_PRELOAD=X is equivalent to RUBYOPT=-rX. Ruby inherited this feature from perl, see PERL5OPT, and the various perl flags that can be passed. So, its possible these interpreters have a security hole that we shouldn't emulate... or that this isn't considered a security hole. Also, note that NODE_PATH already allows some control over the modules loaded by node via environment.

/cc @rmg

@rmg
Copy link
Contributor

rmg commented Feb 19, 2015

@sam-github thanks for the mention!

I have generally been in favour of the proxy command route, but it definitely doesn't mix well with some really useful use-cases, like -e, -p, and the REPL. Someone should probably document that module loading API..

I also question the setuid case. If you've done that to your node binary then how is loading flags from an environment variable increasing the risk when you've already decided to let people run arbitrary code with those privileges? Maybe I'm missing something.

Another option, which could be more broadly useful, is to make -e/-p/<path..> NOT mutually exclusive. The "right" way to do this would probably be adding an additional flag similar to -e that evaluates the inline script in the same context as the script being run or the REPL being presented.

For the sake of examples, I'll say -E is that -e++ flag:

  • pre-load wrapper module: node -E "require('wrapper')" my_app.js
  • super-powered repl: node -E "require('shelljs/global')"
  • browser-ish REPL: node -E 'global.window = global'

After typing up those examples, it definitely feels clunky if pre-loading modules is the problem being solved.. that clunkiness might discourage over-use, which is good or bad depending on your preferences. Maybe a -r/--require flag would be better. It is certainly useful in Ruby!

Something like a NODE_OPTS environment variably would be really useful for doing things like modifying the node/v8 flags for globally installed npm modules NODE_OPTS=--trace-deprecations foo.

@trevnorris
Copy link
Contributor

+1 for NODE_PRELOAD.

EDIT: Conceptual +1. Of course have to work through the practical issues, like security.

@rvagg
Copy link
Member

rvagg commented Feb 19, 2015

-1 to using an environment variable to inject functionality into the trusted core, think through for a moment where those variables could come from and how unlikely you are to ever check that NODE_PRELOAD exists before running the process

@sam-github
Copy link
Contributor

  • /etc/profile
  • ~/.bashrc, ~/.bash_profile, etc. (or zsh, etc., )
  • typed on the command line...

You control your environment... you better, if not, someone can just set your PATH to their version of node, or ls, right?

Again, this is something ruby and perl have, so its hard to call an all-caps security flaw.

@meandmycode
Copy link

I think those scenarios seem fine, although it feels super hacky, almost asking for a backdoor in because it's convenient for you as a host of what is someone else's app most likely. My only concern is third parties using (embedding) iojs- I don't think this behavior should exist for those instances of iojs.

@ofrobots
Copy link
Contributor Author

@rmg I like the ideas about -r and NODE_OPTS – those are indeed useful on ruby. -E sounds more flexible, but you can always create a one-liner file and -r require that instead. I guess the question is whether the most common realistic use-cases are going to one-liners (-E is better) or not (-r is better)

@rvagg I can already inject functionality into the trusted core using LD_PRELOAD. Plus what @sam-github said.

@meandmycode If the embedding scenario is an important concern, it can be dealt with a --disable-preload configure flag. As a user, though, maybe I do want to debug an embedded node.. it would be nice if I could do it without having direct access to the command line being passed to the embedded node.

Environment variables are a different kind of a tool than a command line option. Both have their valid use-cases.

@trevnorris
Copy link
Contributor

@ofrobots I believe the difference between NODE_PRELOAD and LD_PRELOAD is that the latter only injects the functions (so even if a bad function is loaded it won't necessarily be run). While the former would require actually executing JS every time. More easily allowing for exploit.

@rmg
Copy link
Contributor

rmg commented Feb 21, 2015

@trevnorris on systems that use ld for binary loading LD_PRELOAD lets you pre-resolve any function normally loaded from a dynamic library, so it's pretty easy to inject a function that is almost guaranteed to be called. This is how valgrind tracks memory leaks by replacing malloc, free, etc. Other environments have similar means of injecting dynamic libraries.

That said, NODE_PRELOAD would lower the barrier to entry for writing such malware by removing the binary requirement as well as making the attack vector cross platform.

@rvagg
Copy link
Member

rvagg commented Feb 22, 2015

As a compromise can I suggest just sticking with a commandline flag instead of env var for now, perhaps extend it later? Otherwise we might get bogged down arguing about this. I like the feature, I just don't like magic environment variables.

@cjihrig
Copy link
Contributor

cjihrig commented Feb 22, 2015

+1 for an explicit command line argument.

@trevnorris
Copy link
Contributor

I'm cool with a command line argument.

@ofrobots
Copy link
Contributor Author

Sounds good to me. I will summarize the discussion and adjust the PR in a little bit (need to be away from the 'net for a couple of days).

@sam-github
Copy link
Contributor

I'm fine with a command line argument, as well, though it doesn't address the use-case of effecting node child invocations transparently. Still, we can try it out a bit and get feedback.

@ofrobots
Copy link
Contributor Author

@sam-github Indeed. However, we don't necessarily have to conflate the need to effect node child invocations with the need to preload modules. I think an independent separate argument can be made for a more general NODE_OPT as you and @rmg have proposed.

@ofrobots
Copy link
Contributor Author

I have updated the pull request to implement preloading modules via a -r/--require command line similar to Ruby's -r. The option takes a path-delimiter-separated list of modules that are preloaded before the main script starts.

The general consensus in the discussion above seems to be that the ability to preload modules is a useful feature. There was however debate on the topic of whether this should be done via command-line options or via environment variables. Upon reflection, it seems environment variables are an orthogonal concern. It make sense to decouple these discussions. Since some valid use-cases for environment variables came up, an independent issue can be be opened on the topic of a more generic NODE_OPT environment variable.

We also discussed a non-exclusive variant of the -e option, say -E. I didn't implement it this way as, IMO, -r is a better fit if the primary use-case is going to be preloading functionality that's more than a one-liner.

@@ -2935,6 +2942,12 @@ static void PrintHelp() {
" -p, --print evaluate script and print result\n"
" -i, --interactive always enter the REPL even if stdin\n"
" does not appear to be a terminal\n"
#ifdef _WIN32
" -r, --require ';'-separated list of modules to"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this... you can use : in unix or win32 package names (unadvised, though, and if it fails to work your bug reports will be laughed at). Stick with :. Thoughts, @piscisaureus, or @orangemocha, am I too unix-centric? :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following what we do for NODE_PATH. The other alternative would be to use a comma, but I think that's even more likely to be in a package name. I'm open to suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a colon for this doesn't work on windows.
+1 to what the PR makes of it.

@sam-github
Copy link
Contributor

Looks OK to me as a feature, I didn't read the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues opened for discussions and feedbacks. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.