Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache accesses to process.env? #3104

Closed
STRML opened this issue Sep 28, 2015 · 22 comments
Closed

Cache accesses to process.env? #3104

STRML opened this issue Sep 28, 2015 · 22 comments
Labels
process Issues and PRs related to the process subsystem.

Comments

@STRML
Copy link
Contributor

STRML commented Sep 28, 2015

I've found in multiple projects that accessing process.env within a hot section of the code leads to major slowdown.

This really hurts in React server-side rendering (issue) and has caused them to rearrange how they access the env.

It would sense to cache already-accessed properties rather than reaching out to the actual environment. And of course, update the cache on assignment.

I would be happy to contribute a patch if anyone could point me in the right direction to get started, as I'm new to Node core dev.

@ChALkeR
Copy link
Member

ChALkeR commented Sep 28, 2015

See #1648.

@ChALkeR ChALkeR added process Issues and PRs related to the process subsystem. feature request Issues that request new features to be added to Node.js. and removed feature request Issues that request new features to be added to Node.js. labels Sep 28, 2015
@STRML
Copy link
Contributor Author

STRML commented Sep 28, 2015

Thanks @ChALkeR. I understand the argument for keeping it as-is, but if it is kept this way, I think it would help to document that values should be cached whenever possible as serious performance problems can result.

Alternatively, a function like process.env[freeze | unfreeze]() would be great to create this cache from inside our programs in case of misbehaving modules.

@Fishrock123
Copy link
Contributor

Is it actually possible to change the env externally once the process is running? If not, why not cache this?

@ChALkeR
Copy link
Member

ChALkeR commented Sep 28, 2015

@Fishrock123 Any native module (or a library that is used by it) could call setenv directly. Though the likelihood of this is near zero, it's still possible. And it should affect process.env.

process.env behaves as it should, and caching it on the Node.js side will break it. There is no actual problem — anything that uses it in hot code path could cache it if it's fine with that.

-1 on caching, +1 on the minor note in the docs.

While we are here, the docs for the process.env should state things a bit more clear about how does the env work, because now «changes won't be reflected outside of your process» could be misunderstood. As well as the normal env, changes affect new child processes.

@silverwind
Copy link
Contributor

Also keep in mind that there are modules that assign to process.env.

@STRML why not clone process.env to a local object once in your application?

@STRML
Copy link
Contributor Author

STRML commented Sep 28, 2015

@silverwind It's really more of a problem with child modules, like React itself (which is finally caching process.env properly new versions). And I've seen many more that do this in hot paths.

I think @ChALkeR has the right approach; I underestimated how much potential breakage there is. But documenting it properly and making it known that process.env accesses are slow would be great.

Perhaps it should have been function getter/setters, which would have been more expected to have slow behavior. Too late now of course.

@ChALkeR
Copy link
Member

ChALkeR commented Sep 28, 2015

@silverwind

Also keep in mind that there are modules that assign to process.env.

If that was the only possible way to set the env from node, then it could be covered by updating the cache in the setter. But it's not.

@Fishrock123
Copy link
Contributor

Doesn't seem worthwhile. Closing, please cache it yourself if need be. :)

@STRML
Copy link
Contributor Author

STRML commented Dec 2, 2015

Just wanted to ping this, we're having quite a hard time handling this in React, and caching NODE_ENV results in a 100% (!) speedup in benchmark code. In real-world testing, I see roughly 50-70%. It's significant.

I'm getting sick of issuing PRs to repos caching access to NODE_ENV, and there really isn't a good solution for modules that need to hide features behind debug flags and want them eliminated in bundling. If we cache the environment lookup, UglifyJS is unable to eliminate the dead code in bundling unless we use a const, which then breaks older browsers.

In reality, is it really useful to rely on external changes to the env (say, from native code)? What proportion of users really need this, and is it a valuable tradeoff compared to the performance benefits we could get ecosystem-wide?

Would the Node core contributors consider a semver-major patch? My thoughts are to make process.env a plain object of the env available at boot (still global and able to be set internally), then added either a process.refreshEnv() method or a process.liveEnv object that had the old semantics.

@Fishrock123
Copy link
Contributor

@STRML could you detail what problems you are having caching it locally?

@Fishrock123
Copy link
Contributor

@STRML Another option, make a module that does this and load it in -r/--require:

const env = process.env
delete process.env // might not be necessary
process.env = env

@STRML
Copy link
Contributor Author

STRML commented Dec 2, 2015

I've been doing:

process.env = JSON.parse(JSON.stringify(process.env));

Which seems to work fine.

Okay - so the issue with a library like React caching locally is that there are three main usage paths and two goals.

The goals:

  1. Eliminate the runtime penalty from accessing process.env in Node environments.
  2. Successfully optimize out dev/debug flags so entire branches can be removed during a minification step.

As a library author, you usually deliver pre-bundled source (dist.js, dist.min.js), built source (say, with babel, in a lib/ folder), and sometimes pre-built source (ES2015 etc).

There are three main usage paths:

  • Path 1: Pre-bundled source in the browser (or even via require('mylibrary/dist/lib.min.js')); this is the easiest, we can easily stub process.env.NODE_ENV or __DEV__ flags with constants in our own build steps. Goal 1 and 2 are easily attained.
  • Path 2. Inside NodeJS. We need to have a build step from src/ to lib/ that's replacing __DEV__ flags or calls to process.env.NODE_ENV to cache them so process.env.NODE_ENV is not accessed in hot functions. That's what this PR does. You can do the same by simply hoisting yourself in smaller projects. Goal 1 can be attained with some fiddling, goal 2 is irrelevant.
  • Path 3. Through browserify/webpack by bundling source. This presents a problem in combination with Path 2 because we've just hoisted our env accesses, so the minifier needs to be smart enough to still be able to do dead code elimination. UglifyJS is smart enough to do this, but only if you use a const declaration, which is not portable to < IE11, so it's not viable in most situations. It will not be able to figure out:
// This would work if it were a const
var NODE_ENV = process.env.NODE_ENV;
//...
if(NODE_ENV !== 'production') { // this will not be eliminated
  // debug code
}

Closure compiler has a /** @const */ comment which works. I want to port this to UglifyJS which would solve the problem.

As a result, if we solve Path 2 (NodeJS), we produce larger bundles for Path 3 (Webpack/Browserify) with UglifyJS. In the case of React, the bundle can be as much as 40% larger gzipped, so it's very significant (38 vs 53kb).

If process.env didn't have this performance penalty, then we could just go on writing if (process.env.NODE_ENV !== 'production'), as most library authors already do, envify and Uglify can handle this, and there'd be no problem.

@vkurchatkin
Copy link
Contributor

Is there a real reason to use env for this? You can use global.__REACT_ENV the same way

@STRML
Copy link
Contributor Author

STRML commented Dec 2, 2015

It's possible. It appears to be the attitude of the devs that they want to avoid polluting globals whenever possible.

Despite that, this is a more widespread problem that doesn't just affect React.

@jtlapp
Copy link

jtlapp commented Jun 1, 2016

The performance penalty is still not mentioned in the docs. On the contrary, because changes don't affect the shell, the docs seem to imply that env is just like any other JS object.

@bnoordhuis
Copy link
Member

because changes don't affect the shell

They do, child processes created with the child_process module inherit the modified environment unless told otherwise.

Are you thinking of seeing changes to process.env reflected in the parent shell? That's not how setenv(3) and friends work. No program works like that.

@jtlapp
Copy link

jtlapp commented Jun 1, 2016

Sorry, I'm not much of a shell programmer. I just would rather not have to discover the performance penalty when it's a problem. Why can't the docs say reading process.env makes system calls (if that's what it's doing)? Why do we have to discover that process.env is not a normal JS object the hard way?

@bnoordhuis
Copy link
Member

We accept pull requests, you're welcome to have a stab at improving the documentation.

Having said that, any JS object can be "magic" and really be a bunch of C++ code; process.env is not special in that respect.

@jtlapp
Copy link

jtlapp commented Jun 1, 2016

I'd have to investigate the node.js code to have confidence making a statement. Several people in this discussion are already claiming to know how it works. I'm hoping that someone who wants to be helpful wouldn't mind taking a minute to update the repo they already have checked out.

gcanti pushed a commit to gcanti/io-ts that referenced this issue Nov 28, 2017
Just a small documentation update, if you're open to it. Suggesting that `NODE_ENV` be "cached" in the example `unsafeValidate` method.

Reasoning / discussions here:
* nodejs/node#3104
* nodejs/node#1648

Unless you'd expect `NODE_ENV` to be editable while the application is running (debugging on a live server, perhaps?)
@JDiPierro
Copy link

It looks like Node is caching process.env nowadays:

⌚ 15:46:00
$ export FOO=one

⌚ 15:46:07
$ export BAR=two

⌚ 15:46:09
$ export BAZ=three

⌚ 15:46:12
$ node
> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.849ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.224ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.076ms
undefined

> console.time("logEnv"); console.log(process.env.FOO); console.timeEnd("logEnv");
one
logEnv: 0.085ms
undefined

> console.time("logEnv"); console.log(process.env.BAZ); console.timeEnd("logEnv");
three
logEnv: 0.077ms
undefined

> console.time("logEnv"); console.log(process.env.BAR); console.timeEnd("logEnv");
two
logEnv: 0.085ms
undefined
$ node --version
v10.6.0

@a-stepanenko
Copy link

I'd like to warn you that previous comment is not the case. Node.js does not cache env vars, so each time you use process.env.VAR you actually call getnev(). This might be a performance issue if you have lots of env vars (this is especially important in case you run your node.js app in k8s since k8s creates env vars for all endpoints in a namespace, see kubernetes/kubernetes#60099 for details)

Proof (running node.js which prints some env var in an endless loop, attaching to the node.js process via gdb and changing this var):

root@it100msk:~# nodejs -v
v10.11.0
root@it100msk:~# cat test.js
while (true) {
    console.log(process.env.var1);
    Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, 3000);
}
root@it100msk:~# export var1=1
root@it100msk:~# nodejs test.js
1
1
1
1
1

another console on the same host
root@it100msk:~# ps auxf | grep node
root     20226  0.0  0.0  15648  1032 pts/2    S+   02:16   0:00  |       \_ grep --color=auto node
root     20200  0.6  0.1 557612 30580 pts/1    Sl+  02:16   0:00          \_ nodejs test.js
root@it100msk:~# gdb
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) attach 20200
Attaching to process 20200
[New LWP 20201]
[New LWP 20202]
[New LWP 20203]
[New LWP 20204]
[New LWP 20205]
[New LWP 20206]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8cc38b4ed9 in futex_reltimed_wait_cancelable (private=<optimized out>, reltime=0x7ffe46e9cd00, expected=0, futex_word=0x4498250) at ../sysdeps/unix/sysv/linux/futex-internal.h:142
142	../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) call putenv("var1=2")
$1 = 0
(gdb) detach
Detaching from program: /usr/bin/node, process 20200
(gdb) quit

back to the first console
2
2
2
2
2

dirkdev98 added a commit to compasjs/compas that referenced this issue Oct 29, 2020
Removes the 'log' export from 'insight'.

Accessing process.env.XXX is relatively slow in Node.js.
Benchmark of a plain object property access and accessing process.env.NODE_ENV:

```
property access       500000000  iterations     0  ns/op
process.env access      5000000  iterations   246  ns/op
```

See this thread nodejs/node#3104 for some more context
dirkdev98 added a commit to compasjs/compas that referenced this issue Oct 29, 2020
Removes the 'log' export from 'insight'.

Accessing process.env.XXX is relatively slow in Node.js.
Benchmark of a plain object property access and accessing process.env.NODE_ENV:

```
property access       500000000  iterations     0  ns/op
process.env access      5000000  iterations   246  ns/op
```

See this thread nodejs/node#3104 for some more context
Aschen added a commit to kuzzleio/kuzzle that referenced this issue Jan 29, 2021
Highlight user code in internal stacktrace.

Other Changes

    Avoid to use process.env at runtime because it does an expensive syscall (CF: nodejs/node#3104 (comment))

Boyscout

    export Global in a TS file to avoid the necessity of building the app to run it
@AlexandriaOL
Copy link

AlexandriaOL commented Jan 2, 2022

Sorry, I'm not much of a shell programmer. I just would rather not have to discover the performance penalty when it's a problem. Why can't the docs say reading process.env makes system calls (if that's what it's doing)? Why do we have to discover that process.env is not a normal JS object the hard way?

I'm writing here just to clarify this fundamental misunderstanding of how UNIX Environments work, for people who read this in the future (I came to this from a thread on an internet forum).

The Shell Environment is defined in the SYSV ABI Specification, next to or in the same section where the auxillary (auxv) and argument (argv) vectors are defined. Upon shell creation, Enviroments are copied from the parent process into a C-array called envp, which is logically after the arguments vector. You can subvert this when you create a process using the exceve (Called such because it is exec, with arg(V) and (E)nvp) system call. Typically, programs pass their environment down to their children. Environments are an array of strings, with the form NAME=VAR.

Typically, because of it's inherent immutability (disregarding setenv), and it's stringly-typed nature, you will want to access this once at program start, and then never again. setenv (which sets an environment value) is probably the most useless function there is in the POSIX standard, because there are better ways to do whatever you are trying to do !

e.g. if you want to communicate an environment to your child, then pass an environment explicitly to your child; if you want to communicate to other threads, keep in mind that setenv is not Atomic and therefore you are likely to get a corrupted read at some point, and use a different (and safer) method of threadful communication; if you want to communicate to yourself, use a global variable -- they're free and come with type safety, too.

EDIT: An addendum. One typical example of wanting to use setenv is to do something like:

setenv("some_internal_value", to_json(some_data), 1);
(your fork/execvp code here)

and then in the child:

some_data = getenv("some_internal_value");

However this is a security risk detailed in the CERT C Coding Guidelines (Which are equally applicable in this instance! Probably one of the few instances where the semantics of Windows/Linux, C, POSIX, and Javascript are all applicable !). https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanitize+the+environment+when+invoking+external+programs and https://wiki.sei.cmu.edu/confluence/display/c/ENV02-C.+Beware+of+multiple+environment+variables+with+the+same+effective+name :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
process Issues and PRs related to the process subsystem.
Projects
None yet
Development

No branches or pull requests

10 participants