Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy stat dates #12818

Closed
wants to merge 5 commits into from
Closed

Lazy stat dates #12818

wants to merge 5 commits into from

Conversation

sciolist
Copy link
Contributor

@sciolist sciolist commented May 3, 2017

See #12607

In order to improve performance when using fs *stat-functions, we can delay creation of the 4 date-objects until they are used.

This is a semver-major change:

  • Object.keys(statResult), Object.getOwnPropertyNames/Descriptors(statResult) will no longer return atime, mtime, ctime, birthtime, and WILL return _atime, _mtime, _ctime, _birthtime, _atim_msec, _mtim_msec, _ctim_msec, _birthtim_msec
  • for(x in statResult){} will iterate over _atime, _mtime, _ctime, _birthtime, _atim_msec, _mtim_msec, _ctim_msec, _birthtim_msec
Benchmark
Accessing zero time fields:
                                                 improvement confidence      p.value
 fs/bench-statSync.js kind="fstatSync" n=1000000    222.82 %        *** 5.526432e-46
 fs/bench-statSync.js kind="lstatSync" n=1000000     57.40 %        *** 1.980913e-30
 fs/bench-statSync.js kind="statSync" n=1000000      57.91 %        *** 4.996535e-43

Accessing one time field:
                                                 improvement confidence      p.value
 fs/bench-statSync.js kind="fstatSync" n=1000000    112.30 %        *** 1.836060e-48
 fs/bench-statSync.js kind="lstatSync" n=1000000     33.09 %        *** 4.143412e-29
 fs/bench-statSync.js kind="statSync" n=1000000      34.87 %        *** 6.003936e-33

Accessing two time fields:
                                                 improvement confidence      p.value
 fs/bench-statSync.js kind="fstatSync" n=1000000     54.61 %        *** 4.500304e-40
 fs/bench-statSync.js kind="lstatSync" n=1000000     22.35 %        *** 6.087184e-21
 fs/bench-statSync.js kind="statSync" n=1000000      21.28 %        *** 1.345810e-21

Accessing three time fields:
                                                 improvement confidence      p.value
 fs/bench-statSync.js kind="fstatSync" n=1000000     31.65 %        *** 7.849068e-35
 fs/bench-statSync.js kind="lstatSync" n=1000000     10.82 %        *** 2.023683e-18
 fs/bench-statSync.js kind="statSync" n=1000000      12.19 %        *** 6.572484e-27

Accessing four time fields:
                                                 improvement confidence      p.value
 fs/bench-statSync.js kind="fstatSync" n=1000000      4.47 %        *** 1.026716e-16
 fs/bench-statSync.js kind="lstatSync" n=1000000      1.16 %            1.381009e-01
 fs/bench-statSync.js kind="statSync" n=1000000       1.91 %         ** 6.071880e-03
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

lib

@nodejs-github-bot nodejs-github-bot added the fs Issues and PRs related to the fs subsystem / file system. label May 3, 2017
@vsemozhetbyt
Copy link
Contributor

CI: https://ci.nodejs.org/job/node-test-pull-request/7840/

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented May 3, 2017

Linter:

fs.js
  266:2  error  Missing semicolon  semi

@mscdex mscdex added performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version. labels May 3, 2017
@mscdex
Copy link
Contributor

mscdex commented May 3, 2017

Perhaps if we used symbols instead of underscore-prefixed properties this wouldn't be semver-major?

@refack
Copy link
Contributor

refack commented May 4, 2017

Perhaps if we used symbols instead of underscore-prefixed properties this wouldn't be semver-major?

It definatly needs further thinking 🤔 since all test pass, and it is semantically equivalent...
IMHO we should expose the raw numbers, and that would be semver-minor and solve #8276

@refack
Copy link
Contributor

refack commented May 4, 2017

@sciolist 1 test fails on Windows

test\parallel\test-fs-stat.js
stating:  c:\workspace\node-test-binary-windows\RUN_SUBSET\0\VS_VERSION\vcbt2015\label\win10\test\parallel\test-fs-stat.js
Stats {
  dev: 2329316316,
  mode: 16822,
  nlink: 1,
  uid: 0,
  gid: 0,
  rdev: 0,
  blksize: undefined,
  ino: 119908340078909120,
  size: 0,
  blocks: undefined,
  _atim_msec: 1493853008513,
  _mtim_msec: 1493853008513,
  _ctim_msec: 1493853008513,
  _birthtim_msec: 1492182300137 }
Stats {
  dev: 2329316316,
  mode: 33206,
  nlink: 1,
  uid: 0,
  gid: 0,
  rdev: 0,
  blksize: undefined,
  ino: 41939771530044640,
  size: 3910,
  blocks: undefined,
  _atim_msec: 1493852676143,
  _mtim_msec: 1493852676150,
  _ctim_msec: 1493852676150,
  _birthtim_msec: 1492182460203 }
isDirectory: false
isFile: true
isSocket: false
isBlockDevice: false
isCharacterDevice: false
isFIFO: false
isSymbolicLink: false
(node:168) [DEP0013] DeprecationWarning: Calling an asynchronous function without callback is deprecated.

assert.js:86
throw new assert.AssertionError({
^
AssertionError: false == true
at c:\workspace\node-test-binary-windows\RUN_SUBSET\0\VS_VERSION\vcbt2015\label\win10\test\parallel\test-fs-stat.js:116:12
at Array.forEach (native)
at c:\workspace\node-test-binary-windows\RUN_SUBSET\0\VS_VERSION\vcbt2015\label\win10\test\parallel\test-fs-stat.js:115:8
at c:\workspace\node-test-binary-windows\RUN_SUBSET\0\VS_VERSION\vcbt2015\label\win10\test\common.js:462:15
at FSReqWrap.oncomplete (fs.js:153:5)

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CI & CITGM are happy, and I’d prefer semver-major too

lib/fs.js Outdated
this._atim_msec = atim_msec;
this._mtim_msec = mtim_msec;
this._ctim_msec = ctim_msec;
this._birthtim_msec = birthtim_msec;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well make these public? ¯\_(ツ)_/¯ If not, I agree that using Symbols may be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the Symbol approach for these.

@mscdex
Copy link
Contributor

mscdex commented May 4, 2017

IMHO we should expose the raw numbers

I would rather replace rather than add duplicate public properties.

@sciolist
Copy link
Contributor Author

sciolist commented May 4, 2017

@mscdex It would still be semver-major unless we can force the atime/mtime/etc properties to show up in Object.keys.. I think that's going to be a bigger issue. I have tried using Symbols and wind up with (compared to underscores):

                                                      improvement confidence      p.value
 fs-stat/bench-statSync.js kind="fstatSync" n=1000000    -39.85 %        *** 7.313265e-10
 fs-stat/bench-statSync.js kind="lstatSync" n=1000000    -24.85 %        *** 1.384484e-10
 fs-stat/bench-statSync.js kind="statSync" n=1000000     -24.53 %        *** 5.654291e-09

(compared to underscore-prefixed), so it's doable, but it does erase about half the performance improvement.

The preferable approach would be just changing.. this.atime = new Date(atim_msec); to this.atime = atim_msec;, and then the caller could make a date out of it if they wanted.. but, that's a pretty significant change in behavior.

return this._atime !== undefined ?
this._atime :
(this._atime = new Date(this._atim_msec + 0.5));
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can eliminate some code duplication here by having a factory function for the getters... e.g.

function makeGetter(name, src) {
  return function() {
    return this[name] !== undefined ?
      this[name] : (this[name] = new Date(this[src + 0.5));
  };
}

// ...

Object.defineProperties(Stats.prototype, {
  atime: {
    configurable: true,
    enumerable: true,
    get: makeGetter('_atime', '_atim_msec'),
    // etc
  }
});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea here would be to use another defineProperty within the getter to replace the value once the date is created...

var m = {};
Object.defineProperty(m, 'a', {
  configurable: true,
  enumerable: true,
  get() {
    const val = 1;
    delete this.a;
    Object.defineProperty(this, 'a', { configurable: true, enumerable: true, value: val } );
    return val;
  }
});

The getter is called once, during which time it is deleted, and the static value is set... and will be returned every time after. Should have a slightly better performance profile.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TimothyGu suggested that in the previous PR, it wasn't performant enough for the "single access" case - #12607 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's fine then. I'd still suggest the factory function to avoid the code duplication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, yeah I have tried these variations.

Unfortunately a single defineProperty is more expensive than 4 date creations, so you'd lose the benefits of the laziness if you need to access even a single date.

this[somevariable] is slow enough compared to this.mtime that the difference becomes pretty significant in a function like stat, that isn't very heavy to begin with. That drops accessing one time-field twice (my test is if (stat.ctime.getTime() !== stat.ctime.getTime()) throw err;) to a 65% improvement over the non-lazy code, down from a 105% improvement if done the ugly way. But I can definitely change to that if it's preferable.. I didn't write it this way because I think it's beautiful. :)

@sciolist
Copy link
Contributor Author

sciolist commented May 4, 2017

@mscdex If you'd like to run the test suite on it, I can push in a change where we just change to.. this.mtime = mtim_msec;, etc.. I have a feeling it would cause a few issues.

@mscdex
Copy link
Contributor

mscdex commented May 4, 2017

My current preference is this:

For v8.x and older: use symbols to hide the raw values better and use the getters to return Date instances like we currently have.

For v9.x (or whatever future major version) and newer: remove the getters and change the symbols to normal properties, using the same names as the old Date-based properties. This means the atime, mtime, etc. values are now all number values.

@TimothyGu
Copy link
Member

TimothyGu commented May 4, 2017

I have tried using Symbols

Can you detail what changes (git diff) you have tried with symbols? I might be wrong of course, but from my experiences symbols are pretty much as fast as string properties these days.


As the PR currently stands, there are multiple issues I have with it. In fact, I think after resolving all the issues below it might just be better for us to keep the code as it is.

  1. Unless it is for compatibility or absolute necessity, I prefer not to see any more underscored properties in core, worse of all enumerable ones. It is very tempting to use the underscored properties when there are no real alternatives to those private APIs, and those properties have caused us all sorts of compatibility problems in the past. Alternatives such as symbols should be used as much as possible, which is why I want to be sure there is really nothing we can do to use symbols before calling it off.

  2. There are many code fragments out there that call Object.keys(stat), e.g. to clone stat objects. Now instead of seeing birthtime etc. they would see _birthtime_msec and _birthtime (the latter only some of the time; see issue 3 below). Not only can this be unexpected, it is ironically necessary to expose those private properties for the code to keep working, on the flip side from issue 1 above.

  3. If we were to go forward with this PR despite the issues above, and if it is impossible to use Object.defineProperty due to performance concerns, I would like to see _*time properties used by the setters initialized to undefined in the constructor, so that V8 can more efficiently optimize hidden class accesses. It would also remove the nondeterminism in calling Object.keys(stat).

  4. if we were to go forward with this PR, we should rethink how best we can provide stat(3) API. In fact, I think of using Date for the timestamp properties to be a mistake in the first place. IMO the job of core is to provide system APIs in a bare-minimum way, while preserving as much precision as possible but also make it consumable in JS, much like Web APIs. In this case, Web APIs such as File.prototype.lastModified and the DOMTimeStamp Web IDL type consistently use integers instead of Dates. I do think @addaleax's proposal on making the msec properties public as a good idea.


There are alternatives out there from lazy-creation of Date objects in JS. One way is using V8 API's v8::ObjectTemplate API to create the entire object in C++. Using v8::Template::SetLazyDataProperty() we can accomplish much of the same things we are doing in JS in C++, but also while hiding the guts (i.e. the underscore-prefixed properties). However, I am not sure at all if or how this is going to improve performance. Tried it, is slower.

@mscdex
Copy link
Contributor

mscdex commented May 4, 2017

@TimothyGu FWIW in the past when working on a node addon I observed ObjectTemplate usage actually slowing things down compared to just instantiating a new Object directly. Perhaps that has changed in recent V8 versions though...

@mscdex
Copy link
Contributor

mscdex commented May 4, 2017

If the symbols usage really does cause that much of a performance regression, then I would rather not do this at all and just wait until v9.x or later to switch from Dates to number values.

@TimothyGu
Copy link
Member

@mscdex

I observed ObjectTemplate usage actually slowing things down compared to just instantiating a new Object directly

Interesting. I've never really used it myself before so I'll take your word for it, though for this issue I'm really talking about the (new) v8::Template::SetLazyDataProperty() which only calls C++ accessor function the first time, and caches the result in JS thereafter.

For v9.x (or whatever future major version) and newer: remove the getters and change the symbols to normal properties, using the same names as the old Date-based properties. This means the atime, mtime, etc. values are now all number values.

I'd be more comfortable if those properties are suffixed with Msec and the current properties are removed, but would that cause more breakage than just altering the type? Hmm…

@mscdex
Copy link
Contributor

mscdex commented May 4, 2017

but would that cause more breakage than just altering the type?

I doubt it. It would probably be about the same I imagine.

@refack
Copy link
Contributor

refack commented May 4, 2017

@mscdex why not add the raw number as full properties (no underscore) and have it semver-minor.
Orthogonality is we feel it's duplicate data, deprecate the Dates for removal or for being non-enumerated-just-halper-getters?

@sciolist
Copy link
Contributor Author

sciolist commented May 4, 2017

@TimothyGu I agree that using symbols is usually not an issue.. and in this case it's not the symbols that are a problem, it's using bracket notation to access the properties vs dot notation (using strings with brackets gives nearly exactly the same results.) Diff:

diff --git a/lib/fs.js b/lib/fs.js
index e834f25..d8acf2e 100644
--- a/lib/fs.js
+++ b/lib/fs.js
@@ -170,6 +170,16 @@ function isFd(path) {
   return (path >>> 0) === path;
 }
 
+const atim_msec_sym = Symbol('_atim_msec');
+const mtim_msec_sym = Symbol('_mtim_msec');
+const ctim_msec_sym = Symbol('_ctim_msec');
+const birthtim_msec_sym = Symbol('_birthtim_msec');
+
+const atime_sym = Symbol('_atime');
+const mtime_sym = Symbol('_mtime');
+const ctime_sym = Symbol('_ctime');
+const birthtime_sym = Symbol('_birthtime');
+
 // Constructor for file stats.
 function Stats(
     dev,
@@ -196,54 +206,37 @@ function Stats(
   this.ino = ino;
   this.size = size;
   this.blocks = blocks;
-  this._atim_msec = atim_msec;
-  this._mtim_msec = mtim_msec;
-  this._ctim_msec = ctim_msec;
-  this._birthtim_msec = birthtim_msec;
+  this[atim_msec_sym] = atim_msec;
+  this[mtim_msec_sym] = mtim_msec;
+  this[ctim_msec_sym] = ctim_msec;
+  this[birthtim_msec_sym] = birthtim_msec;
+  this[atime_sym] = undefined;
+  this[mtime_sym] = undefined;
+  this[ctime_sym] = undefined;
+  this[birthtime_sym] = undefined;
 }
 fs.Stats = Stats;
 
-Object.defineProperties(Stats.prototype, {
-  atime: {
-    configurable: true,
-    enumerable: true,
-    get() {
-      return this._atime !== undefined ?
-          this._atime :
-          (this._atime = new Date(this._atim_msec + 0.5));
-    },
-    set(value) { return this._atime = value; }
-  },
-  mtime: {
-    configurable: true,
-    enumerable: true,
-    get() {
-      return this._mtime !== undefined ?
-          this._mtime :
-          (this._mtime = new Date(this._mtim_msec + 0.5));
-    },
-    set(value) { return this._mtime = value; }
-  },
-  ctime: {
-    configurable: true,
-    enumerable: true,
-    get() {
-      return this._ctime !== undefined ?
-        this._ctime :
-        (this._ctime = new Date(this._ctim_msec + 0.5));
-    },
-    set(value) { return this._ctime = value; }
-  },
-  birthtime: {
+function makeStatTime(cacheSym, msecSym) {
+  return {
     configurable: true,
     enumerable: true,
     get() {
-      return this._birthtime !== undefined ?
-        this._birthtime :
-        (this._birthtime = new Date(this._birthtim_msec + 0.5));
+      let value = this[cacheSym];
+      if (value === undefined) {
+        value = this[cacheSym] = new Date(this[msecSym] + 0.5);
+      }
+      return value;
     },
-    set(value) { return this._birthtime = value; }
-  },
+    set(value) { return this[cacheSym] = value; }
+  };
+}
+
+Object.defineProperties(Stats.prototype, {
+  atime: makeStatTime(atime_sym, atim_msec_sym),
+  mtime: makeStatTime(mtime_sym, mtim_msec_sym),
+  ctime: makeStatTime(ctime_sym, ctim_msec_sym),
+  birthtime: makeStatTime(birthtime_sym, birthtim_msec_sym),
 });
 
 Stats.prototype.toJSON = function toJSON() {

After some testing, if I use a local variable to avoid hitting this[cacheSym] twice it's only 15-20% slower than the dot-notation version (when accessing two of the time fields, twice each).. and it's a bit less ugly than the ternary.

I agree with all your 4 points, using underscores is ugly, Date should probably never have been used here. I would expect this PR as is to break a fair amount of user code.

It'd be interesting to see if SetLazyDataProperty() would solve this problem better. Would require a bit more effort to solve, since Stats has its weird global mutating array now. I have also tried replacing the Dates on Stats with another class that only creates the actual Date instance when you attempt to use a Date function.. that way everything behaved the same as it does today with Object.keys and no underscore-props or anything, but, it's awfully ugly to fake a date class. The performance of that was about the same as using symbols, i think.

I'd be more comfortable if those properties are suffixed with Msec and the current properties are removed, but would that cause more breakage than just altering the type? Hmm…

I'd say adding properties throwing deprecation errors could work. just removing the current properties would probably lead to a lot of problems as well.. let mtime = +statresult.mtime; would give you NaN, that's a pretty common way of using the stat results in my experience, and NaN bugs aren't that much fun to track down.

@refack
Copy link
Contributor

refack commented May 4, 2017

@TimothyGu

There are many code fragments out there that call Object.keys(stat)

I read through a 100 or so instances, there's mainly cloning using https://www.npmjs.com/package/clone-stats. I did not see anything that would obviously break. We also ran CITGM, and it seemed fine (still digging through the logs)

In fact, I think of using Date for the timestamp properties to be a mistake in the first place.

👍

I do think @addaleax's proposal on making the msec properties as a good idea.

Is there reason not to just add them in addition to the "legacy" Date values?

@mscdex
Copy link
Contributor

mscdex commented May 4, 2017

why not add the raw number as full properties (no underscore) and have it semver-minor

Because I don't think that having the duplicate data is useful. It especially doesn't make much sense if the plan is to eventually remove the Date values anyway. We may as well just change the existing property values in a semver-major and be done with it in a single change. I also prefer the existing names as opposed to having properties like mtimeMsec or whatever.

@addaleax
Copy link
Member

addaleax commented May 4, 2017

@mscdex I really doubt it’s feasible to remove the Date fields in any major change – that’s going to break too many people’s code :/

@mscdex
Copy link
Contributor

mscdex commented May 4, 2017

@addaleax ok, what if we add doc and then runtime deprecations at some point to notify of the upcoming data type change?

@addaleax
Copy link
Member

addaleax commented May 5, 2017

@mscdex We’d have to do it anyway, but I still think this would cause too much trouble :/ Sorry for being so negative here, I know “this won’t work” isn’t helpful, but I really think it won’t.

@refack
Copy link
Contributor

refack commented May 5, 2017

I'm also pessimistic... that's why I'd rather have the new members, and compatibility lazy getters, it's a reasonable compromise.... We get the raw number, and don't pay for the Dates unless needed...

@TimothyGu
Copy link
Member

FWIW I tried using ObjectTemplate for Stats objects, and indeed it's not very performant even compared to the status quo.

@addaleax addaleax reopened this May 31, 2017
@addaleax
Copy link
Member

This needs a rebase. 😄

@sciolist
Copy link
Contributor Author

sciolist commented Jun 2, 2017

rebased.

So if this is going to happen, we'd need a way of keeping the fields around in Object.keys and Object.getOwnProp*.

The somewhat reasonable options I can think of:

  • Look further into using v8::Template to see if there's a way to speed that up. I have absolutely no idea if that would work or not.
  • Make a wrapping class or proxy for Dates that only instantiates when they're called. This would work and ought to reduce the overhead for anyone who uses the *timeMs-fields, but is isnt pretty.
  • Beg v8 to make a faster Date constructor. :)

It would be nice with some real world benchmark thats particularly stat heavy, to see if any potential optimization seems worthwhile.

@refack
Copy link
Contributor

refack commented Jun 2, 2017

My intuition is that v8::Template may help, I'll look into it.
As for a "real-world" use case #12818 (comment) suggests node-tar. Although I'd be happy even with a 0% performance change since it will give us a code point to deprecate those fields.

@TimothyGu
Copy link
Member

FWIW I already tried ObjectTemplate in https://gist.github.com/TimothyGu/c70e7cb557d1290b5c2150111eb95a01 with unsatisfactory results. Of course (generic) you are welcome to explore further but from @mscdex's experiences it might not be a fruitful venture.

refack added a commit to refack/node that referenced this pull request Jun 4, 2017
* convert ’ to ' to turn md file to ASCII

Fixes: nodejs#8276
Refs: nodejs#12607
Refs: nodejs#12818
Refs: nodejs#13256
@refack refack self-assigned this Jun 5, 2017
@refack refack dismissed stale reviews from addaleax, jasnell, and themself June 5, 2017 22:29

PR reopened after landed & reverted

refack added a commit to refack/node that referenced this pull request Jun 7, 2017
PR-URL: nodejs#13173
Fixes: nodejs#8276
Refs: nodejs#12607
Refs: nodejs#12818
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Brian White <mscdex@mscdex.net>
jasnell pushed a commit that referenced this pull request Jun 7, 2017
PR-URL: #13173
Fixes: #8276
Refs: #12607
Refs: #12818
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Brian White <mscdex@mscdex.net>
@TimothyGu
Copy link
Member

I think it's pretty clear that at this point there isn't a way to do lazy dates without compromising compatibility or performance. I'll close this PR.

@sciolist Thanks a lot for creating this PR and following through on our requests. Sorry it didn't work out at the end, but I guess we all learned something from this process 🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. performance Issues and PRs related to the performance of Node.js. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants