Performance regression when enumerating objects (for shallow cloning) v12.15.0 to v12.16.1 #32049

dominykas · 2020-03-02T16:52:02Z

Version: 12.16.1
Platform: Linux; OSX
Subsystem: V8

What steps will reproduce the bug?

The following code became slightly slower going from v12.15.0 to v12.16.1 (likely 12.16.0).

'use strict';

const NUM_METHODS = 250;
const ITERATIONS = 10000;

const obj = {};

for (let i = 0; i < NUM_METHODS; ++i) {

    obj[`item${i}`] = {};
}

console.time();

for (let i = 0; i < ITERATIONS; ++i) {
    const res = Object.assign({}, obj); // { ...obj } performs worse by a similar margin
}

console.timeEnd();

Additional information

I'm still not 100% sure this accounts for all of the slowdowns while looping through object keys that I'm seeing and I'm working on a further reduced test case, but the results here are very consistent.

Adding more items into the object makes things slower, seemingly linearly.

Note that this is quite similar to #31961, however the fix on master in 3d894d0 resolves #31961, but not this issue.

The text was updated successfully, but these errors were encountered:

mmarchini · 2020-03-03T00:28:30Z

I ran some tests and found about 5-7% overhead on v12.16.1 (compared to v12.15.0). Is that coherent with your results? If so, this is a small regression and it would be hard to perceive it in a real-world application (in comparison, #31961 will make the operations hundreds of times slower, which will be easier to perceive in an application).

Have you noticed a regression on an application after upgrading from v12.15.0 to v12.16.1? If so, can you share how much overhead you noticed? It might be something else (not covered by this test case).

dominykas · 2020-03-03T10:03:40Z

Have you noticed a regression on an application after upgrading from v12.15.0 to v12.16.1? If so, can you share how much overhead you noticed? It might be something else (not covered by this test case).

Yes, we have noticed a slow down in an application, which I have pin pointed to looping through object keys for shallow cloning. When I run one of the unit tests, the unit test has a slowdown from ~2100ms to ~2800ms. So the difference is really perceivable and it seems like it's orders of magnitude.

That said, the test case that I produced here is really simple, and the result that I'm seeing can be influenced by a multitude of other factors, which I'm still trying to work out - so yeah, there definitely can be something else, but it is also definitely related to shallow cloning as changing that to an assignment makes the issue go away.

I'll be working to build up a better reduced case...

mmarchini · 2020-03-03T15:40:14Z

the unit test has a slowdown from ~2100ms to ~2800ms

That's a 33% slowdown, so even if you're affected by this issue, it's unlikely to be causing all the overhead.

If you can provide a test case closer to your unit tests (without sensitive code), we can help pinpoint the issue. Or maybe a CPU profile of your unit tests (either with --cpu-prof, --prof or Linux perf).

dominykas · 2020-03-05T11:10:19Z

That's a 33% slowdown, so even if you're affected by this issue, it's unlikely to be causing all the overhead.

Not exactly. The unit test (call it an integration test if you will - it exercises an endpoint and goes through the whole app) in question does a lot lot more than just the shallow cloning, and there's definitely slow downs in other parts of it.

However, there is one method, which does shallow cloning of two objects (each with 100-200 keys). That method gets called ~1000 times. The grand total (sum of durations based on process.hrtime()) for the time spent just inside that method is:

12.15.0: 35ms
12.16.1: 170ms

(For comparison: if I remove the shallow cloning, the method in both cases takes 2-3ms total)

So that's a 5x slowdown just there. I think the rest of the slowdown is fairly spread out across the rest of the system as it isn't obvious from the flame charts, but I'll have to investigate that too at some point.

If you can provide a test case closer to your unit tests

That's the tricky part, right? :) I can't figure out what is special about the objects being cloned, because the reduced case does not exhibit the same behavior. There is no deopt in the method, nothing to grab onto... It might also be a consequence of some other conditions? Possibly GC kicking in aggressively or something like that?

Anyways, working on this...

Or maybe a CPU profile of your unit tests

I'm definitely not able to share the CPU profile - any tips on what I should be looking for inside? That's the same data as you can see in dev tools or clinic flame, right?

dominykas · 2020-03-05T17:02:17Z

OK, so --trace_gc does log 3x more messages on 12.16.1 and memoryUsage() is also higher:

15:

{
  rss: 188 964 864,
  heapTotal: 133 545 984,
  heapUsed: 111 016 104,
  external: 1 729 661
}

16:

{
  rss: 271 081 472,
  heapTotal: 225 595 392,
  heapUsed: 200 679 632,
  external: 1 771 952
}

If I skip the shallow cloning and return the same instance, I get roughly the same amount of GC messages log, and the memory usage is a lot closer:

15:

{
  rss: 188 424 192,
  heapTotal: 131 973 120,
  heapUsed: 107 757 448,
  external: 1 737 081
}

16:

{
  rss: 189 775 872,
  heapTotal: 144 068 608,
  heapUsed: 118 096 208,
  external: 1 780 389
}

I think the script that I have at the top post also exhibits higher memory usage - not sure if any of this is related, but figured it's worth testing/sharing...

I also started looking at various V8 options that I could play with and I've no idea what --trace-for-in-enumerate does, but it produces 2x more logs on 12.16.1 than on 12.15.0.

mmarchini · 2020-03-05T17:22:03Z

I wonder if this is a memory leak. You could take a heap snapshot on each version and compare using Chrome DevTools to see which objects are being retained.

Also, are you able to test on v13.10.1? Would be good to double check if the issue is already fixed or not.

dominykas · 2020-03-06T16:40:13Z

v13.10.1 still has the problem, and so does latest master.

Next week I'll try to work on a better case to reproduce this, but I spotted slowdown (~3x) in other areas where we use various methods of cloning. The one place I did poke does also show increased time for GC.

I have not yet checked the memory snapshot diffs.

I have however tried a couple of experiments:

There's some awaits in our unit test - removing those makes the memory usage more stable (expected I suppose), so @mcollina suggested that it could have to do with objects getting moved between new/old space (if I got it right) due to something making the object lifetime tick over some threshold?
So I tried deliberately slowing down the shallow clone operation (by running it 50 times) - doing that makes 12.15.0 memory usage and execution time behave the same as 12.16.1, i.e. I can make 12.15.0 be slow (whereas 12.16.1 is slow even without that).
I tried running things with --max-semi-space-size=1 - this slows down 12.16.1 (and 13.10.1) significantly, but does not affect 12.15.0.

So it feels it has less to do with GC, but more with how/where the objects get allocated? I'm sorry if I'm not explaining this well - V8 memory management is totally not my area of competence :)

I'm also not really able to repro much of that with the reduced case I have at the top, so I'll keep on working on that next week.

mcollina · 2020-03-06T17:01:58Z

I've tested the above on the following:

12.15.0: 603ms
12.16.1: 664ms
13.10.1: 639ms
master: 680ms

The trend is not really helping.

I wonder if this is a memory leak. You could take a heap snapshot on each version and compare using Chrome DevTools to see which objects are being retained.

We looked into that. However forcing gc() actually free up memory.

mmarchini · 2020-03-06T17:48:47Z

Note that from 13.10.1 and master are using the same V8 version, showing the results are unstable (which is coherent with what I've seen so far), but there's a small overhead. @mcollina can you try the latest node-v8 build as well?

Premature promotion to old space makes sense. If you haven't already, you can use v8.getHeapSpaceStatistics or --trace-gc-verbose to monitor how each space is growing over time. --trace-gc-verbose also shows the total time the program spent on GC, which can help understand if the difference we're seeing is coming from GC or something else.

mcollina · 2020-03-06T18:42:07Z

@mmarchini in node-v8 it gets even worse: 728.514ms (same dedicated server).

This is the output from the --trace-gc-verbose (one round, they are more or less the same):

[11337:0x56129b20e3e0] Fast promotion mode: false survival rate: 0%
[11337:0x56129b20e3e0]      785 ms: Scavenge 6.0 (10.7) -> 2.1 (10.7) MB, 0.1 / 0.0 ms  (average mu = 1.000, current mu = 1.000) allocation failure
[11337:0x56129b20e3e0] Memory allocator,       used:  10960 KB, available: 4232496 KB
[11337:0x56129b20e3e0] Read-only space,        used:    115 KB, available:      0 KB, committed:    116 KB
[11337:0x56129b20e3e0] New space,              used:     44 KB, available:   4047 KB, committed:   8192 KB
[11337:0x56129b20e3e0] New large object space, used:      0 KB, available:   4091 KB, committed:      0 KB
[11337:0x56129b20e3e0] Old space,              used:   1649 KB, available:    170 KB, committed:   1928 KB
[11337:0x56129b20e3e0] Code space,             used:     64 KB, available:      0 KB, committed:    332 KB
[11337:0x56129b20e3e0] Map space,              used:    221 KB, available:      0 KB, committed:    260 KB
[11337:0x56129b20e3e0] Large object space,     used:    128 KB, available:      0 KB, committed:    132 KB
[11337:0x56129b20e3e0] Code large object space,     used:      0 KB, available:      0 KB, committed:      0 KB
[11337:0x56129b20e3e0] All spaces,             used:   2222 KB, available: 4240805 KB, committed:  10960 KB
[11337:0x56129b20e3e0] Unmapper buffering 0 chunks of committed:      0 KB
[11337:0x56129b20e3e0] External memory reported:      9 KB
[11337:0x56129b20e3e0] Backing store memory:    719 KB
[11337:0x56129b20e3e0] External memory global 0 KB
[11337:0x56129b20e3e0] Total time spent in GC  : 12.4 ms

The only conclusion I could draw from this is that the object is allocated in the Large Object Space.
I can upload the full thing if you are interested, but it's very easy to generate from the example.

mcollina · 2020-03-06T18:54:45Z

Ok, I think I've identified part of the problem. In Node 12.15.0 the object did not go into large object space. In Node 12.16.1 and beyond, it moves into large object space.

gsathya · 2020-03-10T12:04:54Z

cc @camillobruni

dominykas · 2020-03-18T13:45:14Z

Found another repo where we can see a ~10% total slowdown in the unit test suite when changing from 12.15.0 to 12.16.1. Running clinic flame in 12.16.1 has the following hottest frames (nowhere to be seen on 12.15.0):

Once again - looping through an object to construct a new object...

camillobruni · 2020-03-18T20:28:58Z

Tracked in V8 as well: https://bugs.chromium.org/p/v8/issues/detail?id=10339
From what we've measured this is not a GC regression but rather a V8 runtime issue.

mcollina · 2020-06-19T07:19:56Z

Closed, this seems fixed.

dominykas · 2020-06-19T07:56:23Z

Does this need any action to get into v14.x?

mcollina · 2020-06-19T09:26:48Z

I thought it was already shipped in v14, normally PRs are backported after 2 week in a current release.

dominykas · 2020-06-19T10:14:24Z

My bad, it seems to actually be there: 748720e, not sure how I missed it. Thanks.

mmarchini added the performance Issues and PRs related to the performance of Node.js. label Mar 2, 2020

This was referenced May 20, 2020

deps: V8: cherry-pick 548f6c81d424 #33484

Closed

[v12.x] deps: V8: cherry-pick 548f6c81d424 #33553

Closed

mcollina closed this as completed Jun 19, 2020

mcollina mentioned this issue Jul 30, 2020

Severe performance regression with object spread operator introduced in v12.18.1 and v14.5.0 #34322

Closed

schamberg97 mentioned this issue Feb 20, 2021

Object.setPrototypeOf performance regression after Node.js 12.18.0. #37457

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression when enumerating objects (for shallow cloning) v12.15.0 to v12.16.1 #32049

Performance regression when enumerating objects (for shallow cloning) v12.15.0 to v12.16.1 #32049

dominykas commented Mar 2, 2020 •

edited

Loading

mmarchini commented Mar 3, 2020

dominykas commented Mar 3, 2020 •

edited

Loading

mmarchini commented Mar 3, 2020

dominykas commented Mar 5, 2020 •

edited

Loading

dominykas commented Mar 5, 2020 •

edited

Loading

mmarchini commented Mar 5, 2020

dominykas commented Mar 6, 2020

mcollina commented Mar 6, 2020

mmarchini commented Mar 6, 2020

mcollina commented Mar 6, 2020

mcollina commented Mar 6, 2020

gsathya commented Mar 10, 2020

dominykas commented Mar 18, 2020 •

edited

Loading

camillobruni commented Mar 18, 2020

mcollina commented Jun 19, 2020

dominykas commented Jun 19, 2020

mcollina commented Jun 19, 2020 •

edited

Loading

dominykas commented Jun 19, 2020 •

edited

Loading

Performance regression when enumerating objects (for shallow cloning) v12.15.0 to v12.16.1 #32049

Performance regression when enumerating objects (for shallow cloning) v12.15.0 to v12.16.1 #32049

Comments

dominykas commented Mar 2, 2020 • edited Loading

What steps will reproduce the bug?

Additional information

mmarchini commented Mar 3, 2020

dominykas commented Mar 3, 2020 • edited Loading

mmarchini commented Mar 3, 2020

dominykas commented Mar 5, 2020 • edited Loading

dominykas commented Mar 5, 2020 • edited Loading

mmarchini commented Mar 5, 2020

dominykas commented Mar 6, 2020

mcollina commented Mar 6, 2020

mmarchini commented Mar 6, 2020

mcollina commented Mar 6, 2020

mcollina commented Mar 6, 2020

gsathya commented Mar 10, 2020

dominykas commented Mar 18, 2020 • edited Loading

camillobruni commented Mar 18, 2020

mcollina commented Jun 19, 2020

dominykas commented Jun 19, 2020

mcollina commented Jun 19, 2020 • edited Loading

dominykas commented Jun 19, 2020 • edited Loading

dominykas commented Mar 2, 2020 •

edited

Loading

dominykas commented Mar 3, 2020 •

edited

Loading

dominykas commented Mar 5, 2020 •

edited

Loading

dominykas commented Mar 5, 2020 •

edited

Loading

dominykas commented Mar 18, 2020 •

edited

Loading

mcollina commented Jun 19, 2020 •

edited

Loading

dominykas commented Jun 19, 2020 •

edited

Loading