
Possible Memory Leak #1031

Open
todbachman-wf opened this issue May 8, 2019 · 43 comments
Labels
type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@todbachman-wf

We've been updating a number of existing Dart 1 projects to be Dart 2 compatible. Unfortunately, some of those projects have large unit test suites that fail under Dart 2 as the browser appears to run out of memory. The only workaround we've found so far is to break up those suites into smaller suites.

While trying to track down any memory leaks in our code, I found that even in very basic unit tests objects are being retained longer than I was expecting, which feels like it could be contributing to the increased memory pressure we're seeing.

I've tried to create a simple example test suite that demonstrates this:

import 'dart:async';

import 'package:test/test.dart';

class Baptiste {
  Future<Null> walk() async {}
  void run() {}
}

void main() {
  group('Lucy', () {
    test('Rhaegar', () async {
      var baptiste = new Baptiste();
      expect(baptiste, isNotNull);

      // This async call seems to entangle `baptiste` such that it is still
      // retained in memory once execution hits the "Tookie" test. If you comment
      // out this line of code, once `baptiste` is nulled out below it can
      // be garbage collected and all subsequent heap snapshots don't include
      // `baptiste`.
      await baptiste.walk();
      // baptiste.run();

      // Place a breakpoint here, collect garbage and take a heap snapshot.
      // Verify `baptiste` is in the heap.
      baptiste = null;

      // Place a breakpoint here, collect garbage and take a heap snapshot.
      // `baptiste` is still in the heap here.
      expect(baptiste, isNull);
    });

    test('Tookie', () {
      // Place a breakpoint here, collect garbage and take a heap snapshot.
      // `baptiste` is still in the heap here.
      expect(true, isTrue);
    });
  });
}

If I run this unit test suite on DDC (with Dart 2.3 on macOS) in Chrome, placing breakpoints where indicated in the code above, I find that the Baptiste object is retained after it is nulled out and even after the first test completes. This is true even when manually collecting garbage before taking the heap snapshots.

One interesting thing: this appears to be true only if I call the async walk function on the Baptiste object. If I comment out that call, or replace it with a call to the non-async function run, baptiste does not get retained after it is nulled out. Looking at the resulting retention tree, it appears the object is getting caught up in some sort of stack trace context:

[Screenshot: heap snapshot retaining tree showing baptiste held through a stack trace context]

I'm not 100% certain this is an issue with the test package vs. a Dart SDK issue vs. maybe a misunderstanding on our part, but I thought it worth reporting to see what y'all (the experts 😉 ) might think about it.

@robbecker-wf

fyi @kevmoo @nex3

@kevmoo
Member

kevmoo commented May 8, 2019

CC @grouma

@kevmoo kevmoo added the type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) label May 8, 2019
@grouma
Member

grouma commented May 10, 2019

Thanks for the awesome issue write up!

My initial thought is that the issue is a result of how we track outstanding callbacks to keep the test alive.

In particular the logic around here:

Future waitForOutstandingCallbacks(fn()) {
  heartbeat();
  Zone zone;
  // Each test body gets its own counter, carried via the zone's values.
  var counter = OutstandingCallbackCounter();
  runZoned(() async {
    zone = Zone.current;
    _outstandingCallbackZones.add(zone);
    await fn();
    counter.removeOutstandingCallback();
  }, zoneValues: {_counterKey: counter});
  // The test is kept alive until every outstanding callback has completed.
  return counter.noOutstandingCallbacks.whenComplete(() {
    _outstandingCallbackZones.remove(zone);
  });
}

I'll have to do further investigation to figure out exactly what's going on.

@grouma
Member

grouma commented May 10, 2019

I believe the root cause is coming from the Chain.capture call here:

If I remove this logic the Baptiste object is cleared from the heap before the second test. I am very unfamiliar with package:stack_trace so I might have to bother @nex3 to get some context and help with further debugging.
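For readers unfamiliar with package:stack_trace, here is a minimal sketch of how a runner might wrap a test body in Chain.capture (illustrative only, with hypothetical names; this is not the actual test_api code):

import 'dart:async';

import 'package:stack_trace/stack_trace.dart';

/// Hypothetical wrapper: runs [body] so that async errors carry chained traces.
Future<void> runWithChaining(Future<void> Function() body) {
  var completer = Completer<void>();
  Chain.capture(() async {
    await body();
    completer.complete();
  }, onError: (error, chain) {
    // chain.terse folds out frames from dart:async and package:stack_trace.
    completer.completeError(error, chain.terse);
  });
  return completer.future;
}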

@todbachman-wf
Author

We are seeing an odd thing as we play with this some more. If we don't place a breakpoint and take a heap snapshot inside the "Rhaegar" test that baptiste is created in we don't see baptiste in the heap snapshot for the "Tookie" test. So, measuring the heap is causing the heap to be different when we measure it again later. I'm thoroughly confused.

@grouma
Member

grouma commented May 11, 2019

That's not what I'm seeing.

I can place a breakpoint only in the Tookie test and see the Baptiste object. Again, removing the Chain.capture call prevents me from seeing the Baptiste object.

This is running with the following settings:
Dart VM version: 2.3.0-dev.0.3 (Tue Apr 23 12:02:59 2019 -0700) on "linux_x64"
pub run test test/memory_test.dart -p chrome --pause-after-load (so Dart2JS - I don't think this matters)
Version 74.0.3729.131 (Official Build) (64-bit)

@grouma
Member

grouma commented May 11, 2019

Did a little more investigation. The StackTrace.current call that causes the Baptiste object to stick around is here:
https://github.com/dart-lang/stack_trace/blob/4b1768f776c2cfee8f135139c19e2b9caa1ca112/lib/src/stack_zone_specification.dart#L225

I don't know if this is a package:stack_trace issue or an SDK issue. I'll have to chat with folks who are more knowledgeable than me.
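For context, a rough sketch of the pattern at that line as we understand it (simplified, with hypothetical names; the real StackZoneSpecification is more involved): each registered async callback eagerly captures StackTrace.current and links it to the previous node so the chain can be reconstructed later.

// Simplified sketch of the chaining bookkeeping, not the actual source.
class _Node {
  final StackTrace trace; // captured eagerly; in V8 this pins extra context
  final _Node previous;
  _Node(this.trace, this.previous);
}

class ChainBookkeepingSketch {
  _Node _current;

  void onRegisterCallback() {
    // The capture linked above: one StackTrace.current per async callback.
    _current = _Node(StackTrace.current, _current);
  }
}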

@evanweible-wf
Contributor

evanweible-wf commented May 13, 2019

@grouma thanks for digging into this! We tried using a local version of test_api with that call to Chain.capture() commented out and saw a significant improvement in terms of both memory usage and test execution speed. Several test suites that we were looking at breaking up into smaller suites ran to completion with that change, although some of our larger suites still fail. So at the very least it's an improvement, but maybe there are other or similar instances that are still contributing to the issue?

One thing that stands out to me is the fact that the Chain.capture() usage in test_api has existed for quite a long time (the git blame says 4 years). Similar story for the call to StackTrace.current in the stack_trace package that you pointed out. We also did not start seeing these memory issues until dart2, so perhaps there's a compiler or SDK issue on dart2 related to how StackTraces are generated/used?

Unfortunately I don't see an obvious workaround, so our only other option at this point is to continue breaking up the larger test suites. If there's anyway we can help further with this issue, we'd be happy to do so.


Edit: The thought just occurred to me that if it's an issue with using StackTrace.current and not just Chain.capture(), then there maybe other packages using it and causing objects to be incorrectly retained, which might explain why updating test_api alone did not fix the memory-related crashes for all of our test suites.

@kevmoo
Member

kevmoo commented May 13, 2019

You could also run with --no-chain-stack-traces – which should have the same effect – I think.

Then you don't need a dependency override. At least worth trying.
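For example (combining the flag with the invocation used elsewhere in this thread):

pub run test test/memory_test.dart -p chrome --no-chain-stack-traces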

@evanweible-wf
Contributor

@kevmoo good idea, hadn't thought of that. Unfortunately that alone doesn't work. :( I'm guessing that's because Chain.capture(when: false) still runs the callback in a zone with a custom onError handler that calls into Chain.current/Chain.forTrace, but I'm not familiar enough with how that stuff works to know for sure.
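A rough sketch of what we think the when: false path looks like (a paraphrase of the behavior described above, not the actual package:stack_trace source):

import 'dart:async';

import 'package:stack_trace/stack_trace.dart';

// Hypothetical paraphrase: with when: false, chaining is skipped, but the
// callback still runs in a zone whose error handler touches
// Chain.current/Chain.forTrace.
T captureWithoutChaining<T>(T Function() callback,
    void Function(Object error, Chain chain) onError) {
  return runZoned(callback, onError: (error, stackTrace) {
    onError(error,
        stackTrace == null ? Chain.current() : Chain.forTrace(stackTrace));
  });
}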

@robbecker-wf

@grouma Any news here? None of the workarounds are working for us.

@grouma
Member

grouma commented May 14, 2019

No news unfortunately, and there likely won't be an update this week. The entire Dart team is attending an internal summit. That being said, this is one of my top priorities and I will continue the investigation in earnest starting next week.

CC @vsmenon as we briefly discussed this issue.

@grouma
Member

grouma commented May 20, 2019

Quick update. I can reproduce this issue without the use of package:test on the Dart VM. I'll do some further investigation but will likely open a new issue on either package:stack_trace or the SDK (or both).

@robbecker-wf

Thanks @grouma. Can you link the ticket here once you've made it?

@grouma
Member

grouma commented May 20, 2019

Ignore my previous comment. My minimal repro without package:test on the VM was actually me doing something silly.

However, I have verified that your provided example does not have the same issue when running on the VM. I used the VM observatory and heap snapshots to confirm.

My next goal is to see if I can create a repro for the web without package:test.
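For reference, a hedged sketch of the shape such a package:test-free repro might take (a guess at this point, not a confirmed repro):

import 'dart:async';

import 'package:stack_trace/stack_trace.dart';

class Baptiste {
  Future<Null> walk() async {}
}

void main() {
  Chain.capture(() async {
    var baptiste = new Baptiste();
    await baptiste.walk(); // chaining captures traces referencing this frame
    baptiste = null;
    // Breakpoint here: force a GC, take a heap snapshot, search for Baptiste.
  });
}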

@grouma
Member

grouma commented May 21, 2019

Ok. I finally have a handle on this. This is not a memory leak.

You can force garbage collection by hitting the little trash can icon:
[Screenshot: Chrome DevTools Memory panel with the trash can (collect garbage) icon highlighted]

If you do this prior to collecting the second snapshot, you'll see that the baptiste object is not contained in the heap. By design we don't have control over when the garbage is collected.

Yesterday I was not able to reproduce the "issue" on the VM because stack traces on the VM are quite different from those in V8. Unfortunately, stack traces in V8 contain a ton of additional context. This is why the baptiste object is getting caught up in the StackTrace.current result.

Internally, we actually disable stack trace chaining by default as stack traces are quite slow to collect. We use the flag that Kevin suggested above: --no-chain-stack-traces. I can't recall the exact numbers but when I initially implemented the feature we saw a significant reduction in overall test times. When debugging individual failures we of course remove the flag if more context is required.

If I run your example with --no-chain-stack-traces I see that the baptiste object is only retained in the first snapshot, and is removed from the heap by the time the second snapshot is taken. Note that Chain.current/Chain.forTrace are only called if an error is thrown when Chain.capture is disabled.

Given that you are still running into issues with stack trace chaining disabled, I'm inclined to believe that the memory leak is on your side. My guess is that the memory-intensive chaining is tickling the issue.

For what it's worth, we do have some pretty massive test suites internally and this is the first memory leak complaint on package:test as far as I know. For context, our internal infrastructure limits memory to 12 gigabytes.

I'm going to close this issue as I don't think there's much else I can do.

@grouma grouma closed this as completed May 21, 2019
@robbecker-wf

robbecker-wf commented May 22, 2019

I appreciate your investigation into this. We're ramping up investigations to determine what else we can do to alleviate out-of-memory errors when running tests. It does seem like your investigation confirms that the default behavior, without the --no-chain-stack-traces flag, does retain memory. That indicates there could still be an unhandled memory retention issue in zones or stack chaining. Thoughts?

@grouma
Member

grouma commented May 22, 2019

The default behavior definitely requires more memory, which is expected. I don't necessarily agree that my investigation indicates memory retention issues. Forcing a garbage collection event removes the baptiste object from the heap as expected.

It's possible that you are just trying to run too many tests in a single suite. There is overhead with each test that can't be avoided. The solution to this is to shard your tests. It's a little unfortunate but you'll likely get better performance as well depending on your CI environment.

@evanweible-wf
Contributor

@grouma I think our question is this - when stack trace chaining is enabled, why are objects that are completely local to a test callback being retained even beyond the execution of that test? My expectation would be that setting a breakpoint in a separate test body (e.g. the Tookie test in the original example) that is running in a distinct zone should allow gc to clean up the zones from previous test bodies (especially when no errors are thrown), but that is not happening when chaining is enabled and that zone's callback is async.

@evanweible-wf
Contributor

We spent some time digging through the retaining paths and the StackTrace implementation and as best we can tell, it's related to the JS stacks being lazily computed, which requires storing the Error objects that get created in JS on the _StackTrace objects in the compiled JS output (note: we see the same behavior via ddc and dart2js).

We manually modified the dart2js output to not store the exception:

    _StackTrace: function _StackTrace(t0) {
-     this._exception = t0;
      this._trace = null;
    },

...and that also fixed the issue (Baptiste was no longer retained outside the zone for that test, meaning the zone could be GC'd). Obviously this also defeats the whole purpose of chaining stack traces, but based on that, it appears to us that something is retaining a reference to the stack trace, which in turn prevents the zone that the test runs in from being GC'd.
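To spell out the pattern we're describing, here is a simplified sketch of the lazy-stack wrapper as it appears to work in the compiled output (names taken from the snippet above; the real SDK code handles more cases):

// Simplified sketch, not the actual dart_sdk.js source.
function _StackTrace(jsError) {
  this._exception = jsError; // keeps the JS Error alive until .stack is read
  this._trace = null;
}
_StackTrace.prototype.toString = function () {
  // The trace text is computed lazily; until then, whatever V8 attached to
  // the Error (frames, receivers, closure context) stays reachable from here.
  if (this._trace == null && this._exception != null) {
    this._trace = this._exception.stack;
  }
  return this._trace;
};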

@natebosch
Member

why are objects that are completely local to a test callback being retained even beyond the execution of that test?

something is retaining a reference to the stack trace

We haven't been able to reproduce this. When we trigger a GC manually the references always go away when expected, and the errant references only seem to hang around because V8 decided not to GC yet.

Are you able to see the stale references after a manual GC?

@evanweible-wf
Contributor

Yes, we've been doing manual GCs at each breakpoint before collecting the heap snapshot. With the original repro at the top, we see Baptiste retained even after a manual GC at a breakpoint in the second test. I'm headed out right now but someone else on my team may be able to provide a screen recording if that would help.

cc @todbachman-wf @robbecker-wf

@grouma
Member

grouma commented May 22, 2019

My reproduction is as follows:

1. I first run: pub run test test/sample_test.dart --pause-after-load -p chrome (which runs the test with Dart2JS).
2. I set a breakpoint on line 29, which is expect(baptiste, isNull) in the first test.
3. I set another breakpoint on line 35, which is expect(true, isTrue) in the second test.
4. I hit continue to execute the tests.
5. At the first breakpoint I grab a heap snapshot and look for the object Baptiste. (One is found.)
6. I hit continue to hit the second breakpoint.
7. I manually hit the GC icon.
8. I capture a second snapshot and look for the object Baptiste. (None are contained in the snapshot.)

@georgelesica-wf

Here's a video showing a repro. Admittedly, I'm capturing heap snapshots in more places, but barring a Chrome bug that shouldn't matter AFAIK.

https://drive.google.com/file/d/11LCfMGN4XwvuozujSBr_j5m50ffTF9nK/view?usp=sharing

@georgelesica-wf

Here's another version where I tried to do exactly what you described in your repro attempt. Let me know if I messed anything up.

https://drive.google.com/file/d/1OOvMp2y3cZJDo-ULFdRDOosiuF6knGbr/view?usp=sharing

@grouma
Member

grouma commented May 22, 2019

Thanks for the videos. Very interesting!

I believe we are running into a Chrome optimization. Note that you are doing a manual GC before each snapshot.

If you don't do a GC before the first snapshot, then the Baptiste object is removed from the heap in the second test (assuming you do a manual GC before taking that second snapshot). I can reproduce this behavior on Chrome 74.0.3729.169 on both Linux and Mac with Dart2JS and DDC.

However, if I repeat the steps in either of your videos then the Baptiste object does stick around for me. The fact that the Baptiste object survives multiple GC events likely means it is allowed to be retained in the heap indefinitely. Chrome is a black box to me so you'll likely want to reach out to them for more information.

Anyway, as we direct our internal customers, I suggest sharding your tests and using --no-chain-stack-traces if you are hitting memory limits.

@georgelesica-wf

FWIW, our large test suite will run to completion successfully on Firefox, which seems to support the idea that we're running into some strange Chrome-specific behavior.

@grouma
Member

grouma commented May 24, 2019

I was told by some folks on the team that V8's stack traces are very different from Firefox's, so that doesn't surprise me too much. I tried to look for specific documentation around the topic but couldn't find any. Thanks for the extra details!

@robbecker-wf

Is there anyone on the Chrome team who could take our simplified retention example, the fact that Firefox passes, and the idea that it's related to stack traces, and determine whether it's an issue in Chrome itself?

@grouma
Member

grouma commented May 29, 2019

Spoke with some folks internally. We are likely running into this issue:
https://bugs.chromium.org/p/v8/issues/detail?id=2340
Let me reach out to yangguo@ to see if he has any suggestions.

It would definitely be helpful if you could come up with a more minimal repro, especially one that doesn't use package:test.

@grouma
Member

grouma commented May 29, 2019

FYI.

I'm experimenting with an SDK change that may work around the linked Chrome issue. It looks like we lazily store the JS error and simply use it to eventually call .stack. Storing the stack instead does not cause the observed memory issue. However, I need to validate that this does not regress performance significantly.

@grouma grouma reopened this May 29, 2019
@georgelesica-wf

You're the hero we need. We were actually just talking about trying to do something similar a couple hours ago, but most of us aren't terribly familiar with the SDK itself, so we weren't sure how big a lift it would be. Thank you!

@robbecker-wf

This is very exciting to hear! I hope the performance works out. That fix might also result in a memory reduction in our production runtime code and not just during tests.

@grouma
Member

grouma commented May 31, 2019

I believe I have a much better understanding of this problem.

The issue is that the Dart stack trace objects are self-referential within Chrome. As you know, they contain a reference to a JS Error() object, which in Chrome carries additional context. This context points back to the wrapping Dart stack trace object. As a result, these traces are never garbage collected.

I'm still having discussions with the Chrome team but they believe this issue is related to https://bugs.chromium.org/p/v8/issues/detail?id=2340

Unfortunately, the suggested solution outlined in the above link does not resolve the problem. That is, calling .stack on the Error() object while still storing the Error in the Dart stack trace object still prevents the Dart stack traces from being garbage collected.

I do have another potential solution but it does come at a cost. Instead of storing the Error() object we can store the .stack value directly. Unfortunately, this requires more memory but does fix the memory leak! In theory it should also be slower but as far as I can tell the performance difference is imperceptible.

Some Angular tests will chain thousands of traces and run into memory crashes with this change. However, in practice the additional memory requirement does not seem to be an issue. I have run many thousands of internal tests and only a few are impacted.

There are a couple of workarounds for those tests that are impacted by the increased memory requirements. One could simply shard these tests, or alternatively configure Chrome to reduce the number of frames included in a trace. By default in DDC we do not limit the frames, so they can become quite large.

Next steps:

I would like for you to try out a modified version of my proposed DDC SDK change and see if it resolves your issue.

Please edit the DDC SDK (dart_sdk.js) such that it has the following two-line change:

(dart._StackTrace.new = function(jsError) {
    this[_trace] = null;
    // Store the stack directly. (We will need to update the toString accordingly.)
    this[_jsError] = jsError != null && typeof jsError === "object" ? jsError.stack : null;
    this[_jsObjectMissingTrace] = null;
  }).prototype = dart._StackTrace.prototype;
  (dart._StackTrace.missing = function(caughtObj) {
    this[_trace] = null;
    this[_jsObjectMissingTrace] = caughtObj != null ? caughtObj : "null";
    // Store the stack directly.
    this[_jsError] = Error().stack;
  }).prototype = dart._StackTrace.prototype;

If you are impacted by the increased memory requirements you can modify your tests to limit the number of frames by doing the following:

@JS()
library stacktracelimit;

import 'package:js/js.dart';
import 'package:test/test.dart';

@JS('Error.stackTraceLimit')
external set stackTraceLimit(int n);

void main() {
  stackTraceLimit = 5;
  test('foo', () {});
}

If this change works for you, we would like to move forward with it quickly and have it land in version 2.4 of the DDC SDK. @vsmenon and I believe that resolving the memory leak at the cost of requiring more memory in general is the right trade-off for DDC. For Dart2JS the same trade-off is not as clear. I am told they limit the number of frames to Chrome's default, but further discussion is needed.

I will continue my discussions with the Chrome team for any other potential work arounds. However, I fear there won't be much that we can do. Judging from the age of the linked issue this appears to be a very difficult problem for them.

@evanweible-wf
Contributor

@grouma Thanks for digging into this further, we really appreciate it. We tried making that change in a few different places where we have problematic test suites and unfortunately we saw roughly the same results if not fewer tests passing before running out of memory. It does fix the minimal repro that we originally submitted with this issue, so I think there's probably value in this change, but if we're the only user group that is hitting this problem right now then it may not be worth rushing it into 2.4.

We're going to pick out a few internal test suites and do a deep dive on them, and now that we have your proposed changeset we can try including that as well to see if it, along with other changes of our own, get us past this issue. We'll report back if we learn anything more. Thanks again for all your help!

On a related note, is there a better way to inject that change to the DDC SDK than manually modifying the outputs of a pub run build_runner test -o build? Or would that require using a custom branch of the Dart SDK?

@vsmenon
Member

vsmenon commented May 31, 2019

On a related note, is there a better way to inject that change to the DDC SDK than manually modifying the outputs of a pub run build_runner test -o build? Or would that require using a custom branch of the Dart SDK?

@evanweible-wf - you can edit your installed Dart SDK directly as well so you don't have to keep re-applying. build_runner is just copying the file from $INSTALLED_DART_SDK/lib/dev_compiler/amd/dart_sdk.js

@evanweible-wf
Contributor

Awesome, thanks @vsmenon!

@vsmenon
Member

vsmenon commented May 31, 2019

Actually, the path may be:

$INSTALLED_DART_SDK/lib/dev_compiler/kernel/amd/dart_sdk.js

Note the kernel - we temporarily have two copies while we transition front ends.
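So a one-time workflow might look like this (paths from the comments above; back the file up first, since this edits the installed SDK in place):

cp "$INSTALLED_DART_SDK/lib/dev_compiler/kernel/amd/dart_sdk.js" dart_sdk.js.bak
$EDITOR "$INSTALLED_DART_SDK/lib/dev_compiler/kernel/amd/dart_sdk.js"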

@robbecker-wf
Copy link

robbecker-wf commented May 31, 2019

Some more info: I'm digging into a heap snapshot in one of our suites that can only run 5-6 tests before running out of memory. I found that after 4 tests there were 222,464 each of ZoneSpecification, LazyTrace, and StackTrace, which is an absurdly high number of these objects for running so few tests. I searched around and this issue seems relevant: dart-lang/sdk#19548. Or possibly this one? dart-lang/sdk#36983

@grouma
Member

grouma commented Jun 1, 2019

That's not all that surprising.

We have similar metrics for some of our very large Angular tests which have a ton of asynchronous calls. However, if I disable stack trace chaining then the memory does not grow unbounded. Note that you must also disable chaining in the Angular framework:
https://github.com/dart-lang/angular/blob/master/angular/lib/src/core/zone/ng_zone.dart#L73

My proposed fix above would likely allow some of these traces and zones to be GC'd, but as you know, it unfortunately requires significantly more memory.

I think the logical next step is to understand why your tests fail when stack trace chaining is disabled. Do you have additional infrastructure that is calling Chain.capture? Are you leaking memory elsewhere?
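One quick way to check for other Chain.capture call sites in your dependency set (illustrative; the pub cache path depends on your setup):

grep -rn "Chain.capture" ~/.pub-cache/hosted/pub.dartlang.org/ | grep -v stack_trace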

@robbecker-wf

Side thought: Would it be possible to ask someone on the chrome team to push on building zones and/or async stack chaining into the browser?

@grouma
Member

grouma commented Jun 5, 2019

They are working on it. Here's a public doc: https://docs.google.com/document/d/13Sy_kBIJGP0XT34V1CV3nkWya4TwYx9L3Yv45LdGB6Q

@robbecker-wf

Oh, according to https://v8.dev/blog/fast-async, that shipped in Chrome 73.
