Possible Memory Leak #1031
Comments
CC @grouma |
Thanks for the awesome issue write up! My initial thought is that the issue is a result of how we track outstanding callbacks to keep the test alive. In particular the logic around here: test/pkgs/test_api/lib/src/backend/invoker.dart Lines 242 to 257 in c4280a6
I'll have to do further investigation to figure out exactly what's going on. |
I believe the root cause is coming from the
If I remove this logic the |
We are seeing an odd thing as we play with this some more. If we don't place a breakpoint and take a heap snapshot inside the "Rhaegar" test that |
That's not what I'm seeing. I can place a breakpoint only in the This is running with the following settings: |
Did a little more investigation. The I don't know if this is a |
@grouma thanks for digging into this! We tried using a local version of One thing that stands out to me is the fact that the Unfortunately I don't see an obvious workaround, so our only other option at this point is to continue breaking up the larger test suites. If there's any way we can help further with this issue, we'd be happy to do so. Edit: The thought just occurred to me that if it's an issue with using |
You could also run with Then you don't need a dependency override. At least worth trying |
@kevmoo good idea, hadn't thought of that. Unfortunately that alone doesn't work :( I'm guessing that's because |
@grouma Any news here? None of the workarounds are working for us. |
No news unfortunately and there likely won't be an update this week. The entire Dart team is attending an internal summit. That being said this is one of my top priorities and I will continue the investigation in earnest starting next week. CC @vsmenon as we briefly discussed this issue. |
Quick update. I can reproduce this issue without the use of |
Thanks @grouma. Can you link the ticket here once you've made it? |
Ignore my previous comment. My minimal repro without However, I have verified that your provided example does not have the same issue when running on the VM. I used the VM observatory and heap snapshots to confirm. My next goal is to see if I can create a repro for the web without |
I appreciate your investigation into this. We're ramping up investigations to determine what else we can do to alleviate out of memory errors when running tests. It does seem like your investigation confirms that the default behavior without the |
The default behavior definitely requires more memory which is expected. I don't necessarily agree that my investigation would indicate memory retention issues. Forcing a garbage collection event removes the It's possible that you are just trying to run too many tests in a single suite. There is overhead with each test that can't be avoided. The solution to this is to shard your tests. It's a little unfortunate but you'll likely get better performance as well depending on your CI environment. |
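The sharding suggestion above can be illustrated with a minimal sketch. It assumes a plain round-robin split of test files into a fixed number of groups, each of which would run as its own suite. `shardTests` and the file names are hypothetical and are not part of package:test.

```javascript
// Hypothetical helper: split a list of test files into shardCount groups
// (round-robin), so each group can be run as a separate, smaller suite.
function shardTests(files, shardCount) {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  const shards = Array.from({ length: shardCount }, () => []);
  files.forEach((file, index) => shards[index % shardCount].push(file));
  return shards;
}

// Hypothetical file names, for illustration only.
const shards = shardTests(
  ["a_test.dart", "b_test.dart", "c_test.dart", "d_test.dart", "e_test.dart"],
  2
);
console.log(shards);
// [ [ 'a_test.dart', 'c_test.dart', 'e_test.dart' ], [ 'b_test.dart', 'd_test.dart' ] ]
```

Round-robin keeps the shards within one file of each other in size; it does not account for how much memory each individual suite needs.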
@grouma I think our question is this - when stack trace chaining is enabled, why are objects that are completely local to a test callback being retained even beyond the execution of that test? My expectation would be that setting a breakpoint in a separate test body (e.g. the |
We spent some time digging through the retaining paths and the StackTrace implementation and as best we can tell, it's related to the JS stacks being lazily computed, which requires storing the We manually modified the dart2js output to not store the exception:
  _StackTrace: function _StackTrace(t0) {
-   this._exception = t0;
    this._trace = null;
  },
...and that also fixed the issue ( |
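To make the lazy-versus-eager distinction in the comment above concrete, here is a minimal sketch in plain JavaScript. `LazyTrace` and `EagerTrace` are hypothetical names and are not the real dart2js or DDC `_StackTrace`. The sketch only shows which wrapper keeps a reference to the error object; it does not reproduce the Chrome retention behavior itself.

```javascript
// Hypothetical wrappers for illustration; not the SDK's _StackTrace.

// Lazy: stores the error object and only reads its stack on first use,
// so the wrapper keeps the error object reachable for as long as it lives.
class LazyTrace {
  constructor(error) {
    this.error = error; // strong reference to the JS error object
    this.trace = null;
  }
  toString() {
    if (this.trace === null) {
      this.trace = String(this.error.stack);
    }
    return this.trace;
  }
}

// Eager: copies the stack string up front and keeps no reference to the error.
class EagerTrace {
  constructor(error) {
    this.trace =
      error != null && typeof error === "object" ? String(error.stack) : "";
  }
  toString() {
    return this.trace;
  }
}

const err = new Error("boom");
const lazy = new LazyTrace(err);
const eager = new EagerTrace(err);

console.log(Object.values(lazy).includes(err)); // true: wrapper points at the error
console.log(Object.values(eager).includes(err)); // false: wrapper holds only a string
```

The trade-off is that the eager version pays for formatting the stack string on every capture, even when the trace is never printed.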
We haven't been able to reproduce this. When we trigger a GC manually the references always go away when expected, and the errant references only seem to hang around because V8 decided not to GC yet. Are you able to see the stale references after a manual GC? |
Yes, we've been doing manual GCs at each breakpoint before collecting the heap snapshot. With the original repro at the top, we see Baptiste retained even after a manual GC at a breakpoint in the second test. I'm headed out right now but someone else on my team may be able to provide a screen recording if that would help. |
My reproduction is as follows:
|
Here's a video showing a repro. Admittedly, I'm capturing heap snapshots in more places, but barring a Chrome bug that shouldn't matter AFAIK. https://drive.google.com/file/d/11LCfMGN4XwvuozujSBr_j5m50ffTF9nK/view?usp=sharing |
Here's another version where I tried to do exactly what you described in your repro attempt. Let me know if I messed anything up. https://drive.google.com/file/d/1OOvMp2y3cZJDo-ULFdRDOosiuF6knGbr/view?usp=sharing |
Thanks for the videos. Very interesting! I believe we are running into a Chrome optimization. Note that you are doing a manual GC before each snapshot. If you don't do a GC before the first snapshot, then the However, if I repeat the steps in either of your videos then the Anyway, as we direct our internal customers, I suggest sharding your tests and using |
FWIW, our large test suite will run to completion successfully on Firefox, which seems to support the idea that we're running into some strange Chrome-specific behavior. |
I was told from some folks on the team that V8's stack traces are very different than those from Firefox so that doesn't surprise me too much. I tried to look for specific documentation around the topic but couldn't find any. Thanks for the extra details! |
Is there anyone on the Chrome team who could take our simplified retention example, the fact that Firefox passes, and the idea that it's something with stack traces to determine if it's an issue in Chrome itself? |
Spoke with some folks internally. We are likely running into this issue: It would definitely be helpful if you could come up with a more minimal repro, especially one that doesn't use |
FYI. I'm experimenting with an SDK change that may work around the linked Chrome issue. It looks like we lazily store the JS error and simply use it to eventually call |
You're the hero we need. We were actually just talking about trying to do something similar a couple hours ago, but most of us aren't terribly familiar with the SDK itself, so we weren't sure how big a lift it would be. Thank you! |
This is very exciting to hear! I hope the performance works out. That fix might also result in a memory reduction in our production runtime code and not just during tests. |
I believe I have a much better understanding of this problem. The issue is that the Dart stack trace objects are self referential within Chrome. As you know they contain a reference to a JS I'm still having discussions with the Chrome team but they believe this issue is related to https://bugs.chromium.org/p/v8/issues/detail?id=2340 Unfortunately the suggested solution outlined in the above link does not resolve the problem. That is calling I do have another potential solution but it does come at a cost. Instead of storing the Some Angular tests will chain thousands of traces and run into memory crashes with this change. However, in practice the additional memory requirements do not seem to be an issue. I have run many thousand internal tests and only a few are impacted.
There are a couple of workarounds for those tests that are impacted by the increased memory requirements. One could simply shard these tests or alternatively configure Chrome to reduce the number of frames included in a trace. By default in DDC, we do not limit the frames, and thus they can become quite large.
Next steps: I would like for you to try out a modified version of my proposed DDC SDK change and see if it resolves your issue. Please edit the DDC SDK (
(dart._StackTrace.new = function(jsError) {
  this[_trace] = null;
  // Store the stack directly. (We will need to update the toString accordingly.)
  this[_jsError] = jsError != null && typeof jsError === "object" ? jsError.stack : null;
  this[_jsObjectMissingTrace] = null;
}).prototype = dart._StackTrace.prototype;
(dart._StackTrace.missing = function(caughtObj) {
  this[_trace] = null;
  this[_jsObjectMissingTrace] = caughtObj != null ? caughtObj : "null";
  // Store the stack directly.
  this[_jsError] = Error().stack;
}).prototype = dart._StackTrace.prototype;
If you are impacted by the increased memory requirements you can modify your tests to limit the number of frames by doing the following:
@JS()
library stacktracelimit;
import 'package:js/js.dart';
import 'package:test/test.dart';
// Binds to the JS `Error.stackTraceLimit` setter.
@JS('Error.stackTraceLimit')
external set stackTraceLimit(int n);
void main() {
  // Cap each captured JS stack trace at 5 frames.
  stackTraceLimit = 5;
  test('foo', () {});
}
If this change works for you, we would like to move forward with it quickly and have it land in version I will continue my discussions with the Chrome team for any other potential workarounds. However, I fear there won't be much that we can do. Judging from the age of the linked issue this appears to be a very difficult problem for them. |
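The frame-limit workaround above relies on `Error.stackTraceLimit`, which is a V8-specific, non-standard property. As a rough sketch of its effect in plain JavaScript, the snippet below counts the frames captured at a deep call site after lowering the limit. `deep` and `countFrames` are hypothetical helpers, and the frame counting assumes V8's `    at ...` stack format.

```javascript
// Build an Error at the bottom of a deep call chain.
function deep(n) {
  return n === 0 ? new Error("deep") : deep(n - 1);
}

// Count frame lines, assuming V8's "    at ..." stack string format.
function countFrames(error) {
  return String(error.stack)
    .split("\n")
    .filter((line) => line.trim().startsWith("at ")).length;
}

// Lower the limit only for this capture, then restore the previous value.
const previousLimit = Error.stackTraceLimit;
Error.stackTraceLimit = 5;
const limitedFrames = countFrames(deep(20));
Error.stackTraceLimit = previousLimit;

console.log(limitedFrames); // at most 5 on V8, even though the call chain is deeper
```

A smaller limit makes each stored stack string shorter, which is the property the workaround depends on. The cost is less context when a failure actually needs debugging.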
@grouma Thanks for digging into this further, we really appreciate it. We tried making that change in a few different places where we have problematic test suites and unfortunately we saw roughly the same results if not fewer tests passing before running out of memory. It does fix the minimal repro that we originally submitted with this issue, so I think there's probably value in this change, but if we're the only user group that is hitting this problem right now then it may not be worth rushing it into 2.4. We're going to pick out a few internal test suites and do a deep dive on them, and now that we have your proposed changeset we can try including that as well to see if it, along with other changes of our own, gets us past this issue. We'll report back if we learn anything more. Thanks again for all your help! On a related note, is there a better way to inject that change to the DDC SDK than manually modifying the outputs of a |
@evanweible-wf - you can edit your installed Dart SDK directly as well so you don't have to keep re-applying. |
Awesome, thanks @vsmenon! |
Actually, the path may be:
Note the |
Some more info: I'm digging into a heap snapshot in one of our suites that can only run 5-6 tests before running out of memory. I found that after 4 tests there were 222,464 each of |
That's not all too surprising. We have similar metrics for some of our very large Angular tests which have a ton of asynchronous calls. However, if I disable stack trace chaining then the memory does not grow unbounded. Note that you must also disable chaining in the Angular framework: My proposed fix above would likely allow for some of these traces and zones to be GC'd but as you know unfortunately the fix requires significantly more memory. I think the logical next step is to understand why your tests fail when stack trace chaining is disabled. Do you have additional infrastructure that is calling |
Side thought: Would it be possible to ask someone on the Chrome team to push on building zones and/or async stack chaining into the browser? |
They are working on it. Here's a public doc: https://docs.google.com/document/d/13Sy_kBIJGP0XT34V1CV3nkWya4TwYx9L3Yv45LdGB6Q |
Oh, according to https://v8.dev/blog/fast-async that shipped in Chrome 73 |
We've been updating a number of existing Dart 1 projects to be Dart 2 compatible. Unfortunately, some of those projects have large unit test suites that fail under Dart 2 as the browser appears to run out of memory. The only workaround we've found so far is to break up those suites into smaller suites.
While trying to track down any memory leaks in our code, I found that even in very basic unit tests objects are being retained longer than I was expecting, which feels like it could be contributing to the increased memory pressure we're seeing.
I've tried to create a simple example test suite that demonstrates this:
If I run this unit test suite on DDC (with Dart 2.3 on MacOS), with Chrome, placing breakpoints where indicated in the code above, I'm finding that the Baptiste object is retained after it is nulled out and even after the first test completes. This is true even when manually collecting garbage before taking the heap snapshots.
One thing that is interesting is it appears this is true only if I call the async walk function on the Baptiste object. If I comment out that existing call, or replace it with a call to the non-async function run, baptiste does not get retained after it is nulled out. Looking at the retention tree that results, it appears it is getting caught up in some sort of stack trace context:
I'm not 100% certain this is an issue with the test package vs. a Dart SDK issue vs. maybe a misunderstanding on our part, but I thought it worth reporting to see what y'all (the experts 😉 ) might think about it.