Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak when a Promise is stored on an Express request that's added to a domain #23862

Closed
rgrove opened this issue Oct 25, 2018 · 22 comments
Closed
Labels
confirmed-bug Issues with confirmed bugs. domain Issues and PRs related to the domain subsystem. memory Issues and PRs related to the memory management or memory footprint. regression Issues related to regressions.

Comments

@rgrove
Copy link

rgrove commented Oct 25, 2018

  • Version: v10.12.0
  • Platform: Darwin zero-gravitas-lan.jetpants.com 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug 22 20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
  • Subsystem: domain

The following code, which is a reduced test case using functionality that's relied upon by the popular Sentry error reporting module, works fine in Node 8.12.0 but consistently leaks memory on each request in Node 10.12.0, 11.0.0, and 9.11.2:

'use strict';

const domain = require('domain');
const app = require('express')();

app.use((req, res, next) => {
  let requestDomain = domain.create();

  requestDomain.add(req);
  requestDomain.on('error', next);

  requestDomain.run(() => {
    next();
  });
});

app.get('/', (req, res) => {
  req.cachedPromise = new Promise(resolve => {
    let tenMebibyteString = 'a'.repeat(10 * 1024 * 1024);
    resolve(tenMebibyteString);
  });

  res.send('hi');
});

app.listen(5000);

I would expect the req object and its req.cachedPromise property to be GCed once the response is finished, since no references to it are being held. This is what happens in Node 8. But in Nodes 9, 10, and 11, req.cachedPromise never seems to be GCed, causing a steady memory leak.

I first noticed this issue when I tried to upgrade a production service using Sentry to Node 10. It likely affects anyone who uses Sentry with Express and caches promises on the request object (or on any descendant of the request object).

The relevant code in Sentry is here: https://github.com/getsentry/sentry-javascript/blob/839326fbb498c669bd8b9c38bd93d39ca15b6266/packages/node/src/handlers.ts#L220-L229

@jasnell
Copy link
Member

jasnell commented Oct 25, 2018

/cc @mcollina @addaleax ...

I'm thinking that there's possibly a leak in the revamped domain internals that's using async_hooks?

@addaleax addaleax added confirmed-bug Issues with confirmed bugs. domain Issues and PRs related to the domain subsystem. memory Issues and PRs related to the memory management or memory footprint. labels Oct 25, 2018
@addaleax
Copy link
Member

Ouch. I think the cause here is a circular-ish reference chain…

  • The IncomingMessage instance (req) has a reference to the cachedPromise
  • The cachedPromise keeps the requestDomain alive through the global pairing map (async ID → domain)
  • The requestDomain has a members list
  • The members list contains IncomingMessage because it was added through .add().

A regular circular reference would be no problem for GC, but the fact that we have this global pairing map makes it impossible for GC to tell that there is a kind of “if X is inaccessible, then Y is going to be purged from the map, becoming inaccessible itself” logic here.

@ChALkeR ChALkeR added v10.x regression Issues related to regressions. labels Oct 25, 2018
@ChALkeR
Copy link
Member

ChALkeR commented Oct 25, 2018

@addaleax
These are all the occurrences of pairing that I was able to locate:

lib/domain.js:51:const pairing = new Map();
lib/domain.js:56:      pairing.set(asyncId, process.domain);
lib/domain.js:61:    const current = pairing.get(asyncId);
lib/domain.js:67:    const current = pairing.get(asyncId);
lib/domain.js:73:    pairing.delete(asyncId); // cleaning up

There seems to be no iteration.
Would just replacing it with WeakMap help, if that's the only place where those refs are retained?

@addaleax
Copy link
Member

@ChALkeR Yes, that’s the map – but it’s… difficult. WeakMap doesn’t work because we use integers as keys – ironically, partly with the intention of avoiding issues with GC…

I don’t have a good idea of how this could be addressed at the moment.

@mcollina
Copy link
Member

This is a fundemental design flaw of async_hooks. We use numbers as indexes (so no weakmaps) on a global map to allocate/track state and we deallocate it only when objects are destroyed. In several cases, this are tied to when the garbage collector actually cleans up the object, making it unpredictable.

In the past, I proposed to introduce a currentAsyncResource() API to be able to store state on the resource itself, removing the need for the global map. I’ll resume that work asap.

@addaleax
Copy link
Member

So… one thing we could do is to turn the values in the map into weak references… that’s icky, but it solves this particular problem.

It would mean that we rely on the resource.domain = process.domain line in the init hook to keep the domain alive for the lifetime of the resource (which seems like a reasonable assumption?), and it would mean going back to relying on an internal utility, unless we want to expose a public API à la https://www.npmjs.com/package/weak or https://www.npmjs.com/package/weak-napi…

Example diff for v10.x in the fold
diff --git a/lib/domain.js b/lib/domain.js
index e8ae3ff10034..b66fdc3f3a09 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -35,6 +35,9 @@ const {
 } = require('internal/errors').codes;
 const { createHook } = require('async_hooks');
 
+const kWeak = Symbol('kWeak');
+const { WeakReference } = internalBinding('util');
+
 // overwrite process.domain with a getter/setter that will allow for more
 // effective optimizations
 var _domain = [null];
@@ -53,7 +56,7 @@ const asyncHook = createHook({
   init(asyncId, type, triggerAsyncId, resource) {
     if (process.domain !== null && process.domain !== undefined) {
       // if this operation is created while in a domain, let's mark it
-      pairing.set(asyncId, process.domain);
+      pairing.set(asyncId, process.domain[kWeak]);
       resource.domain = process.domain;
       if (resource.promise !== undefined &&
           resource.promise instanceof Promise) {
@@ -67,13 +70,13 @@ const asyncHook = createHook({
   before(asyncId) {
     const current = pairing.get(asyncId);
     if (current !== undefined) { // enter domain for this cb
-      current.enter();
+      current.get().enter();
     }
   },
   after(asyncId) {
     const current = pairing.get(asyncId);
     if (current !== undefined) { // exit domain for this cb
-      current.exit();
+      current.get().exit();
     }
   },
   destroy(asyncId) {
@@ -174,6 +177,7 @@ class Domain extends EventEmitter {
     super();
 
     this.members = [];
+    this[kWeak] = new WeakReference(this);
     asyncHook.enable();
 
     this.on('removeListener', updateExceptionCapture);
diff --git a/src/node_util.cc b/src/node_util.cc
index 5adecf4d9753..c41162dc9a81 100644
--- a/src/node_util.cc
+++ b/src/node_util.cc
@@ -1,5 +1,6 @@
 #include "node_internals.h"
 #include "node_watchdog.h"
+#include "base_object-inl.h"
 
 namespace node {
 namespace util {
@@ -8,7 +9,9 @@ using v8::ALL_PROPERTIES;
 using v8::Array;
 using v8::Context;
 using v8::FunctionCallbackInfo;
+using v8::FunctionTemplate;
 using v8::Integer;
+using v8::Isolate;
 using v8::Local;
 using v8::Object;
 using v8::ONLY_CONFIGURABLE;
@@ -172,6 +175,36 @@ void SafeGetenv(const FunctionCallbackInfo<Value>& args) {
             v8::NewStringType::kNormal).ToLocalChecked());
 }
 
+class WeakReference : public BaseObject {
+ public:
+  WeakReference(Environment* env, Local<Object> object, Local<Object> target)
+    : BaseObject(env, object) {
+    MakeWeak();
+    target_.Reset(env->isolate(), target);
+    target_.SetWeak();
+  }
+
+  static void New(const FunctionCallbackInfo<Value>& args) {
+    Environment* env = Environment::GetCurrent(args);
+    CHECK(args.IsConstructCall());
+    CHECK(args[0]->IsObject());
+    new WeakReference(env, args.This(), args[0].As<Object>());
+  }
+
+  static void Get(const FunctionCallbackInfo<Value>& args) {
+    WeakReference* weak_ref = Unwrap<WeakReference>(args.Holder());
+    Isolate* isolate = args.GetIsolate();
+    if (!weak_ref->target_.IsEmpty())
+      args.GetReturnValue().Set(weak_ref->target_.Get(isolate));
+  }
+
+  SET_MEMORY_INFO_NAME(WeakReference)
+  SET_SELF_SIZE(WeakReference)
+  SET_NO_MEMORY_INFO()
+ private:
+  Persistent<Object> target_;
+};
+
 void Initialize(Local<Object> target,
                 Local<Value> unused,
                 Local<Context> context) {
@@ -219,6 +252,16 @@ void Initialize(Local<Object> target,
 
   env->SetMethod(target, "safeGetenv", SafeGetenv);
 
+  Local<String> weak_ref_string =
+      FIXED_ONE_BYTE_STRING(env->isolate(), "WeakReference");
+  Local<FunctionTemplate> weak_ref =
+      env->NewFunctionTemplate(WeakReference::New);
+  weak_ref->InstanceTemplate()->SetInternalFieldCount(1);
+  weak_ref->SetClassName(weak_ref_string);
+  env->SetProtoMethod(weak_ref, "get", WeakReference::Get);
+  target->Set(context, weak_ref_string,
+              weak_ref->GetFunction(context).ToLocalChecked()).FromJust();
+
   Local<Object> constants = Object::New(env->isolate());
   NODE_DEFINE_CONSTANT(constants, ALL_PROPERTIES);
   NODE_DEFINE_CONSTANT(constants, ONLY_WRITABLE);

@addaleax
Copy link
Member

fyi @nodejs/diagnostics

@ChALkeR
Copy link
Member

ChALkeR commented Oct 25, 2018

Deleting from the map on promiseResolve seems to help, at least for this specific testcase.
Do we need to retain it after promiseResolve has been fired once?

I.e.:

index 0caeb624b4..af7db89b34 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -71,6 +71,9 @@ const asyncHook = createHook({
   },
   destroy(asyncId) {
     pairing.delete(asyncId); // cleaning up
+  },
+  promiseResolve(asyncId) {
+    pairing.delete(asyncId); // cleaning up
   }
 });

@addaleax
Copy link
Member

@ChALkeR I don’t think this is specific to promises :/

And no, I don’t think we can do something inside promiseResolve

const async_hooks = require('async_hooks');
async_hooks.createHook({
  init(...args) { process._rawDebug('init', args) },
  before(id) { process._rawDebug('before', id) },
  after(id) { process._rawDebug('after', id) },
  destroy(id) { process._rawDebug('destroy', id) },
  promiseResolve(id) { process._rawDebug('promiseResolve', id) },
}).enable();

Promise.resolve(42).then(() => { process._rawDebug('inside then') });

prints

init [ 5, 'PROMISE', 1, PromiseWrap { isChainedPromise: false } ]
promiseResolve 5
init [ 6, 'PROMISE', 5, PromiseWrap { isChainedPromise: true } ]
before 6
inside then
promiseResolve 6
after 6

@ChALkeR
Copy link
Member

ChALkeR commented Oct 25, 2018

@addaleax Oh, I missed that (re: call order), thanks.
Is after always called no later than in the same tick as promiseResolve? If yes, we could just delay.

diff --git a/lib/domain.js b/lib/domain.js
index 0caeb624b4..8070610235 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -71,6 +71,11 @@ const asyncHook = createHook({
   },
   destroy(asyncId) {
     pairing.delete(asyncId); // cleaning up
+  },
+  promiseResolve(asyncId) {
+    process.nextTick(() => {
+      pairing.delete(asyncId);
+    });
   }
 });

Re: not only to promises: can I have a testcase for that? Something timer-related? I might be missing something.


Upd:

const async_hooks = require('async_hooks');
const end = new Set();
async_hooks.createHook({
  init(...args) { process._rawDebug('init', args.join('|')) },
  before(id) { process._rawDebug('before', id) },
  after(id) {
    process._rawDebug('after', id);
    if (end.has(id)) {
      process._rawDebug('wrong execution order')
      process.exit(1)
    }
  },
  destroy(id) { process._rawDebug('destroy', id) },
  promiseResolve(id) {
    process._rawDebug('promiseResolve', id);
    process.nextTick(() => {
      end.add(id);
      process._rawDebug('promiseResolve (delayed)', id)
    })
  },
}).enable();

Not sure if that's reliable though, need to look inside async_hooks.

@addaleax
Copy link
Member

@ChALkeR I think any GC-based resource would have the same issue – I haven’t tried, but an async_hooks.AsyncResource with default options should do the trick just as well?

@vdeturckheim
Copy link
Member

I am not sure why I missed that issue. Sorry.

@addaleax when you suggest async_hooks.AsyncResource is that the same as @mcollina 's currentAsyncResource()?

@addaleax
Copy link
Member

addaleax commented Nov 6, 2018

@vdeturckheim I’m not sure I understand the question – the two concepts are very, very different; one is a JS class to create custom async resource types, the other is a function that gives you the currently active resource.

@salper
Copy link

salper commented Nov 15, 2018

@rgrove For the record, I've also encountered the same issue this week. Ended up with the same conclusions. To contain the leak, I'm cleaning the active domain (Sentry is the only one using domains in my stack) calling this function at the end of my request handlers:

import domain from 'domain'

export default function domainCleanup() {
  const activeDomain = domain.active
  if (activeDomain) {
    const members = activeDomain.members.slice()
    members.forEach(member => activeDomain.remove(member))
    const eventNames = activeDomain.eventNames()
    eventNames.forEach(eventName => activeDomain.removeAllListeners(eventName))
    activeDomain.__SENTRY__ = null
    delete activeDomain.__SENTRY__
  }
}

@addaleax The thing is that, using the profiler, I can still see the pairing map referencing each empty domain. I thought that by clearing the members, the request would have been garbaged (which is the case), leading the attached promised to be also collected and, ultimately, clearing the map entry. But that is not what I've observed. Didn't look at lib/domain.js though...

@salper
Copy link

salper commented Nov 15, 2018

Ok, so looking at lib/domain.js, it seems that if any asynchronous resources is not garbaged, it keeps a reference to the domain through the pairing global. So I need to find the resource in question. Using an asyncId, is there a way find an asynchronous resource ?

@salper
Copy link

salper commented Nov 16, 2018

Despite what the profiler said, the memory curve stayed flat for 10 hours, so something must collect those domains.

@Trott
Copy link
Member

Trott commented Nov 18, 2018

@addaleax @ChALkeR @mcollina @vdeturckheim Is someone looking into a fix for this? (If not, should we label help wanted? Maybe at a minimum we can get a known_issues test together for this?)

@mcollina
Copy link
Member

@mmarchini is taking up the fix I started around May and trying to taking over the finishing line. This is going to take a while unfortunately. I'm +1 in getting a failing test in known_issues in the meanwhile.

@addaleax
Copy link
Member

@mcollina Are you talking about currentResource()? If so, yes, that is going take time, and I’d go look into my patch from above again?

@mcollina
Copy link
Member

@addaleax +1 on that.

@kamilogorek
Copy link

kamilogorek commented Dec 4, 2018

Based on my tests, you don't even need to attach request object manually to the domain. Just create a promise inside run. Minimal repro case:

const domain = require("domain");
const app = require("express")();

app.use((req, res, next) => {
  let requestDomain = domain.create();
  requestDomain.on("error", next);
  requestDomain.run(next);
});

app.get("/", (req, res) => {
  req.aPromise = new Promise(resolve => resolve("A".repeat(10 * 1024 * 1024)));
  res.send("hello world");
});

app.listen(5000);

Also not sure why it happens, but requestDomain.on("error", next); line is required in order to trigger this leak. Without it, everything is collected just fine.

addaleax added a commit to addaleax/node that referenced this issue Feb 7, 2019
Avoid circular references that the JS engine cannot see through
because it involves an `async id` ⇒ `domain` link.

Using weak references is not a great solution, because it increases
the domain module’s dependency on internals and the added calls into
C++ may affect performance, but it seems like the least bad one.

Fixes: nodejs#23862
@addaleax
Copy link
Member

addaleax commented Feb 7, 2019

I’ve opened a PR with the solution from above in #25993 … still not a great solution, though.

@danbev danbev closed this as completed in 8679932 Feb 12, 2019
danbev pushed a commit that referenced this issue Feb 12, 2019
Avoid circular references that the JS engine cannot see through
because it involves an `async id` ⇒ `domain` link.

Using weak references is not a great solution, because it increases
the domain module’s dependency on internals and the added calls into
C++ may affect performance, but it seems like the least bad one.

PR-URL: #25993
Fixes: #23862
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Refael Ackermann <refack@gmail.com>
addaleax added a commit that referenced this issue Feb 13, 2019
Add a simple `WeakReference` utility that we can use until the
language provides something on its own.

PR-URL: #25993
Fixes: #23862
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Refael Ackermann <refack@gmail.com>
addaleax added a commit that referenced this issue Feb 13, 2019
Avoid circular references that the JS engine cannot see through
because it involves an `async id` ⇒ `domain` link.

Using weak references is not a great solution, because it increases
the domain module’s dependency on internals and the added calls into
C++ may affect performance, but it seems like the least bad one.

PR-URL: #25993
Fixes: #23862
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Refael Ackermann <refack@gmail.com>
MylesBorins pushed a commit to MylesBorins/node that referenced this issue May 17, 2019
Add a simple `WeakReference` utility that we can use until the
language provides something on its own.

PR-URL: nodejs#25993
Fixes: nodejs#23862
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Refael Ackermann <refack@gmail.com>
MylesBorins pushed a commit to MylesBorins/node that referenced this issue May 17, 2019
Avoid circular references that the JS engine cannot see through
because it involves an `async id` ⇒ `domain` link.

Using weak references is not a great solution, because it increases
the domain module’s dependency on internals and the added calls into
C++ may affect performance, but it seems like the least bad one.

PR-URL: nodejs#25993
Fixes: nodejs#23862
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Refael Ackermann <refack@gmail.com>
MylesBorins pushed a commit that referenced this issue May 20, 2019
Add a simple `WeakReference` utility that we can use until the
language provides something on its own.

Backport-PR-URL: #27749
PR-URL: #25993
Fixes: #23862
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Refael Ackermann <refack@gmail.com>
MylesBorins pushed a commit that referenced this issue May 20, 2019
Avoid circular references that the JS engine cannot see through
because it involves an `async id` ⇒ `domain` link.

Using weak references is not a great solution, because it increases
the domain module’s dependency on internals and the added calls into
C++ may affect performance, but it seems like the least bad one.

Backport-PR-URL: #27749
PR-URL: #25993
Fixes: #23862
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Сковорода Никита Андреевич <chalkerx@gmail.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. domain Issues and PRs related to the domain subsystem. memory Issues and PRs related to the memory management or memory footprint. regression Issues related to regressions.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants