
Add noListener option to useDb() to help memory usage in cases where the app is calling useDb() on every request #9961

Closed
bhovhannes opened this issue Feb 22, 2021 · 7 comments · Fixed by #10278
Labels: new feature (This change adds new functionality, like a new method or class)

@bhovhannes

Do you want to request a feature or report a bug?

I want to report a bug.

What is the current behavior?

We are using a database-per-tenant approach, so we call useDb() for each HTTP request. After some time, the Node.js server runs out of memory.

If the current behavior is a bug, please provide the steps to reproduce.

My package.json file:

{
  "name": "mongoose-connection-issues",
  "version": "1.0.0",
  "private": true,
  "main": "server.js",
  "dependencies": {
    "mongoose": "5.11.17"
  }
}

The whole code to run the server (server.js):

const {createServer} = require('http')
const mongoose = require('mongoose')

const mongoUrl = process.env.MONGO_URL || 'mongodb://127.0.0.1/test'

const connectOptions = {
    useNewUrlParser: true,
    useUnifiedTopology: true
}

const useDbOptions = {
    // this causes number of otherDbs to go up, relatedDbs stays the same
    useCache: false
    
    // this causes number of relatedDbs to go up in addition to otherDbs, even worse
    //useCache: true
}


let connectPromise = mongoose.connect(mongoUrl, connectOptions)

async function database(tenantId) {
    await connectPromise;
    
    // the following line leaks memory
    const tenantConnection = mongoose.connection.useDb(`db_${tenantId}`, useDbOptions);
    
    console.log(
        `tenantId:${tenantId}`, 
        `otherDbs:${tenantConnection.otherDbs.length}`, 
        `connection.otherDbs:${mongoose.connection.otherDbs.length}`, 
        `relatedDbs:${Object.keys(tenantConnection.relatedDbs).length}`
    )

    return tenantConnection
}


async function handleRequest(id) {
    const conn = await database(id)
    const res = {
        id,
        otherDbsLength: conn.otherDbs.length,
        relatedDbsLength: Object.keys(conn.relatedDbs).length
    }
    await conn.close()
    return res
}


const server = createServer( async function(request, response) {
    const url = new URL(request.url, 'https://127.0.0.1/');
    if (url.pathname.startsWith('/ping')) {
        const result = await handleRequest(url.searchParams.get('id'))
        response.setHeader('content-type', 'application/json')
        response.write(JSON.stringify(result))
        response.end()
    }
})
server.listen(process.env.PORT || 3000);

The code to generate load (generate-load.sh):

for ((i = $1; i <= $2; i++)); do
    curl "http://localhost:3000/ping?id=${i}"
done

A docker-compose.yml to start mongo:

version: "3.6"
services:
  mongo:
    image: mongo:bionic
    container_name: test-mongo
    ports:
      - "27017:27017"

Steps to see the issue:

  1. Start mongo server via docker-compose up
  2. Start Nodejs server via npm start
  3. Generate some load via ./generate-load.sh 1 30

If you look at stdout output of server you'll see:

tenantId:1 otherDbs:1 connection.otherDbs:1 relatedDbs:0
tenantId:2 otherDbs:1 connection.otherDbs:2 relatedDbs:0
tenantId:3 otherDbs:1 connection.otherDbs:3 relatedDbs:0
tenantId:4 otherDbs:1 connection.otherDbs:4 relatedDbs:0
tenantId:5 otherDbs:1 connection.otherDbs:5 relatedDbs:0
tenantId:6 otherDbs:1 connection.otherDbs:6 relatedDbs:0
tenantId:7 otherDbs:1 connection.otherDbs:7 relatedDbs:0
tenantId:8 otherDbs:1 connection.otherDbs:8 relatedDbs:0
tenantId:9 otherDbs:1 connection.otherDbs:9 relatedDbs:0
tenantId:10 otherDbs:1 connection.otherDbs:10 relatedDbs:0
tenantId:11 otherDbs:1 connection.otherDbs:11 relatedDbs:0
tenantId:12 otherDbs:1 connection.otherDbs:12 relatedDbs:0
(node:21422) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 connected listeners added to [NativeConnection]. Use emitter.setMaxListeners() to increase limit
tenantId:13 otherDbs:1 connection.otherDbs:13 relatedDbs:0
tenantId:14 otherDbs:1 connection.otherDbs:14 relatedDbs:0
tenantId:15 otherDbs:1 connection.otherDbs:15 relatedDbs:0
tenantId:16 otherDbs:1 connection.otherDbs:16 relatedDbs:0
tenantId:17 otherDbs:1 connection.otherDbs:17 relatedDbs:0
tenantId:18 otherDbs:1 connection.otherDbs:18 relatedDbs:0
tenantId:19 otherDbs:1 connection.otherDbs:19 relatedDbs:0
tenantId:20 otherDbs:1 connection.otherDbs:20 relatedDbs:0
tenantId:21 otherDbs:1 connection.otherDbs:21 relatedDbs:0
tenantId:22 otherDbs:1 connection.otherDbs:22 relatedDbs:0
tenantId:23 otherDbs:1 connection.otherDbs:23 relatedDbs:0
tenantId:24 otherDbs:1 connection.otherDbs:24 relatedDbs:0
tenantId:25 otherDbs:1 connection.otherDbs:25 relatedDbs:0
tenantId:26 otherDbs:1 connection.otherDbs:26 relatedDbs:0
tenantId:27 otherDbs:1 connection.otherDbs:27 relatedDbs:0
tenantId:28 otherDbs:1 connection.otherDbs:28 relatedDbs:0
tenantId:29 otherDbs:1 connection.otherDbs:29 relatedDbs:0
tenantId:30 otherDbs:1 connection.otherDbs:30 relatedDbs:0

The otherDbs collection grows over time. Looking through the Mongoose code, I see that useDb pushes onto that collection, but nothing ever removes entries from it.
The MaxListenersExceededWarning also signals a memory leak: after each useDb call, Mongoose sets up listeners on the created connection, and since the connection is never removed, those event listeners stick around.

Passing the useCache: true option to useDb makes things even worse, as the relatedDbs collection also starts to grow.

What is the expected behavior?

I'd expect that with useCache: false, connection objects created by useDb() are garbage collected properly.

What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.

Node.js: 14.15.1
Mongoose: 5.11.17
MongoDB driver: 3.6.4 (comes as a dependency of Mongoose)

@IslandRhythms IslandRhythms added the has repro script There is a repro script, the Mongoose devs need to confirm that it reproduces the issue label Feb 22, 2021
@vkarpov15 vkarpov15 added this to the 5.11.19 milestone Feb 23, 2021
@bhovhannes (Author)

Yesterday I ran ./generate-load.sh 1 100000, attempting to find the point at which Node.js crashes because of the memory leak.
It turned out that a Mongoose connection by itself is fairly cheap in terms of memory consumption, and, to be honest, I was not able to crash the server.
However, as soon as I added the line tenantConnection.model('foo', MySchema), the memory growth rate increased significantly, and the server crashed at around 16,000 connections, during its first GC.

The modified code (see the line with the unusedModelToIncreaseMemoryUsage variable):

const {createServer} = require('http')
const mongoose = require('mongoose')

const mongoUrl = process.env.MONGO_URL || 'mongodb://127.0.0.1/test'

const connectOptions = {
    useNewUrlParser: true,
    useUnifiedTopology: true
}

const useDbOptions = {
    // this causes number of otherDbs to go up, relatedDbs stays the same
    useCache: false
    
    // this causes number of relatedDbs to go up in addition to otherDbs, even worse
    //useCache: true
}


const Property = new mongoose.Schema({
    key: { type: String, required: true },
    type: { type: String, required: true },
});
const Unlocked = new mongoose.Schema({
    active: { type: Boolean },
    unlockedAt: { type: Date },
});
const Field = new mongoose.Schema({
    name: String,
    active: { type: Boolean, required: true },
    subject: String,
    properties: [Property],
    unlocked: Unlocked,
    updatedAt: { type: Date, required: true },
});
const MySchema = new mongoose.Schema({
    groupId: { type: String, unique: true },
    fieldMap: { type: Map, of: Field },
});


let connectPromise = mongoose.connect(mongoUrl, connectOptions)

async function database(tenantId) {
    await connectPromise;
    
    // the following line leaks memory
    const tenantConnection = mongoose.connection.useDb(`db_${tenantId}`, useDbOptions)

    // this significantly increases memory consumed by tenantConnection,
    // because internally the connection keeps a collection of all its models.
    const unusedModelToIncreaseMemoryUsage = tenantConnection.model('foo', MySchema)
    
    console.log(
        `tenantId:${tenantId}`, 
        `otherDbs:${tenantConnection.otherDbs.length}`, 
        `connection.otherDbs:${mongoose.connection.otherDbs.length}`, 
        `relatedDbs:${Object.keys(tenantConnection.relatedDbs).length}`
    )

    return tenantConnection
}


async function handleRequest(id) {
    const conn = await database(id)
    const res = {
        id,
        otherDbsLength: conn.otherDbs.length,
        relatedDbsLength: Object.keys(conn.relatedDbs).length
    }
    await conn.close()
    return res
}


const server = createServer( async function(request, response) {
    const url = new URL(request.url, 'https://127.0.0.1/');
    if (url.pathname.startsWith('/ping')) {
        const result = await handleRequest(url.searchParams.get('id'))
        response.setHeader('content-type', 'application/json')
        response.write(JSON.stringify(result))
        response.end()
    }
})
server.listen(process.env.PORT || 3000);

@vkarpov15 (Collaborator)

First of all, if you're using useDb() for multi-tenancy, we strongly recommend using useCache: true. With useCache: true, Mongoose can reuse the same connection when there's a request for the same tenantId, rather than creating a new one every time.

Without useCache, you'll eventually run out of memory because the number of connections will grow with the number of requests. With useCache, the number of connections will be limited by the number of tenants.

Regarding models adding to memory usage, you can use the deleteModel() function to clean up models when you're done with your connection, which allows models to be GC-ed. On my machine, using 32MB max memory, I can only get to about 1000 connections without cleaning up models, but I can get up to about 12000 connections if I clean up models:

async function handleRequest(id) {
    const conn = await database(id)
    const res = {
        id,
        otherDbsLength: conn.otherDbs.length,
        relatedDbsLength: Object.keys(conn.relatedDbs).length
    }
    conn.deleteModel(/.*/) // Delete all models on `conn`
    await conn.close()
    return res
}

We'll add a new method to free up a connection for GC, because that's tricky to do right now due to event emitters. For now, I'd recommend managing connections manually if you expect to have an unlimited number of tenants, or use useCache if you have < 10k tenants or so.
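Managing connections manually, as suggested, could look roughly like the sketch below. `openTenantConnection` is a hypothetical factory (in a real app it would wrap `mongoose.connection.useDb(...)`); the point is that each tenant's connection is opened once and then reused across requests:

```javascript
// Keep one connection per tenant for the lifetime of the process,
// assuming a bounded number of tenants.
const tenantConnections = new Map()

function getTenantConnection(tenantId, openTenantConnection) {
    if (!tenantConnections.has(tenantId)) {
        tenantConnections.set(tenantId, openTenantConnection(tenantId))
    }
    return tenantConnections.get(tenantId)
}

// Usage with a stub factory: repeat requests get the same object back,
// so no new connections (or listeners) pile up per request.
const open = id => ({ db: `db_${id}` })
const a = getTenantConnection('42', open)
const b = getTenantConnection('42', open)
console.log(a === b) // true
```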

@vkarpov15 vkarpov15 changed the title Memory leak because of connections created by useDb Allow garbage collection to clean up useDb() connections that are closed with close({ permanent: true }) Mar 2, 2021
@vkarpov15 vkarpov15 added new feature This change adds new functionality, like a new method or class and removed has repro script There is a repro script, the Mongoose devs need to confirm that it reproduces the issue labels Mar 2, 2021
@vkarpov15 vkarpov15 modified the milestones: 5.11.19, 5.12 Mar 2, 2021
@bhovhannes (Author)

@vkarpov15 since we are speaking about a new API here, what do you think of adding a maxCacheSize option to the options argument passed to useDb (or createConnection), for limiting the size of the internal cache collections (otherDbs and relatedDbs)?

Something like this:

const tenantConnection = mongoose.connection.useDb(`db_${tenantId}`, {
  useCache: true,
  maxCacheSize: 999
})

where the cache is internally an LRU cache that, once the limit is reached, evicts old connections (releasing them for GC) as new connections come in.
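A minimal sketch of that idea, using a Map's insertion order as the LRU bookkeeping (`ConnectionLRU` and `closeFn` are hypothetical names for illustration, not a proposed Mongoose API):

```javascript
// LRU cache of connections: once maxCacheSize is exceeded, the least
// recently used connection is closed and released for GC.
class ConnectionLRU {
    constructor(maxCacheSize, closeFn) {
        this.max = maxCacheSize
        this.closeFn = closeFn
        this.cache = new Map()
    }
    get(name, create) {
        if (this.cache.has(name)) {
            // Re-insert to mark this entry as most recently used.
            const conn = this.cache.get(name)
            this.cache.delete(name)
            this.cache.set(name, conn)
            return conn
        }
        const conn = create(name)
        this.cache.set(name, conn)
        if (this.cache.size > this.max) {
            // Evict the least recently used entry (first in insertion order).
            const oldestName = this.cache.keys().next().value
            this.closeFn(this.cache.get(oldestName))
            this.cache.delete(oldestName)
        }
        return conn
    }
}

const closed = []
const lru = new ConnectionLRU(2, conn => closed.push(conn.name))
lru.get('db_1', name => ({ name }))
lru.get('db_2', name => ({ name }))
lru.get('db_3', name => ({ name })) // evicts and closes db_1
console.log(closed) // [ 'db_1' ]
```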

I think it will be easier for consumers to specify this option than to think about dropping models and connections manually on each request.
Dropping models and connections manually can still be useful for script-based scenarios, so this is probably an addition to the API you propose.

What do you think?

I will be happy to send a PR implementing both additions; I just want to make sure I understand your vision of the API correctly.

@vkarpov15 (Collaborator)

The cache approach is possible, but I don't think it would work well because we would have to close connections that are evicted from the cache. The worst case scenario is that Mongoose has to evict and close a connection that is actively being used, which leads to a bad user experience.

In general, I'd also say that opening and closing connections on every request is bad practice. That said, if you have 100k+ tenants, then I think you do have to open/close connections on every request. In most multi-tenant environments I've seen, you keep connections open on a per-tenant basis, because the number of tenants is relatively small and the cost of opening a new connection is high.

@vkarpov15 (Collaborator)

I took a closer look and there are some issues with the example script. First, do not call await conn.close() on every request: that closes every single connection, which is what causes the MaxListenersExceededWarning you mentioned.

In addition to that, we've added a noListener option, analogous to the MongoDB driver's noListener option, that significantly reduces memory overhead when you're calling useDb() on every request. So in v5.12.0, you'll be able to do:

const useDbOptions = {
    // this causes number of otherDbs to go up, relatedDbs stays the same
    useCache: false,
    noListener: true
    
    // this causes number of relatedDbs to go up in addition to otherDbs, even worse
    //useCache: true
}

And:

async function handleRequest(id) {
    const conn = await database(id)
    const res = {
        id,
        otherDbsLength: conn.otherDbs.length,
        relatedDbsLength: Object.keys(conn.relatedDbs).length
    }
    conn.deleteModel(/.*/) // Delete all models on `conn`
    // Note that there's no `await conn.close()` here, that's intentional!
    return res
}

And that should drastically reduce memory usage and significantly increase the amount of time your server can go without running out of memory 👍

vkarpov15 added a commit that referenced this issue Mar 10, 2021
…e you're using `useDb()` on every request

Fix #9961
@vkarpov15 vkarpov15 changed the title Allow garbage collection to clean up useDb() connections that are closed with close({ permanent: true }) Add noListener option to useDb() to help memory usage in cases where the app is calling useDb() on every request Mar 10, 2021
@bhovhannes (Author)

bhovhannes commented Mar 10, 2021

... do not call await conn.close() on every request

We don't; I added that line only to illustrate that even calling conn.close() does not release the connection for GC.

... we've added a noListener option that's analogous ...

I am trying to understand in which cases adding noListener will change application behavior. I went through the commit, and have roughly the same general question: why does Mongoose have its own mechanism for updating connection states? Isn't that handled by the MongoDB driver itself?

Thank you very much for all the good advice and for the noListener option. Cleaning up models and specifying noListener will certainly make Mongoose use less memory in the long run.

@vkarpov15 (Collaborator)

The MongoDB driver handles updating connection states under the hood, but Mongoose connections still need to be aware of them. And in the case where you're creating new connections with useDb(), every connection you create needs to be aware of state changes. We're considering replacing this event-emitter-based approach to tracking state, now that we're planning on removing Mongoose buffering in 6.0 with #8702
