Massive memory usage when creating a document with approximately 30k+ subdocuments #11541
Can you please provide an example of what your document looks like?
@vkarpov15 Here is a slightly simplified version of the schema (a reduced number of properties; the structure is the same), all in one place as an example. Thanks!

Is there any more information I can give to help with this? I understand that, as I have presented it, you haven't got much to go on, but I am not sure what sort of information would be helpful. Would the output of memory profiling, for example, be useful? Thanks, Robert
We're working our way through this. We confirmed that the script below takes about 10x the memory of just using a POJO:
'use strict';
const mongoose = require('mongoose');
const { Schema } = mongoose;
const geoPointSchema = new Schema({
type: {
type: String,
enum: ["Point"],
required: true,
},
coordinates: {
type: [Number],
required: true,
},
});
const journeySchema = new Schema({
status: {
type: String,
enum: ["available", "completed", "cancelled"],
},
start_point_text: String,
start_point_coordinates: {
type: geoPointSchema,
index: "2dsphere",
},
end_point_text: String,
end_point_coordinates: {
type: geoPointSchema,
index: "2dsphere",
},
start_time: Date,
end_time: Date,
});
const journeySummarySchema = new Schema({
id: Schema.Types.ObjectId,
private: Boolean,
groups: [{ id: Schema.Types.ObjectId }],
created: Date,
start_time: Date,
end_time: Date,
start_point_coordinates: geoPointSchema,
start_point_text: String,
end_point_coordinates: geoPointSchema,
end_point_text: String
});
const userSummarySchema = new Schema({
role: { type: String, enum: ["driver", "requested", "passenger", "not-part-of-journey"] },
});
const journeyEventSchema = new Schema({
time: Date,
journey: journeySummarySchema,
user: userSummarySchema,
action: String, /*[
"create",
"cancel",
"join"
],*/
reconstructed: { type: Boolean, default: false },
});
const journeyStateUserSummarySchema = new Schema({
...userSummarySchema.obj,
status: String //"confirmed" | "refunded",
});
const journeyStateJourneySummarySchema = new Schema(
{
...journeySummarySchema.obj,
request_status: {
type: String,
enum: ["available", "accepted"],
},
status: {
type: String,
enum: ["available", "completed", "cancelled"],
},
}
);
const journeyStateSchema = new Schema({
_id: Schema.Types.ObjectId,
journey: journeyStateJourneySummarySchema,
complete_journey: journeySchema,
user: journeyStateUserSummarySchema,
});
const searchEventSchema = new Schema({
time: Date,
start_point_coordinates: geoPointSchema,
end_point_coordinates: geoPointSchema,
});
const userEventLogSchema = new Schema({
schema_version: Number,
user: {
id: Schema.Types.ObjectId,
},
events: {
journeys: [journeyEventSchema],
searches: [searchEventSchema],
},
states: {
journeys: [journeyStateSchema],
},
});
const UserEventLog = mongoose.model('UserEventLog', userEventLogSchema);
run().catch(err => console.log(err));
async function run() {
const doc = new UserEventLog({});
//const doc = { events: { journeys: [], searches: [] }, states: { journeys: [] } };
setInterval(() => {
console.log('[Timer] Memory usage:', process.memoryUsage().heapUsed / (1024 ** 2));
}, 2_000);
const start = Date.now();
for (let i = 0; i < 10000; ++i) {
doc.events.journeys.push({
journey: {
created: new Date(),
start_point_coordinates: { type: 'Point', coordinates: [0, 0] }
},
user: {
role: 'driver'
}
});
doc.events.searches.push({
start_point_coordinates: { type: 'Point', coordinates: [0, 0] }
});
doc.states.journeys.push({
journey: {
created: new Date(),
start_point_coordinates: { type: 'Point', coordinates: [0, 0] },
request_status: 'available'
},
complete_journey: {
status: 'available',
start_point_coordinates: { type: 'Point', coordinates: [0, 0] }
},
user: {
role: 'driver',
status: 'confirmed'
}
});
}
console.log('Done', Date.now() - start);
}
We don't know exactly why yet; we managed to reduce some memory usage by removing calls to
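The POJO baseline in the script above (the commented-out `const doc = { ... }` line) can be measured the same way. Here is a minimal sketch of that measurement technique in plain Node, with no mongoose dependency; `heapUsedMB` is a helper name introduced here for illustration:

```javascript
'use strict';

// Report current heap usage in megabytes (same metric the script above logs).
function heapUsedMB() {
  return process.memoryUsage().heapUsed / (1024 ** 2);
}

const baseline = heapUsedMB();

// Plain-object equivalent of the mongoose document in the script above.
const doc = { events: { journeys: [], searches: [] }, states: { journeys: [] } };
for (let i = 0; i < 10000; ++i) {
  doc.events.journeys.push({
    journey: { created: new Date(), start_point_coordinates: { type: 'Point', coordinates: [0, 0] } },
    user: { role: 'driver' }
  });
}

console.log('POJO heap growth (MB):', (heapUsedMB() - baseline).toFixed(1));
```

Comparing this number against the mongoose version's output gives a rough sense of the per-subdocument overhead, keeping in mind that `heapUsed` includes garbage not yet collected.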
…le of document properties Re: #11541
63af194 makes some improvements. Before:
After:
Slightly better. Will keep working on a few other ideas we have to trim this down some more.
Another way to trim down the overhead is to get rid of defaults on your schemas. The default for document arrays can be disabled globally:
mongoose.Schema.Types.DocumentArray.set('default', undefined);
const { Schema } = mongoose;
const geoPointSchema = new Schema({
type: {
type: String,
enum: ["Point"],
required: true,
},
coordinates: {
type: [Number],
required: true,
default: undefined
},
}, { _id: false });
const journeySchema = new Schema({
status: {
type: String,
enum: ["available", "completed", "cancelled"],
},
start_point_text: String,
start_point_coordinates: {
type: geoPointSchema,
index: "2dsphere",
},
end_point_text: String,
end_point_coordinates: {
type: geoPointSchema,
index: "2dsphere",
},
start_time: Date,
end_time: Date,
}, { _id: false });
const journeySummarySchema = new Schema({
id: Schema.Types.ObjectId,
private: Boolean,
groups: [{ id: Schema.Types.ObjectId }],
created: Date,
start_time: Date,
end_time: Date,
start_point_coordinates: geoPointSchema,
start_point_text: String,
end_point_coordinates: geoPointSchema,
end_point_text: String
}, { _id: false });
const userSummarySchema = new Schema({
role: { type: String, enum: ["driver", "requested", "passenger", "not-part-of-journey"] },
}, { _id: false });
const journeyEventSchema = new Schema({
time: Date,
journey: journeySummarySchema,
user: userSummarySchema,
action: String, /*[
"create",
"cancel",
"join"
],*/
reconstructed: { type: Boolean },
}, { _id: false });
const journeyStateUserSummarySchema = new Schema({
...userSummarySchema.obj,
status: String //"confirmed" | "refunded",
}, { _id: false });
const journeyStateJourneySummarySchema = new Schema(
{
...journeySummarySchema.obj,
request_status: {
type: String,
enum: ["available", "accepted"],
},
status: {
type: String,
enum: ["available", "completed", "cancelled"],
},
},
{ _id: false }
);
const journeyStateSchema = new Schema({
_id: Schema.Types.ObjectId,
journey: journeyStateJourneySummarySchema,
complete_journey: journeySchema,
user: journeyStateUserSummarySchema,
}, { _id: false });
const searchEventSchema = new Schema({
time: Date,
start_point_coordinates: geoPointSchema,
end_point_coordinates: geoPointSchema,
}, { _id: false });
const userEventLogSchema = new Schema({
schema_version: Number,
user: {
id: Schema.Types.ObjectId,
},
events: {
journeys: [journeyEventSchema],
searches: [searchEventSchema],
},
states: {
journeys: [journeyStateSchema],
},
});
const UserEventLog = mongoose.model('UserEventLog', userEventLogSchema);
We get to:
We'll look to see if we can support making the
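The reasoning behind removing defaults can be illustrated outside mongoose: every subdocument that eagerly allocates default empty arrays or objects costs real heap, which adds up across tens of thousands of subdocuments. A rough sketch of that effect in plain Node (the factory functions are hypothetical stand-ins, not mongoose internals):

```javascript
'use strict';

// Stand-in for a subdocument whose schema assigns default empty containers.
function makeWithDefaults() {
  return { coordinates: [], groups: [], meta: {} };
}

// Stand-in for the same subdocument with defaults disabled.
function makeWithoutDefaults() {
  return { coordinates: undefined, groups: undefined, meta: undefined };
}

// Allocate n objects via factory and report approximate heap growth in bytes.
function measure(factory, n) {
  if (global.gc) global.gc(); // more stable numbers with `node --expose-gc`
  const before = process.memoryUsage().heapUsed;
  const items = new Array(n);
  for (let i = 0; i < n; ++i) items[i] = factory();
  const after = process.memoryUsage().heapUsed;
  return { items, bytes: after - before };
}

const withDefaults = measure(makeWithDefaults, 100000);
const without = measure(makeWithoutDefaults, 100000);
console.log('with defaults (MB):', (withDefaults.bytes / 1024 ** 2).toFixed(1));
console.log('without defaults (MB):', (without.bytes / 1024 ** 2).toFixed(1));
```

The exact numbers depend on the V8 version and GC timing, but the version with default containers consistently allocates more.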
Thank you for your hard work on this. I have actually moved away from this method of solving the problem and ended up mirroring my data in BigQuery, but hopefully this will be helpful for others in the future.
In f1c5412 we allowed disabling
With that line, we get:
Without that line:
The tradeoff is no
…de` on prototype to avoid unnecessary memory usage Re: #11541
…en creating document array with schema re: #11541
perf(document): avoid creating unnecessary empty objects when creating a state machine
Do you want to request a feature or report a bug? Bug
What is the current behavior?
When saving a large document (9.66MB) I am getting what looks to be some sort of memory leak. I initially thought this might just be a result of my document being too big, but as far as I can tell the behaviour I am seeing does constitute a bug. Perhaps it would be a good idea not to have such a large document, but I feel that what I am trying to do should work, at least at this scale.
The document contains a large array of subdocuments (low hundreds). Updating the document takes a long time (about 50 seconds for the 9.66MB example). During this time, memory usage as observed in the Chrome profiler jumps from around 65MB to around 500MB. The document does get saved successfully; it just takes a long time and uses a lot of memory.
I am finding this a little difficult to debug, so do let me know what information I can give to be more helpful.
I have watched the memory change over time and traced the flow through my function, and I am fairly sure that it is specifically the .save() call on the document instance that takes ~50 seconds, and that the memory increase happens after the call to save. I do not have any hooks defined on the model.
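The observation above (timing a single call and watching heap growth around it) can be packaged into a small helper. This is a generic sketch in plain Node, not a mongoose API; `profileCall` is a name introduced here:

```javascript
'use strict';

// Hypothetical helper: time an async call and report heap growth around it.
async function profileCall(label, fn) {
  const startHeap = process.memoryUsage().heapUsed;
  const startTime = Date.now();
  const result = await fn();
  const elapsedMs = Date.now() - startTime;
  const heapDeltaMB = (process.memoryUsage().heapUsed - startHeap) / (1024 ** 2);
  console.log(`${label}: ${elapsedMs}ms, heap delta ${heapDeltaMB.toFixed(1)}MB`);
  return result;
}

// Against a mongoose document, usage would look like:
//   await profileCall('doc.save', () => doc.save());
profileCall('noop', async () => 'done').then(v => console.log('result:', v));
```

Wrapping only the suspected call this way helps confirm whether `.save()` itself, rather than surrounding code, accounts for the elapsed time and heap growth.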
If the current behavior is a bug, please provide the steps to reproduce.
What is the expected behavior?
Much less memory usage and quicker save.
What are the versions of Node.js, Mongoose and MongoDB you are using? Note that "latest" is not a version.
Mongoose: 6.2.7, Node v14.17.5, MongoDB: 5.0.6