[wip] Populate.next roadmap - feedback welcome #1292
Fixed a typo in the migration docu
I do not agree with point 4. I think it works properly as it is now. Why should I have ids whose documents don't exist in the database? |
@vovan22 users are allowed to pass query conditions to filter out docs as well. query.populate(path, fields, match, options) |
@vovan22, I would agree in the sense that the model layer should never make changes to the underlying data without your knowledge. If it drops ids from an array of references without your knowledge then you have potential for even more corruption in your data set.
As for #4 in general, I think consistency of data structures is critical here. The array either needs to be an array of model objects or an array of ids. There should never be a situation where the structures change depending on whether someone has previously populated a reference or not, i.e. when trying to get the id of a referenced document in an array I should never have to use instanceof operators.
I think population in general needs to allow for access to the direct field with the option of retrieving the referenced document. If a field in mongo is an array of object ids then I should be able to access that field as an array of object ids or optionally retrieve the referenced documents via a populate call, but the field itself should remain an array of object ids.
For example, if I have a model called Post which references an array of Subscribers as subscribers, then post.subscribers should return an array of object ids. If populate is smart enough, it will cache the response so multiple calls to populate would return the same list. If modifications to the subscribers array are made, the cache can be dropped or altered. I know keeping these in sync would be difficult.
If I am confused about any of this, please help clarify. |
@braunsquared I agree that we need to expose both the original _ids as well as populated docs separately somehow. the problem i see with caching within |
@aheckmann I agree with what you are saying and understand the complexity of adding a cache management system which can add lots of bloat as well. My main concern is keeping it consistent and preventing the need to pass references around to populated fields outside of the primary object reference. What about the possibility of wrapping the populate functionality in a DocumentArray class which understands how to populate the array of references, the query that was used etc etc. We could then manually reload if need be rather than bombard the db with unnecessary duplicate queries? As for non-array fields. IMO the field should always return an object reference regardless of whether it is populated or not. If it hasn't been populated, the document will only have an _id field and we can call populate/load on the object directly. Also, please use the $in call very carefully. From our experience it is faster to perform multiple queries than use the $in operator even when proper indexes are in use. |
@braunsquared I've not observed any performance impacts when using non-array fields: "fake" docs don't seem to help this scenario. if anything it makes it less clear that a document was not found. i don't want to think any more than i have to. wrapping in a document-array: i don't see any benefit to this. Another option based on your earlier suggestion:
Model.findById(id).populate('friends').exec(function (err, doc) {
  var originalIds = doc.friends;
  var populatedDocs = doc.populated('friends');
})
This means original paths are never overwritten. No "syncing" between original path and populated data would occur when manipulating the original array or populated array, this exercise is left to the user. These would be strictly separate views. |
Going with |
@aheckmann Sounds great. I'm just thinking out loud. As for $in, it depends on the situation. When you throw in $in with sort options mongo doesn't leverage indexes as well as it should and it's very easy to fall out of the index world into the scan complete collection world. On small collections this generally doesn't cause an issue but when you are referring to millions of records of decently sized docs, the performance hit can be detrimental. This was last tested by us with v1.8 of mongod so it may have improved since then. |
@braunsquared thanks for thinking out loud, it's very helpful. re: |
Hi, I've used the population feature in my project and had to rewrite it for personal use. Or like that: this will load all persons that have an address with city New York. The engine can work with relation cardinality 1:1 or 1:* and even *:*. The engine just traverses the query tree and builds params for the populate relation, and on findExec it loads all the items within the document. Children can also have conditions: onlyIds, exists and notExists attributes that shape the result tree to fit specific conditions. |
fixed "psuedo" typo in Readme
Fixed incorrect annotation in SchemaNumber#min
fixes - cpu spikes - reconnection issues with primaries
https://github.com/promises-aplus/promises-spec adds promise#then, promise#reject
useful when cleaning up old properties that are no longer in the schema. http://bytes.goodeggs.com/post/36553128854/how-to-remove-a-property-from-a-mongoosejs-schema
added doc#populate
added Model#populate
standardize query, document, Model populate arguments
Model
  .find()
  .populate({
    path: "_creator",
    select: "name age",
    match: { age: { $gte: 21 } },
    options: { sort: { age: -1 } }
  })
  .exec()
TODO needs to support multi docs + non Mongoose Documents & friends
code is in non-working state
currently retaining _ids for documents not found in populate queries
closing for now. |
merged into 3.6x |
Was there any decision on the solution to point 3, AKA #601? |
Core population functionality is now exposed for users to do this manually. |
Hey Aaron - thanks, very helpful. What about point 1?
CompanyMembership.statics.getCompaniesForUser = function (userId, callback) {
  this.find({ user: userId })
    .exec(function (err, memberships) {
      var opts = [
        { path: 'company' },
        { path: 'company.createdBy' }
      ];
      this.populate(memberships, opts, function (err, memberships) {
        // memberships[0].company.createdBy is still just an ObjectId
      });
    }.bind(this));
};
This isn't working quite how I expect - maybe I've got the wrong syntax? I'm expecting |
a membership looks like: in that case, populating |
I know I'm coming in late to this discussion, but I'm wondering if any care has been taken for this comment made by @braunsquared. I'm running into the situation where I have to check to see if the field is populated:
This is really hacky and unintuitive. I feel uncomfortable with referencing I found this to be the most relevant place for this comment. If there's somewhere else you'd like me to discuss this issue, I'd be happy to move this comment there. |
@YourDeveloperFriend I agree, I've been working on a new project and already it's awkward explaining to new devs that "sometimes the 'person' field is an object, but in the database it's an ObjectId". however sometimes I do actually want the person field to be a personId, so in the code it makes it confusing. One suggestion is to allow an additional option: a fieldname to populate into. That way it's backwards compatible, if you don't set a special fieldName. It would be more like a virtual field, and we could always expect the initial reference field to exist:
|
@mattcasey I totally agree. That was the solution I was envisioning. |
DO NOT MERGE
For release 3.6 our primary objective is a rewrite of the populate functionality. Currently it suffers from several limitations:
1) Poor support for nested path population
Given a schema like the following:
An attempt to populate the following path will not work as expected:
2) Cannot further populate a document after population
3) Recursive sub-document population not supported
References more than one level deep are not resolved.
#601
4) Original _ids not found in populate query do not exist in returned document
Given the following document:
and we query/populate the document's friends
All friends ids that are not found during population are dropped on the floor. For example, if friends 7 and 34 are found, 8 is dropped from the returned document:
5) Inefficient queries
We currently execute a separate query per document per path.
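One way to avoid a query per document per path is to collect every referenced _id across the whole batch and resolve them with a single lookup. The following is only a minimal sketch under stated assumptions, not the code in the populate-wip branch: `findByIds` is a hypothetical stand-in for one `$in`-style query, `populateBatch` is a hypothetical name, and the documents are plain objects in memory.

```javascript
// Hypothetical in-memory "collection"; stands in for a MongoDB collection.
const users = [
  { _id: 7, name: 'a' },
  { _id: 8, name: 'b' },
  { _id: 34, name: 'c' }
];

// Counts how many lookups were issued, to make the batching visible.
let queryCount = 0;

// Hypothetical stand-in for one query such as find({ _id: { $in: ids } }).
function findByIds(collection, ids) {
  queryCount += 1;
  const wanted = new Set(ids);
  return collection.filter((doc) => wanted.has(doc._id));
}

// Populate `path` on every doc using a single lookup for the whole batch.
function populateBatch(docs, path, collection) {
  // Collect the distinct ids referenced at `path` by any doc.
  const ids = [...new Set(docs.flatMap((doc) => [].concat(doc[path])))];
  const byId = new Map(findByIds(collection, ids).map((d) => [d._id, d]));
  return docs.map((doc) => {
    const value = doc[path];
    // Ids with no match are dropped (arrays) or become null (single
    // fields), mirroring the behavior described in item 4.
    const populated = Array.isArray(value)
      ? value.filter((id) => byId.has(id)).map((id) => byId.get(id))
      : (byId.has(value) ? byId.get(value) : null);
    return { ...doc, [path]: populated };
  });
}

const posts = [
  { _id: 1, author: 7, fans: [8, 34] },
  { _id: 2, author: 34, fans: [7] }
];
const result = populateBatch(posts, 'author', users);
```

Here two posts are populated with one lookup instead of two. Whether a single `$in` query is actually faster than several small queries depends on the workload, as the comments in this thread point out.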
6) Clumsy population syntax when specifying multiple params
http://mongoosejs.com/docs/api.html#query_Query-populate
7) Assigning values to populated paths casts the value back to an _id
#570
8) Lean queries cannot be populated
#1260
NOTES
So far, items #1, #2, #5, #6, #8 have been fixed in the populate-wip branch.
#1, #2, #3, #6, #8 solutions
The core populate functionality is being rewritten and exposed through Model.populate(docs, paths, cb). This core functionality will now also be made accessible through document#populate(path, options, cb). Query#populate will still exist as in previous versions.
This refactor and exposure opens the door to populating plain js objects as well as mongoose documents, meaning population of documents returned from lean queries will now be supported.
The syntax of Query#populate() is being cleaned up (yet backward compatible). Its interface will support passing an options object containing all knobs instead of having to pass nulls as placeholders:
This syntax also supports populating multiple paths with the same options:
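As a rough illustration of the two call styles, the object form below follows the `{ path, select, match, options }` shape shown in the commit message earlier in this thread. `normalizePopulateArgs` is a hypothetical helper written for this sketch, not part of mongoose, and treating a space-delimited `path` string as several paths is an assumption.

```javascript
// Hypothetical helper: accepts either the positional form
//   (path, select, match, options)
// or a single options object, and returns one entry per path.
function normalizePopulateArgs(path, select, match, options) {
  const opts = (typeof path === 'object' && path !== null)
    ? path
    : { path, select, match, options };
  // Assumption: several paths may be given as one space-delimited string.
  return String(opts.path).split(/\s+/).filter(Boolean).map((p) => ({
    path: p,
    select: opts.select,
    match: opts.match,
    options: opts.options
  }));
}

// Positional form: nulls are needed as placeholders to reach `options`.
const positional = normalizePopulateArgs('_creator', null, null, { sort: { age: -1 } });

// Object form: only the knobs that matter are named.
const named = normalizePopulateArgs({ path: '_creator', options: { sort: { age: -1 } } });

// Two paths sharing the same options.
const multi = normalizePopulateArgs({ path: 'fans _creator', select: 'name age' });
```

Both `positional` and `named` describe the same population of `_creator` sorted by descending age; `multi` expands into one entry for `fans` and one for `_creator`, each with the same `select`.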
As for auto-populating multi-level deep refs, this will be supported through the public exposure of Model.populate(docs, options, cb), which can be utilized to continue populating returned documents ad infinitum.
#4 solution may require breaking changes
arrays
For populated arrays we could associate the original ids as an ad-hoc property so we don't lose anything. However, it makes managing array manipulation a headache: ensuring the ad-hoc list of original _ids stays in sync with the exposed array of populated documents, in cases where the user then saves the original document, is pretty silly.
single properties
When single fields are populated and no value was found (due to specified query matchers etc) we currently replace the value with null (the value returned by the query). This means we lose the original _id. No bookkeeping is really necessary in this case b/c the null value will not get saved unless the user modifies it somehow. But, we still lose access to the original _id.
solution
When no matching document is found for a given _id, we leave that _id in the document. This means that no _ids would be lost after population and the risk of manipulating a populated array and saving the document back to the db would be mitigated. The result is that populated arrays may contain a mix of _ids and documents and it will be up to your code to manage either type. No bookkeeping required. Same goes for populating an individual path, the result could be a path with either the original _id or the found document. At this time, this is the direction we are taking but by no means is this final yet. The goal is to remain 3.x backward compatible and hide this new behavior behind an option. Feedback is welcome.
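To make the proposed behavior concrete, here is a minimal sketch of the merge step using the friends 7, 8 and 34 example from item 4. It is an illustration under assumptions, not the planned implementation: `mergePopulated` and `idOf` are hypothetical names, and plain numbers stand in for ObjectIds.

```javascript
// Replace each id with its found document; keep the original id when no
// document matched, so nothing is lost if the array is saved later.
function mergePopulated(originalIds, foundDocs) {
  const byId = new Map(foundDocs.map((doc) => [String(doc._id), doc]));
  return originalIds.map((id) =>
    byId.has(String(id)) ? byId.get(String(id)) : id
  );
}

// Hypothetical helper for calling code: read the id whether or not the
// entry was populated.
function idOf(entry) {
  return (entry !== null && typeof entry === 'object' && '_id' in entry)
    ? entry._id
    : entry;
}

const friends = [7, 8, 34];
// Only 7 and 34 were found by the populate query.
const found = [{ _id: 7, name: 'x' }, { _id: 34, name: 'y' }];
const merged = mergePopulated(friends, found);
```

The merged array holds the document for 7, the bare id 8, and the document for 34, so calling code has to handle both shapes, for example through a helper like `idOf`. With real ObjectIds, which are objects themselves, the check inside `idOf` would need to be different.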