On-the-Fly Conversions #45
Comments
How could the What about a
createdAt: {
type: Date,
autoValue: function () { ... },
version: 2,
upgrade: function (oldValue, version) {
// In v1 `createdAt` was of type String
if (version === 1)
return new Date(oldValue);
}
} |
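A minimal sketch of how a per-field upgrade hook like the one proposed above might be applied when a document is read. The applyUpgrades helper and the schemaVersion field stored on each document are hypothetical names used only for illustration; neither is part of Collection2.

```javascript
// Hypothetical schema fragment following the proposal above.
const schema = {
  createdAt: {
    type: Date,
    version: 2,
    // In v1 `createdAt` was stored as a String.
    upgrade(oldValue, version) {
      if (version === 1) return new Date(oldValue);
      return oldValue;
    },
  },
};

// Hypothetical helper: returns an upgraded copy of `doc`.
// Assumes each document records the schema version it was written with
// in a `schemaVersion` field (a missing field is treated as version 1).
function applyUpgrades(schema, doc) {
  const docVersion = doc.schemaVersion || 1;
  const result = { ...doc };
  let newest = docVersion;
  for (const [field, def] of Object.entries(schema)) {
    const target = def.version || 1;
    newest = Math.max(newest, target);
    // Only run the hook for fields that exist and are behind the schema.
    if (typeof def.upgrade === 'function' && docVersion < target && field in doc) {
      result[field] = def.upgrade(doc[field], docVersion);
    }
  }
  result.schemaVersion = newest;
  return result;
}

const upgraded = applyUpgrades(schema, { _id: 'a1', createdAt: '2014-02-26T19:10:00Z' });
console.log(upgraded.createdAt instanceof Date, upgraded.schemaVersion);
```

Storing the version on the document is one possible design; the thread also discusses tracking conversions in a separate collection instead.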
Yes, maybe it will have to be a separate function rather than using autoValue. I don't think a simple old value -> new value function handles enough cases, though, so I'd rather use the same |
But for performance reasons, to avoid making more than one Then the find will automatically add the |
|
Yes, I forgot this :-) So maybe |
As this essentially seems to introduce partial versioning (and an extra collection), I'll add the following info from mcrider/azimuth#69
|
As I see it, versioned data is different from versioned schemas. This issue is about supporting loose schema versioning, as a way to do simple on-the-fly data conversions when I decide to change the data model for collections that already contain production data. Your comments, though interesting, seem to be more about versioned collections, wherein documents are kept over time rather than being replaced. Is that correct, or was I reading too quickly? |
If I was to do a Example.. using @mquandalle's example:
I need to find docs even if |
If I was to add a new
If there is no real reason to have autoValue then I think it shouldn't be necessary.. assuming you take the |
@craig-l, I agree with all that. In your date find example, I think that should be cause for kicking off conversions for all the documents synchronously before returning from Only a rough thought at the moment, but maybe we need both upgrade and downgrade? Maybe something involving returning a selector or partial selector to identify documents needing updates? |
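As a rough illustration of the "upgrade and downgrade plus a selector" thought, a conversion could be described by an object like the one below. The names createdAtToDate, selector, upgrade, downgrade, and pendingSelector are all hypothetical and not part of Collection2. The selector uses MongoDB's $type operator with the 'string' alias, which is an assumption about how old-shape documents would be identified on the server.

```javascript
// Hypothetical conversion descriptor; none of these keys exist in Collection2.
const createdAtToDate = {
  // Mongo selector matching documents that are still in the old shape.
  selector: { createdAt: { $type: 'string' } },
  // Old shape -> new shape.
  upgrade: (doc) => ({ ...doc, createdAt: new Date(doc.createdAt) }),
  // New shape -> old shape, for rolling back.
  downgrade: (doc) => ({ ...doc, createdAt: doc.createdAt.toISOString() }),
};

// Combine the caller's selector with the conversion selector so that a
// synchronous pre-find pass touches only the unconverted documents that
// the query could actually be affected by.
function pendingSelector(userSelector, conversion) {
  return { $and: [userSelector, conversion.selector] };
}

console.log(JSON.stringify(pendingSelector({ author: 'x' }, createdAtToDate)));
```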
This will be simplest when querying on only fields that do |
schema vs. data versions Yes, you are correct in what I linked the If a collection is set to be Different language versions of a post or article content, from what I read about the mongo text index, need to be stored in separate mongo documents with their Now, schema version updates (on the fly or bulk) add another important vector to this. If I understand it correctly, from a C2 perspective they introduce the presence of different schemas in the same collection (nothing unusual in the mongo world). Incidentally, maybe you already noticed that in issue #54 I posted some collection design ideas that I came across, and one is to see schemas as linked to documents rather than collections. So all points above introduce different schemas into C2 collections, which should be trackable across document versions and between and within collections. OK, where to start? |
BTW: The meteor book update is said to cover a new migrations package now: https://github.com/percolatestudio/meteor-migrations |
I was really thinking of very simple migrations only. I've looked briefly at meteor-migrations, and I think it's probably a really good solution, but it doesn't seem to be designed for on-the-fly migrations. I try to migrate data on the fly for pretty much every schema change I make. I see your point about schema versioning having an impact on the ability to version documents. We'll keep it in mind. I have no specific ideas at the moment. |
What do you think about data that is only very seldom accessed after a while, like archives or old versions? It doesn't feel right to me to just leave them untouched, since that requires continuous consideration of, and dependence on, the migration code (e.g. avoiding direct DB access). |
BTW, "on-the-fly migration" probably deserves its own package. There is no real reason to implement this in Collection2. Moreover, C2 schemas are usually defined in files shared by the client and the server, and we probably want to keep migrations on the server side only. Since Collection2 now overwrites the native |
Agree On Wed, Feb 26, 2014 at 7:10 PM, Maxime Quandalle
|
transform Function |
I find myself needing this, as we're writing a lot of unnecessary migrations to add defaultValues, autoValues etc. on production data very often. The important part is to not update some autoValues (like I guess a simple function like this should do the job, just call it from a method somewhere (requires
import { Mongo } from 'meteor/mongo';
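import omit from 'lodash/omit';
// Collections that should never be cleaned.
const skippedCollections = ['_cacheMigrations', 'grapher_counts'];
// Fields whose autoValues must not be recomputed on existing documents.
const skippedFields = ['_id', 'createdAt', 'updatedAt'];
// Returns a function that cleans one document and writes the result back.
const makeCleanDocument = (collection, schema) => ({ _id, ...doc }) => {
  // Apply defaultValues, autoValues and type conversion from the schema.
  const cleanDoc = schema.clean(doc, {
    mutate: true,
    filter: true,
    autoConvert: true,
    removeEmptyStrings: false,
    trimStrings: true,
    getAutoValues: true,
  });
  const withoutSkippedFields = omit(cleanDoc, skippedFields);
  // Sometimes empty documents can slip through, and update will fail because $set is empty
  if (!withoutSkippedFields || Object.keys(withoutSkippedFields).length === 0) {
    console.log('empty document', _id);
    return Promise.resolve();
  }
  // Write through the raw driver collection so Collection2 does not run
  // its own validation and autoValues again. updateOne replaces the
  // driver's older update(), which recent driver versions no longer provide.
  return collection.instance
    .rawCollection()
    .updateOne({ _id }, { $set: withoutSkippedFields });
};
const cleanCollection = (collection) => {
  // Skip unnamed collections, excluded ones, and any without a C2 schema.
  if (
    !collection.name
    || skippedCollections.includes(collection.name)
    || !collection.instance._c2
  ) {
    return;
  }
  // _c2._simpleSchema is internal to Collection2 and may change between versions.
  const schema = collection.instance._c2._simpleSchema;
  // Loads the whole collection into memory; fine for small data sets only.
  const allDocuments = collection.instance.find({}).fetch();
  return Promise.all(allDocuments.map(makeCleanDocument(collection, schema)));
};
export const cleanAllData = async () => {
  // getAll() is not core Meteor API; it comes from an add-on package and is
  // expected to return entries with `name` and `instance` properties.
  const collections = Mongo.Collection.getAll();
  await Promise.all(collections.map(cleanCollection));
}; |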
I have a rough idea of a way to handle on-the-fly conversions.
- Add a convert option to schema, false by default.
- Wrap the find method. If convert: true and autoValue is set, call an asynchronous update for all of the documents that find is returning, right before they are returned.
- Call autoValue for each field needing conversion, then do $set for that field with the returned auto value.
The idea is that most schema changes don't require large-scale data conversions. Instead, each document can be converted the first time it is requested. Since we'd do the conversion updates asynchronously, it wouldn't affect find performance all that much. Unconverted documents would be used temporarily until the conversion completes, causing a deps update.
One key would be to be able to track when the conversion has run for a doc. This likely requires creating an indexed tracking collection to be used internally by C2. This would do two things: prevent running the conversion more than once per doc, and allow the user to see when a conversion has been run on all docs, meaning that any code handling the old schema can be removed. To do this accurately, we might also need a version option, or maybe convert can be set to a version identifier instead of true.
Anyone with thoughts, feel free to comment.
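The flow described above can be sketched without Meteor as follows. Everything here is an assumption made for illustration: the in-memory Map stands in for a Mongo collection, the Set stands in for the proposed tracking collection, the converters object stands in for autoValue functions, and CONVERT_VERSION stands in for a convert option set to a version identifier.

```javascript
// In-memory stand-ins for a collection and the proposed tracking collection.
const docs = new Map([
  ['a', { _id: 'a', createdAt: '2014-02-26T19:10:00Z' }],
  ['b', { _id: 'b', createdAt: '2014-03-01T08:00:00Z' }],
]);
const converted = new Set(); // keys of documents whose conversion has run

const CONVERT_VERSION = 2; // hypothetical value of the `convert` option

// Hypothetical per-field converters standing in for autoValue.
const converters = {
  createdAt: (value) => (value instanceof Date ? value : new Date(value)),
};

function convertDoc(id) {
  const key = `${id}@${CONVERT_VERSION}`;
  if (converted.has(key)) return; // never convert the same doc twice
  const doc = docs.get(id);
  const $set = {};
  for (const [field, convert] of Object.entries(converters)) {
    if (field in doc) $set[field] = convert(doc[field]);
  }
  docs.set(id, { ...doc, ...$set }); // stands in for update({ _id }, { $set })
  converted.add(key);
}

// Wrapped find: returns the stored documents immediately and schedules the
// conversion asynchronously, so callers may briefly see unconverted data.
function find(predicate = () => true) {
  const results = [...docs.values()].filter(predicate);
  const pending = Promise.all(
    results.map((doc) => Promise.resolve().then(() => convertDoc(doc._id)))
  );
  return { results, pending };
}

// True once every document has been converted, which is the signal that
// code handling the old schema can be removed.
const allConverted = () =>
  [...docs.keys()].every((id) => converted.has(`${id}@${CONVERT_VERSION}`));
```

In a real implementation the tracking set would need to be persistent and indexed, as the post notes, and the update would go through the collection rather than a Map.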