swingstore schema migration framework #8089
Comments
In talking with @toliaqat today, we settled on a safer pattern:

```js
export function initBundleStore(db, version) {
  if (!version) {
    db.run('CREATE TABLE ..'); // v1 schema
    version = 1;
  }
  if (version === 1) {
    db.run('ALTER TABLE ..'); // delta v1->v2
    version = 2;
  }
  // v2 schema: CREATE TABLE ..
  if (version === 2) {
    db.run('ALTER TABLE ..'); // delta v2->v3
    version = 3;
  }
  // v3 schema: CREATE TABLE ..
  return version;
}
```

The caller would then do something like:

```js
const currentVersion = deduceVersion(); // use 'didExist' and contents of 'version' table
const targetVersion = 3;
const bundleVersion = initBundleStore(db, currentVersion);
assert.equal(bundleVersion, targetVersion);
const kvVersion = initKVStore(db, currentVersion);
assert.equal(kvVersion, targetVersion);
// repeat for all components
```

By always starting with the original schema, we know it's always possible to migrate it to the current one. This keeps us honest, and removes the potential for a bug where newly-created DBs get a slightly different schema than old-and-migrated DBs.

On the downside, it adds overhead in the common case, where we're creating a brand new DB and want the current version. Every DB creation must replay the historical trail. Ontogeny recapitulates phylogeny.

It would be great if there were a way to assert that the final DB, after all the alterations, matches a more declarative statement of the schema. I know SQLite has a meta-table which lists all other tables and the SQL text used to create them. It might be interesting to query that table and do a text comparison, but 1: there are probably lots of trivial differences (whitespace), and 2: it wouldn't capture things like …
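For reference, a minimal sketch of that meta-table query, assuming a better-sqlite3-style handle (it only surfaces the raw `CREATE TABLE` text, so the normalization concerns above remain):

```js
import Database from 'better-sqlite3';

// dump the schema text SQLite remembers for each table, for comparison
// against a declarative expectation of the current schema
const db = new Database('swingstore.sqlite');
const rows = db
  .prepare("SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name")
  .all();
for (const { name, sql } of rows) {
  console.log(`${name}: ${sql}`);
}
```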
Wondering if we may want to have a different version for the snapStore / transcriptStore and bundleStore.
I am uncomfortable with implicit migrations on updates. Can't we assert the right version on open and require an explicit migration function to be run? Separately, I have mixed feelings about requiring us to run the migration steps for a brand new DB. It requires us to keep historical migration functions in code forever, and potentially some logic associated with that version. On the other hand, I do agree it gives us more confidence that a new DB has the same shape as an upgraded DB.
I guess that'd look like … I'm dubious. On one hand, it introduces the question of what we should do if some stores have a version in there and some don't, or if some stores are out of date and others are not. I'm sure we'd never want to migrate one store and not the others. It would allow us to avoid writing a bunch of empty upgrader clauses (if the 3-to-4 delta changes only bundleStore, then all the other stores would need dummy clauses, which might feel annoying).

I think it'd be the most clear to have fewer options: use a single version across the whole DB, so any code which ever needs to touch multiple tables at once (none yet, but maybe some day) can be confident that it has the latest of everything. And it lets us talk about the state of the DB much more simply (imagine the log message we emit during upgrade: is it just "upgrading swingstore DB from version 2 to 3", or "upgrading swingstore DB bundleStore component from version 2 to 3; snapStore is up-to-date at version 4; kvStore is up-to-date at version 1; transcriptStore is up-to-date at version 6").
I'm not against that... is there a good place within the cosmic-swingset upgrade handler to run it? I'm not sure how to square that approach with the current …
Yeah, we figured it would feel a bit weird, but it removes the path-dependency concerns pretty directly. No chance that some DBs get one schema and others get a different one, just because of where they started.
#8060 added a … Combined, we can:
The main problem in this setup is if we attempt to do a data migration during a cosmos upgrade, like we're planning on doing now. The …
Previously, the swingstore importer would ignore "historical metadata": records (with hashes) for transcripts and heap snapshots that are not strictly necessary to rebuild workers. This was a mistake: our intention was to always preserve these hashes, so that we might safely (with integrity) repopulate the corresponding data in the future, using artifacts from untrusted sources.

This commit rewrites the importer to record *all* metadata records in the first pass, regardless of whether we want historical data or not. All of these records will be stubs: they contain hashes, but are missing the actual bundle or snapshot or transcript items, as if they had been pruned. Then, in the second pass, we populate those stubs using the matching artifacts (or ignore the historical ones, as configured by the `includeHistorical` option). A final `assertComplete` pass insists that all the important (non-historical) records are fully populated.

The exporter was updated to omit empty artifacts.

New tests were added to assert that metadata is preserved regardless of import mode, and that the `assertComplete` pass really catches everything. Also, we check that an import throws if given a mis-sized artifact, like a transcript span that is missing a few items.

A new `docs/swingstore.md` was added to describe the data model, including what it means for records to be pruned, and `docs/data-export.md` was updated.

Note: this commit changes the schema of the `snapshots` table (to support temporarily-unpopulated `inUse = 1` snapshot data). To be precise, any swing-store created by this version (either via `initSwingStore` or `importSwingStore`) will get the new schema: pre-existing DBs opened with `openSwingStore` will continue to use the old/strict schema. This is fine for now, but as the comments in snapStore.js explain, we'll need to implement DB schema versioning and upgrade (#8089) before we can safely change any non-`importSwingStore` code to create unpopulated `inUse=1` records.

fixes #8025
Oh, also the schema upgrade might create new export-data records to be sent to a callback. So whatever drives the upgrade needs to provide an `exportCallback`. I'm ok with having a separate `upgradeSwingStore()`. So, I think this is driving us towards:

```js
import { upgradeSwingStore, openSwingStore } from '@agoric/swing-store';

// might upgrade, if so it calls exportCallback() zero or more times and then does a commit
await upgradeSwingStore(dbDir, { exportCallback });
// maybe commit the export-data to host-app DB now

// now open the swingstore for real
const { hostStorage, kernelStorage } = openSwingStore(dbDir, { exportCallback, keepTranscripts: true });
```
@mhofman hm, do we have enough cosmos/cosmic-swingset side knowledge to let me get away with properly splitting `initSwingStore` from `openSwingStore`? If so, I think I'd want a swing-store API that can be used like this in the "create for the first time" case:

```js
import { initSwingStore, openSwingStore } from '@agoric/swing-store';

// creates up-to-date empty DB, commits initial state, returns nothing
await initSwingStore(dbDir);

// now open the swingstore, for initial use
const { hostStorage, kernelStorage } = openSwingStore(dbDir, { exportCallback, keepTranscripts: true });
```

and this for the it-already-exists case:

```js
import { upgradeSwingStore, openSwingStore } from '@agoric/swing-store';

// might upgrade, if so it calls exportCallback() zero or more times and then does a commit
await upgradeSwingStore(dbDir, { exportCallback });
// now the host should maybe commit the export-data to host-app DB

// now open the swingstore for real
const { hostStorage, kernelStorage } = openSwingStore(dbDir, { exportCallback, keepTranscripts: true });
```

and ideally the create-for-first-time case doesn't even have the …
Yes! The cosmos side now informs cosmic-swingset during init whether the start is a bootstrap or not. The sim-chain seems to always start by bootstrapping. I've been wanting to re-write cosmic-swingset's … Regarding migrations, if necessary we might also be able to rely on the upgrade info that is now plumbed into cosmic-swingset as well.
Ok, so the host app needs to call both `upgradeSwingStore()` and `openSwingStore()`. That introduces a second critical window: if the application gets interrupted after …

I don't really want to do this, but one fix would be for …

Another not-great solution would be to record the upgrade-time export data in a table, retrieve it later, and delete it even later (after the host DB commits). Basically making the outputs of the process look more like outgoing comms messages, which get recorded/embargoed/retired in our usual pattern. That feels like more complexity than the problem deserves. The export-data I'm thinking of would be like: …
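For concreteness, a hypothetical sketch of that rejected "embargo" alternative (the `pendingExports` table and helper names are invented, and it assumes a better-sqlite3-style `db` handle):

```js
// Record export-data rows produced during upgrade, hand them out later,
// and delete them only after the host-app DB has committed.
export function makeUpgradeExportBuffer(db) {
  db.exec('CREATE TABLE IF NOT EXISTS pendingExports (key TEXT PRIMARY KEY, value TEXT)');
  return {
    // called while the upgrade runs, instead of invoking exportCallback directly
    record(key, value) {
      db.prepare('INSERT OR REPLACE INTO pendingExports (key, value) VALUES (?, ?)').run(key, value);
    },
    // called later, when the host is ready to shadow the data into its own DB
    drain() {
      return db.prepare('SELECT key, value FROM pendingExports').all();
    },
    // called only after the host-app DB has committed
    retire() {
      db.prepare('DELETE FROM pendingExports').run();
    },
  };
}
```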
The more I ponder this, the less comfortable I am with the very idea of …
Is the alternative to automatically perform the upgrades in `openSwingStore`? But I think it has the same issue with whether the …

We need one recovery path for each case that we can handle. If we only have one swingstore commit (i.e. either …

If we have both a commit inside …

The middle case is the hard one: we'd need a flag (more than just a single … @mhofman mentioned today that all …
And let's see, if `upgradeSwingStore` also returned the opened store (and did not commit), that would make for two invocation recipes. "first-time" is the same as above:

```js
import { initSwingStore, openSwingStore } from '@agoric/swing-store';

// creates up-to-date empty DB, commits initial state, returns nothing
await initSwingStore(dbDir);

// now open the swingstore, for initial use
const { hostStorage, kernelStorage } = openSwingStore(dbDir, { exportCallback, keepTranscripts: true });
```

but "already-exists" changes:

```js
import { upgradeSwingStore, openSwingStore } from '@agoric/swing-store';

// might upgrade, if so it calls exportCallback() zero or more times. Does not commit.
const { hostStorage, kernelStorage } = upgradeSwingStore(dbDir, { exportCallback, keepTranscripts: true });
// now the host can do block execution as usual
// when done, the host should commit everything
```

.. which means that …
That pair of recipes looks like a real tar baby. I don't think we should be orchestrating swingstore internals from outside the swingStore. Schema changes to the swingstore seem to me to fall into two buckets:
What am I missing here?
My recommendation:
@FUDCo writes:
The issue is that swingstore export requires some integrity-protecting data (the "export-data") to be handed to the host for shadowing into authentic storage (e.g. hashes of transcript spans, but also every single `kvStore` entry). Which means while we're doing the upgrade, we're also emitting export-data rows (by invoking the `exportCallback`). If the internal changes get committed immediately, an interruption during that window might lose the export data, which would lead to invalid cosmos state-sync exports. This is the same sort of hangover-inconsistency problem that could happen with normal changes, like vatstore writes during execution; it's just that we don't usually think of …
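As a hedged sketch of that shadowing, assuming the callback receives an array of `[key, value]` updates (the `hostKV` API here is invented for illustration):

```js
// The host mirrors each export-data update into its own verified storage
// (on-chain, the IAVL tree), so a later export can be validated against it.
function makeExportCallback(hostKV) {
  return updates => {
    for (const [key, value] of updates) {
      if (value === null || value === undefined) {
        hostKV.delete(key); // a deleted export-data entry
      } else {
        hostKV.set(key, value);
      }
    }
  };
}
```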
My position is that internal swingstore schema changes by definition do not modify export data. In any case, we should rule out changes that modify historical export data, such that any externally visible consequences of a schema update are limited to things that happen after the update. This constrains our future options somewhat, but takes a large cohort of headaches off the table.
Agreed, there is such a point. An upgrade is a consensus operation that requires using a new version of the software. However it is still part of the block advancement process, constrained by our requirement to only have a single host-controlled commit point per block, and by the fact that the software upgrade may require work other than performing the swing-store migration, work that may cause further changes to the state before the commit. The remaining questions are:
No, this is none of the host's business.
I'm inclined to say no to this also. It's a constraint, but one that would simplify a lot.
I just realized I gave an either/or question a yes/no answer. I was saying no to the first alternative, with an implicit yes to the second.
What would it simplify exactly? Given the other commit requirements, I don't believe it simplifies anything.
It means you could go ahead and commit the schema changes with the rest of whatever happened in the block, as you wouldn't be exposing export events in some weird intermediate phase of execution. This would address @warner's concern about what happens if there's a failure after the schema update but before the block is committed.
As long as the migration generates export events, there is no problem. I don't think there is any weird intermediate state. Either swing-store got committed at the end of the block, using a new schema, and containing a replay log of the cosmos sends (export events + other device related sends), or it didn't get committed and is still in the state it was previous to the migration. The only requirement is that the migration does not cause an implicit commit. That is it.
Ah, I see your point. I was conflating the concern about implicit intermediate commits with concern about endogenous export events, but you're right that those are entirely separate.
Let's see if I can enumerate the requirements we impose upon consensus-hopeful host authors:
And in turn, swingstore promises that:
That last point is open to debate. I think it's helpful, it means you can use a state-sync generated before the upgrade on software from after the upgrade, but.. maybe cosmos can't handle that, so there's no point in making swingstore handle it?

If we offer that feature, then the importer needs to recognize a version in part of the export-data and basically create an old-format DB, then it can run the normal upgrade process immediately afterwards.

If we choose to not offer that feature, then we still need a version in the export-data, but the importer can just throw an error if it sees the wrong one. And our importer code can be simpler, it doesn't need to handle old versions. And we don't need to decide how far back to support. But, if a state-sync export is produced every morning at 9am, and at noon on Monday everybody upgrades to the new version, then a client who comes online Monday afternoon won't be able to use that state-sync. Either they'll need to 1: boot the old software, build the node from state-sync, let it run to the upgrade point, let it halt, switch to the new code, restart it, let it run to the current block, or 2: wait until Tuesday morning so a new-version state-sync snapshot is available.
Correct, no point in handling. You must use the right software. I think we can put a stronger requirement:
In particular, we don't need to be able to import, export, or otherwise read / modify mismatched versions besides what is necessary for the migration.
The implementation I want to pursue will enable upgrades all the way from version 0, since that will keep us honest about upgradability, and will avoid bugs where the upgraded DB could have a different schema than the created-current-version DB. So:
We'll be able to do more than that, and we can safely perform multiple schema changes between subsequent chain-upgrade events (which would otherwise be a scheduling interdependency).
Yep. And good: by not requiring the ability to import older versions, the importer code is simpler. When we land a swingstore change to implement e.g. version 3, then the changes will be to:
One other note: I'd originally expected our new-limited once-only …
I still don't understand why you think the …
My concern about this is maintenance of that code. It requires us to keep a lot more logic around than is strictly necessary for operations. A chain software upgrade already requires new software, and chain upgrade handlers are not able to handle multiple versions anyway, so why should swing-store have that feature?
I think @warner explains how it's possible. "export data" content does not need to strictly align with the sub-stores, it can be anything we want. As such we could have a synthetic version entry. That said, I'm wondering if it might not be better to expose a schema version on the import/export interface instead. It would simplify the logic handling the export data. If the importer is not able to handle that version, just throw right away. Btw, the import/export version does not have to match internal swing-store schema versioning. As @FUDCo mentions, not all internal schema changes would result in a different shape of export data (and we should avoid those export data changes as much as possible given the performance implications on the IAVL DB).
Imagine the following timeline:
If we can only upgrade one swingstore schema version at a time, then we must withhold the …

If swingstore can upgrade itself all the way, at each step, then we don't need to partition the swingstore upgrades across separate chain upgrades. Whatever version of swingstore/SwingSet is ready can be shipped at any time, and we can land the dependency bump as soon as it's ready, entirely decoupled from any particular chain's deployment cadence.
The "DB creation recapitulates DB upgrade" approach described in the initial design (which I know you've got reservations about) means we already have that logic available, and it gets exercised and (lightly) tested with every new DB creation. We might reduce the code size by deleting the older upgraders, but:
I don't think the chain upgrade handler will see swing-store schema upgrades, or at least it doesn't need to. A chain upgrade handler might be doing things like …

The kernel code must always be compatible with the swing-store package version that it uses …
Yeah, that was my plan, to annotate the export with the schema version it was sourced from, with an export-data key named … (and we'd probably define the extra key as an "export version", to decouple it from the schema version: I can imagine allowing multiple closely-related schema versions to all get exported into the same export version). It'd be nicer if the importer could somehow read …
Yeah, but it feels like a lot of churn. We defined a "SwingStore Export" to mean a data set with two parts: export-data and artifacts, with one sequencing constraint (import ingests export-data first, then fetches artifacts). This would require three parts (version, export-data, artifacts), and have two sequencing constraints (import ingests version, then export-data, then artifacts).

If we do it this way, it will require matching changes on the cosmic-swingset side, whereas if we encode it into export-data, cosmic-swingset doesn't need to change. That's the main cost. The benefit is a cleaner design.
Correct, that would put a requirement that there is at most a single migration per chain upgrade, or at least for those cases, that the migration logic from swing-store is able to handle a multi-version jump. I agree it does put an awkward dependency from "main chain upgrade schedule" onto the swing-store implementation, which is not optimal.
The problem is that it may not be able to understand what to do with this data until it reaches the version info, putting undue complications onto the implementation.
No, we can just store the version in a special state-sync payload, no need to store it in the IAVL tree.
Correct!
I don't mind that cost. It will complicate a little how state-sync payloads are handled, but we knew we'd get there some time. Cosmos already has versioning in place to handle these situations.
I'm not following the discussion of the "per" relationship between swingstore migrations and chain upgrades here. If we have multiple changes to the swingstore that are landed in between two chain upgrades, then from the chain's perspective they collectively just look like one big migration. Internal to swingstore this might be accomplished by multiple version steps, each with its own logic, but I don't think that changes the view as seen by the chain. (As before, I still don't see what business it is of the chain's what the swingstore schema version number is, but I don't think that's actually material to this particular question.)
Hm.. I agree that it might not understand what to do, but I don't think that matters, or would require us to put more work/complication into the importer implementation. By deciding to reject old versions, we've allowed the importer to omit code that handles old versions correctly. So each export-data entry shows up, the importer assumes that it's current-format and interprets it as such. That will put the correct data into the DB if it was indeed current-format, and it will either throw an error or do something wrong if it was not current-format. When the … The only way to make it successfully to the end of the export-data stream is for the …
What happens if the provider of your state-sync payload lies about the version in use? How does the importer validate the claimed version string? I think an attack would be shaped like:
The importer would process the payload according to … Hm, but the payload itself is untrusted. The cosmos-side importer populates the IAVL tree, then hashes it, then compares the root against the chain. If that succeeds, it feeds the export-data (all of which is now known-good) into a swingstore importer. The importer then receives artifacts and compares them against the export-data. The cosmos side doesn't know how to interpret the export-data, it just writes into IAVL and then reads a subset out to the swingstore importer.

The "version in a special state-sync payload" would mean some field in the payload provides the claimed swing-store export-version. The cosmos-side importer would have to parse this from the payload, remember it outside of IAVL, and then provide it (unverified) to the swingstore importer, via …

Nope, that's not ok. Version confusions are security bugs. We need a way to make it verified.
@FUDCo writes:
Agreed on all counts. And yeah, the chain ( / host application in general) should not know what the swingstore schema version number is. An upgrade process that could only make one step at a time would effectively be revealing the schema version number to the host application, at least to the extent that there's this hidden number that is sometimes incremented by a new version of the `@agoric/swing-store` package.
Right, only the export data is trusted in our model. However I'm doubtful there is much attack potential by having this version unverified.
Fair. Ok, I'm fine with a synthetic export-data entry for the "export version".
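A hypothetical sketch of that synthetic entry (the `exportVersion` key name and the numbering are invented; the real key and placement would be decided in the implementing PR):

```js
const EXPORT_VERSION = 2; // bump whenever the export format changes

// the synthetic entry is emitted alongside the real export-data entries;
// it is not backed by any table, it just labels the format
function* exportDataEntries(realEntries) {
  yield ['exportVersion', String(EXPORT_VERSION)];
  yield* realEntries;
}

// the importer consumes the synthetic entry and rejects mismatched formats
function checkExportDataEntry(key, value) {
  if (key === 'exportVersion') {
    if (Number(value) !== EXPORT_VERSION) {
      throw Error(`importer only handles export version ${EXPORT_VERSION}, got ${value}`);
    }
    return true; // consumed
  }
  return false; // an ordinary entry, interpreted as current-format
}
```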
What is the Problem Being Solved?
The swing-store SQL database will change over time, and we need each new version to be capable of using data created by the previous version. We've only had one release to our mainnet chain so far (the "bulldozer" release, which initialized a brand new DB, using the `@agoric/swing-store` package at version `0.9.1`), so we haven't yet had to perform any kind of upgrade.

PR #8075 needs a schema change to accommodate the new "import record, then populate data" approach used by the rewritten state-sync importer. The change is to remove a `CHECK` constraint from the `snapshots` table.

Conveniently, we don't actually need this schema change for existing databases: only for the ones created for a state-sync import. So we don't need to build the schema migration tooling quite yet. When we roll out this change, we will wind up with two different schemas on mainnet nodes: some with the `CHECK` constraint, some without.

But if we didn't get so lucky, we'd need a process to upgrade the schemas of existing databases. This ticket is about a design to accomplish that, for future updates.
Description of the Design
The core will be a new table named `version`, with a single row named `version`, which (if present) will contain an integer starting from 1 and incremented for each new version. If this row is present, the integer will exactly define the schema in use.

Each time `openSwingStore` is told to open a pre-existing DB, our upgrade mechanism will look at this table to decide what changes need to be made (if any). We do not intend to handle downgrades, so `openSwingStore` will abort if it sees a version that is higher than what it can accommodate.

We must also handle the creation of a brand new DB (`initSwingStore` or `importSwingStore`), and the upgrade of a DB that was created before we established this mechanism.

Each given release of `@agoric/swing-store` will have a "current version" integer. We pretend that the 0.9.1 release used `currentVersion = 1`. If release N uses `currentVersion = N`, then databases created by release N will have a `version` table with `N`. Databases which start with an older version indicator will be upgraded to N upon first open, and after their first `commit()`, their `version` table will also have `N`, just as if they were created by the current release.

My plan is:
- each component exports a new `initBundleStore`, in addition to the previous `makeBundleStore`, etc
- the init functions take a `db` handle and an `oldVersion` argument
- the `makeBundleStore` function is then limited to building the API around a `db` handle, and does not modify the DB when first called
- `initSwingStore` and `openSwingStore` continue to share a `makeSwingStore` function as before (I'd prefer to split them up, such that `openSwingStore` refuses to create a DB, but that would change the interface enough to be annoying)
- `makeSwingStore` starts by determining whether the DB already exists or not, tracked in a variable named `didExist`
- it creates the `version` table if it did not already exist
  - if `didExist === false`, this table will be brand new and empty
  - if `didExist === true` but the DB was created by `@agoric/swing-store 0.9.1`, the table will also be brand new and empty
  - if `didExist === true` and the DB was created by the upcoming version of swing-store, the table will not be empty
- it reads `version` from the table and stores it in `origVersion`
- it sets `oldVersion` to:
  - `undefined` if `didExist === false`, to mean "there are no tables yet: create them"
  - `1` if `didExist === true` and the `version` table was empty, to mean "we started with the implicitly-labelled version 1", as created by 0.9.1
  - `origVersion` otherwise, to mean "we started with this `origVersion`"
- it then calls `initBundleStore`, `initSnapStore`, `initKVStore`, `initTranscriptStore` with `db` and `oldVersion`
- the init functions handle `oldVersion === undefined` by creating their tables, using the brand new schema
- if `oldVersion === currentVersion`, the init function should do nothing
- otherwise, the init functions must migrate `oldVersion` into the schema for `currentVersion`
- finally, `makeSwingStore` should write `currentVersion` into the `version` table's single row
- nothing lands until `hostStorage.commit()` is called, so all changes (including the upgrade) will land atomically, and only when the host is ready to commit

The component initialization functions can look something like:
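A minimal sketch, assuming the `oldVersion` / `CURRENT_VERSION` convention described above (the SQL bodies are elided, as in the other snippets in this thread):

```js
export const CURRENT_VERSION = 2;

export function initSnapStore(db, oldVersion) {
  if (oldVersion === 1) {
    db.run('ALTER TABLE ..'); // v1 -> v2 delta (really a copy/drop/rename, see below)
    oldVersion = 2;
  }
  if (!oldVersion) {
    db.run('CREATE TABLE ..'); // brand-new DB: create the current (v2) schema directly
    oldVersion = CURRENT_VERSION;
  }
  if (oldVersion !== CURRENT_VERSION) {
    throw Error(`cannot upgrade snapStore from version ${oldVersion} to ${CURRENT_VERSION}`);
  }
}
```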
Each time we want to make a change, we must increment `CURRENT_VERSION`, add a new delta, and modify the final `if (!oldVersion)` clause to make new DBs with the current schema. By adding a copy of the intermediate schemas in the comments, it's easier to verify that the deltas will do the right thing.

If a version does not modify that particular component (e.g. the #8075 change modifies the `snapshots` table but none of the others), the delta clauses should be empty except for the `oldVersion =` increments.

cc @mhofman @FUDCo
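An illustrative sketch of the open-time orchestration described above, assuming a better-sqlite3-style handle and a plain one-column `version` table (names and details are placeholders, not the final code):

```js
import fs from 'node:fs';
import Database from 'better-sqlite3';

const CURRENT_VERSION = 2;

function openAndMaybeUpgrade(dirPath) {
  const path = `${dirPath}/swingstore.sqlite`;
  const didExist = fs.existsSync(path);
  const db = new Database(path);
  db.exec('CREATE TABLE IF NOT EXISTS version (version INTEGER)');

  const row = db.prepare('SELECT version FROM version').get();
  const origVersion = row ? row.version : undefined;
  let oldVersion;
  if (!didExist) {
    oldVersion = undefined; // no tables yet: init functions create the current schema
  } else if (origVersion === undefined) {
    oldVersion = 1; // created by swing-store 0.9.1: implicitly-labelled version 1
  } else {
    oldVersion = origVersion;
  }
  if (oldVersion !== undefined && oldVersion > CURRENT_VERSION) {
    throw Error(`DB version ${oldVersion} is newer than this swing-store (${CURRENT_VERSION})`);
  }

  // initBundleStore(db, oldVersion); initSnapStore(db, oldVersion); ...etc

  db.prepare('DELETE FROM version').run();
  db.prepare('INSERT INTO version (version) VALUES (?)').run(CURRENT_VERSION);
  // in the real store, all of this happens inside the single transaction that
  // only lands when the host calls hostStorage.commit()
  return db;
}
```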
Other Considerations
SQLite has a `user_version` for doing things like this (`PRAGMA user_version = 2;` sets it, `PRAGMA user_version;` reads it). Should we consider using it instead? Conclusion: no. Although setting it is controlled by transactions (changes do not land until `COMMIT`), the version does not appear in a `sqlite3 foo.sqlite .dump` command (`file foo.sqlite` knows how to report it, but otherwise you need to execute a PRAGMA to see it). This would hamper debuggability.

Maybe pass `CURRENT_VERSION` into each `init` function, instead of making multiple copies of it. However, for each new version, we still need to add a migration delta clause, I think.

Note that SQLite can only perform limited alterations in the `ALTER TABLE` statement (only column insertions, deletions, and renames). To remove the `CHECK` constraint, we must actually copy the whole table to a temporary one, delete the original, and then rename the temporary into place. https://www.sqlite.org/lang_altertable.html has details.
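A hedged sketch of that copy/drop/rename dance (the column list is a placeholder, not the real `snapshots` schema):

```js
// SQLite cannot drop a CHECK constraint with ALTER TABLE, so rebuild the table:
// create a new table without the constraint, copy the rows, drop the old
// table, and rename the new one into place.
function dropSnapshotsCheckConstraint(db) {
  db.exec(`
    CREATE TABLE snapshotsNext (
      vatID TEXT,
      snapPos INTEGER,
      hash TEXT,
      compressedSnapshot BLOB,
      inUse INTEGER,
      PRIMARY KEY (vatID, snapPos)
      -- same columns as before, minus the CHECK constraint
    );
    INSERT INTO snapshotsNext
      SELECT vatID, snapPos, hash, compressedSnapshot, inUse FROM snapshots;
    DROP TABLE snapshots;
    ALTER TABLE snapshotsNext RENAME TO snapshots;
  `);
}
```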
The new init functions will use `CREATE TABLE` commands, not `CREATE TABLE IF NOT EXISTS`, because that condition will already be handled by the `oldVersion` checking logic.

Security Considerations
none, schema migration can only happen when the swingstore is opened, which requires the same level of authority as creating the DB in the first place
Scaling Considerations
Some upgrade operations might have to copy an entire table, which could take a non-trivial amount of time. However, it's absolutely necessary that these changes land in an atomic commit, so we can't defer them or do them in the background.
We should run tests to estimate how long this will take on the mainnet data set and advise validators about how long they should expect to wait.
If `makeSwingStore()` notices a difference between the `oldVersion` and `CURRENT_VERSION`, it should print a message about the DB upgrade starting, and another when it completes.

Test Plan
Unit tests which create deliberately old schemas and populate them with old data, then open the DB and check the subsequent contents and schemas to make sure they've been updated.
The docker-based chain upgrade test will cause a schema migration as a side-effect: we should add checks to that test to examine the new DB and make sure it looks right.
Upgrade Considerations
see above
Note that the #8075 hack means that the implicit "version 1" may or may not have the `CHECK` constraint, so this particular version of the schema has indeterminate contents. However the first "real" version (2) will have fixed contents, because the v1-to-v2 `snapStore` migration will copy the table to a new temporary one without the constraint, then delete/rename it back into place, in addition to whatever other v1-to-v2 changes we might add by the time we deploy this.

Note that we don't intend to support downgrades. We might want to back up the DB during chain-software upgrades, just in case, perhaps using the `sqlite3 ~/.agoric/data/agoric/swingstore.sqlite ".backup ~/swingstore-backup.sqlite"` CLI command, but we must then also advise validators how to delete the 6GB-ish backup file when it is no longer needed.

If we make multiple swingstore changes between released versions, an upgrade might take several steps, not just one. As long as we consistently increment `CURRENT_VERSION` in each PR that changes schemas (including adding brand new tables), this should work fine.