Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: export state-sync snapshot without a DB write-lock #8619

Merged
merged 2 commits into from
Dec 6, 2023

Conversation

warner
Copy link
Member

@warner warner commented Dec 5, 2023

Exporting a state-sync snapshot is a read-only operation, and is designed to run "in the background", i.e. in parallel with normal mutating operations. It accomplishes this by opening a read-only transaction right away, effectively capturing a snapshot of the SQLite database state, to insulate the export process from ongoing writes by the execution host.

The cosmic-swingset exporter starts with a query of host.height, to confirm that the database has not already advanced to a new block before this snapshot/read-transaction can be taken.

Previously, this query worked by using openSwingStore, and then calling hostStorage.hostKVStore.get('host.height'). This had two problems:

  • TOCTTOU: the hostKVStore.get used a different DB connection (and different txn) than the exporter, so it might return a different height, negating the accuracy of the consistency check

  • read-write txn: openSwingStore creates a read-write txn, even when merely opening the DB (because it might need to create the initial tables). This txn is closed right away, before openSwingStore() returns, so it did not present a threat to ongoing operations. But if the exporter was created while the ongoing execution side already had its own read-write txn open (e.g. while controller.run() was running), then it would fail, and makeSwingStoreExporter would fail with SQLITE_BUSY

Instead, we take advantage of the new swingStoreExporter.getHostKV() API, and use it to fetch host.height. Unlike the normal swingstore, the swingstore-exporter refrains from creating read-write transactions entirely. So the cosmic-swingset export code can safely query the height without fear of getting the wrong value or failing because of an ongoing write transaction.

We think this should fix the SQLITE_BUSY errors.

refs #8523

Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs #8523
Exporting a state-sync snapshot is a read-only operation, and is
designed to run "in the background", i.e. in parallel with normal
mutating operations. It accomplishes this by opening a read-only
transaction right away, effectively capturing a snapshot of the SQLite
database state, to insulate the export process from ongoing writes by
the execution host.

The cosmic-swingset exporter starts with a query of `host.height`, to
confirm that the database has not already advanced to a new block
before this snapshot/read-transaction can be taken.

Previously, this query worked by using `openSwingStore`, and then
calling `hostStorage.hostKVStore.get('host.height')`. This had two
problems:

* TOCTTOU: the `hostKVStore.get` used a different DB connection (and
  different txn) than the exporter, so it might return a different
  height, negating the accuracy of the consistency check

* read-write txn: `openSwingStore` creates a read-*write* txn, even
  when merely opening the DB (because it might need to create the
  initial tables). This txn is closed right away, before
  `openSwingStore()` returns, so it did not present a threat to
  ongoing operations. But if the exporter was created while the
  ongoing execution side already had its own read-write txn
  open (e.g. while `controller.run()` was running), then it would
  fail, and `makeSwingStoreExporter` would fail with `SQLITE_BUSY`

Instead, we take advantage of the new `swingStoreExporter.getHostKV()`
API, and use *it* to fetch `host.height`. Unlike the normal
swingstore, the swingstore-exporter refrains from creating read-write
transactions entirely. So the cosmic-swingset export code can safely
query the height without fear of getting the wrong value or failing
because of an ongoing write transaction.

We think this should fix the SQLITE_BUSY errors.

refs #8523
@warner warner added the cosmic-swingset package: cosmic-swingset label Dec 5, 2023
@warner warner requested a review from mhofman December 5, 2023 23:44
@warner warner self-assigned this Dec 5, 2023
@warner warner changed the base branch from warner/8523-exporter-hostkv to master December 5, 2023 23:48
@mhofman mhofman linked an issue Dec 5, 2023 that may be closed by this pull request
@mhofman
Copy link
Member

mhofman commented Dec 5, 2023

  • TOCTTOU: the hostKVStore.get used a different DB connection (and different txn) than the exporter, so it might return a different height, negating the accuracy of the consistency check

To be accurate, that problem does not apply in production cases because cosmic-swingset, the caller of initiateSwingStoreExport, holds any commit operation until this returns. But you're right that not all callers were able to (e.g. if manually exporting outside of state-sync)

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic. Thanks for the amazing turn around!

@warner warner added the automerge:rebase Automatically rebase updates, then merge label Dec 5, 2023
@mergify mergify bot merged commit 414e88b into master Dec 6, 2023
79 checks passed
@mergify mergify bot deleted the warner/8523-export-without-writelock branch December 6, 2023 00:26
@mhofman
Copy link
Member

mhofman commented Dec 6, 2023

Verified on a new patched mainnet follower node started from state-sync that producing a snapshot while syncing up after restore no longer triggers the SQLITE_BUSY error.

mhofman pushed a commit that referenced this pull request Dec 6, 2023
…lock

fix: export state-sync snapshot without a DB write-lock
mhofman pushed a commit that referenced this pull request Dec 6, 2023
…lock

fix: export state-sync snapshot without a DB write-lock
mhofman pushed a commit that referenced this pull request Dec 6, 2023
…lock

fix: export state-sync snapshot without a DB write-lock
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge cosmic-swingset package: cosmic-swingset
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Creating state-sync snapshot can fail with SQLITE_BUSY
2 participants