-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Crashes can cause data corruption #548
Comments
Bolt uses a read only memory mmap for reading the data (
The issue looks like one of the meta pages was partially written and bolt is not rolling back to the previous transaction correctly. On the plus side, the data file isn't corrupt. The recovery code just needs to fall back if a checksum error occurs. |
Maybe I don't understand how boltdb works internally, but is "partially written" data (meaning the process died while modifying the memory that was to be flushed to disk) not a problem? No matter which part of the data it is? I guess that makes sense if transactions are immutable, but are you sure there's no place where this could go badly (like rewriting references to other transactions or something? |
During a commit, there are two phases:
Partial writes aren't a problem for data corruption. If a partial write happens in phase 1 then there's no meta page to point to the newly written data pages & freelist so those pages are effectively ignored. The data pages and freelist are written to what were unused, free pages in the previous transaction so overwriting those pages doesn't do anything. Bolt uses two meta pages at the beginning of the data file to protect against partial meta page writes. Each new transaction will flip between the pages. Txn 1 writes to meta 0, txn 2 writes to meta 1, txn 3 writes to meta 0, txn 4 writes to meta 1, etc. Since the meta pages are checksummed, if there is a partial write to one of them then Bolt should detect that and fallback to the previous meta page. The fallback is the part that's not working right now.
Bolt is a pretty simple database. There aren't any txn references to rewrite or write ahead logs or compactions or really anything too fancy going on. The |
@cyphar What is the commit hash of bolt you are using? Are you able to reproduce the corruption? |
@xiang90 This happens with all versions of Docker, the latest of which has vendored However, this is really frustrating because this error means that people who use Docker can't actually start their daemon without nuking their entire setup if the database is in this state. |
That is unfortunate... @benbjohnson maybe if you could provide some clue on how it should behave someone will make an PR. As I understand here: https://github.com/boltdb/bolt/blob/master/db.go#L202-L204 it should try to validate another page, and here https://github.com/boltdb/bolt/blob/master/db.go#L266-L271 it should somehow gracefully degrade and reset meta (but how? I'm not familiar with source). |
@kron4eg It's relatively easy fix. I've been trying to find time in the last week to fix it but I simply haven't been able to get to it. Thanks for digging into it. You're on the right track. Here's some more info:
This corruption should be exceedingly rare since disks will generally write a full sector (512 bytes) even in the event of power loss. The meta pages are page aligned and the meta data fits within a sector. |
@benbjohnson Using the dotpoints you provided, I made a PR (#555). PTAL. The main concern I had was with your first dot point and whether that would cause issues if you had an invalid pagesize and you were opening the database on a different operating system than the one you created it on. |
This was noticed while testing a Docker bug where Docker was killed by the kernel due to a resource allocation issue and the resultant Bolt database is in a corrupted state. It's a bit hard to test this, and I'm trying to get a nice test case that can be tested just using
bolt
. Examples of the errors include:The issue is clearly that the file wasn't written properly, presumably the mapping wasn't properly synced or something like that. Since we're using mmap'd pages I'm not sure what happens if
bolt
is in the middle of modifying the in-memory state of the database and then it crashes -- I assume the kernel will flush the page cache eventually. This is obviously going to be a problem because the state may not be valid.In addition, we use
fdatasync
rather thanmsync
. But if you look at themsync(2)
man page it makes clear you should only use it:The text was updated successfully, but these errors were encountered: