ext3/ext4 is not fully POSIX, to be safe there, need to fsync after file size changes #284
Comments
Awesome, thanks for the fix! I moved the code over to |
@tv42 Is the |
@benbjohnson The truncate is the part that resizes the file; if you skip that, the file remains small until you write to it, and since data is only synced with If you remove either the truncate or the sync, it's as if this fix never went in. |
@tv42 Thanks, my question was dumb now that I read it again. :) I'm going to add a |
This commit adds the DB.NoTruncate flag to optionally revert mmap() calls to how they were implemented before the ext3/ext4 fix. When NoTruncate is true, remapping the data file will not force the file system to resize it immediately. This works for non-ext3/4 file systems. The default value of NoTruncate is false so it is still safe for ext3/ext4 file systems by default. See also: boltdb#284
This commit adds the DB.NoGrowSync flag to optionally revert mmap() calls to how they were implemented before the ext3/ext4 fix. When NoGrowSync is true, remapping the data file will not force the file system to resize it immediately. This works for non-ext3/4 file systems. The default value of NoGrowSync is false so it is still safe for ext3/ext4 file systems by default. See also: boltdb#284
@cespare pointed out this conversation to me: http://www.openldap.org/lists/openldap-devel/201411/msg00000.html (I've seen the talk before, but didn't pay enough attention).
Reading http://linux.die.net/man/2/fdatasync says (my emphasis) "fdatasync() [...] does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. [...] On the other hand, a change to the file size (st_size, as made by say ftruncate(2)), would require a metadata flush."
Quoting the paper (my emphasis): "On ext3 with the default “ordered” journaling mode, the file data is forced directly out to the main file system prior to its metadata being committed to the journal. This is why we observe the journaling of the length update (op#399, 400, and 402) after the file data updates(op#342–398)." "[...] means fdatasync on ext3 does not wait for the completion of journaling (similar behavior has been observed on ext4)."
Currently, bolt seems to rely on individual page writes increasing the st_size of the file; there's no file size change where it actually notices it needs more space:
bolt/db.go
Line 554 in 15a58b0
So my takeaway is that ext3, and likely ext4, have a bug / "design tradeoff". If bolt wants to be pragmatic, it should probably accommodate that bug.
I think it would be enough to do something like this:
Doing the resize once, and not page-by-page when writing out dirty pages, might even be more efficient.