ability to suspend (and resume) all zpool IO (`zpool suspend|resume`) #12843
Comments
Wouldn't this just hang forever if the system is on a ZFS root?
Secretly, Lines 10958 to 10974 in f04b976
I'm aware of
Sure, sorry, I wasn't trying to suggest it would serve here, merely remarking that the name is used, and given ZFS's strong disinterest in breaking prior expectations for things, it might be uphill to convince people to rename even such an internal thing. (Also I don't assume anyone knows it exists, after I was quite surprised when I found it in the test suite one day.) Then again, zdb has had options renamed around a few times, so maybe nobody will blink.
The UX proposal has some typos at the end; it should read
Also: I would prefer to have
I don't remember my use-case right now, but can this be used like the suspension that happens when a storage device is physically yanked off the bus? (Without actually physically yanking it off the bus.)
So what should be done to allow the PC to be suspended? Every time I have suspended, I end up with a failed pool with unrecoverable errors. Thankfully, with raidz3, so far at least enough data is still OK that it will resilver; all the disks show read/write/checksum errors when it resumes. If I leave the PC on and thrash the drives for a month I have 0 errors, so it's not a hardware issue, but every time I try suspend-to-RAM / hibernate, the pool falls apart. Considering the HDDs are chewing up 1.2 kW and I only use the stored data 1% of the time, it would be good to be able to hibernate the NAS. Trying to unmount/export is a pain; even -force refuses to do it. Not ZFS on root, it's just a storage pool. 5.18.19-051819-generic
My suspicion is that the suspend/hibernation kicks in in the middle of a TXG, with some data already being written to disk but the referencing metadata (including metaslabs and uberblock) not yet being persisted. Then an import would pull the (from the perspective of the frozen state) out-of-date metaslabs, pick free space that already contains new data, and 'repurpose' that. When the hibernated state is restored, the TXG will continue to commit to disk, writing metadata that references data that is thought to be stable on-disk but was overwritten by the import (which is invisible to the hibernated state).
Check the logic inside your initramfs: most likely your distribution first imports the pool read/write and afterwards checks for a hibernation state (which loads an in-memory state), triggering the scenario above. See #14118 (comment)
@MasterCATZ Can you clarify, you mean "hibernated", not "suspended" as in suspend-to-RAM, right?
(This is a feature request extracted from #260 (comment))
Background
Linux supports freezing (and thawing) a mounted filesystem through an ioctl.
The use case for this is to suspend all IO requests to the underlying block device, which in turn enables block-device level snapshots, e.g., if the filesystem is deployed on top of a snapshot-capable volume manager.
Note that freeze is not used during hibernation, contrary to what's stated in the opening comment of ZFS issue #260.
As far as I'm aware, the above is an exhaustive description of the use case for freeze & thaw.
The Linux VFS provides two mechanisms for filesystems to support freeze & thaw.
The first is to implement the `freeze_fs`/`unfreeze_fs` super block operations. The second is to implement the `freeze_super`/`thaw_super` super block operations.
If a filesystem implements the `_fs` type of operations, the VFS takes care of locking out all VFS operations by means of a set of rwlocks. Here's the kernel function `freeze_super` that is invoked from the freeze ioctl in that case. (Don't confuse the kernel function `freeze_super` with the `->freeze_super` super block operation.)
If a filesystem implements the `_super` type of operations, the ioctls map more or less directly to these callbacks.
However, neither of the hooks above is suitable for ZFS.
The reason is that the Linux concept of freeze & thaw expects that one super block has exclusive control of N block devices, whereas with ZFS, M super blocks (= ZPL datasets) share the storage of N block devices. And then there are also management operations such as `zpool scrub` and `zfs recv` that perform IO and are not represented by super blocks at all.
Of course, looking at how `btrfs` does it makes sense in this case. It's the mainline filesystem most similar to ZFS with regard to pooled storage (multiple block devices!) and multiple super blocks on top. Btrfs implements `freeze_fs`/`unfreeze_fs`. But the btrfs secret sauce is that a single btrfs filesystem (= pool in ZFS terms) only has a single `struct super_block`; the subvolumes (= ZPL datasets in ZFS terms) are implemented through `mount_subtree`.
.UX Proposal
Instead of implementing the
{freeze,unfreeze}_*
ioctls, I propose to implement two newzpool
subcommands.Here's the
man
page that describes how the feature behaves towards the end user.Notes:
zpool import --discard-hibernation-state-and-fail-resume
operation that I proposed there would map tozpool import --unfreeze $MAGIC_TAG