Feature Request - online split clone #2105
Comments
I was looking at zfs promote, but this appears just to flip the parent-child relationship... what I was thinking of was an end state where both file systems are completely independent from each other... some use cases for this could be
After you clone original@snapshot you can modify both dataset and clone freely; they won't affect each other, except that they still share the data common to both on disk. If you want to destroy/recreate the template (original) you can simply destroy all snapshots on it (except the one(s) used as origins of clones), zfs rename original, and zfs create a new one with the same name (the origin property of clones isn't bound to the name of the original dataset, so you can rename both freely). The only downside is that all unique data held in original@snapshot (= the base of the clone) can't be released unless you are willing to destroy either the clone(s) or (after a promote of the clone) the original.
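The rename-and-recreate rotation described above can be sketched as a dry run. All dataset and snapshot names here (`tank/original`, the `@daily-*` snapshots, `@base` as the clone origin) are hypothetical examples, not from the thread:

```shell
#!/bin/sh
# Dry-run sketch of the template rotation described above.
# Names are hypothetical; change run() to `zfs "$@"` to execute for real.
run() { echo "zfs $*"; }

# 1. Destroy all template snapshots EXCEPT the clone origin (tank/original@base).
run destroy tank/original@daily-1
run destroy tank/original@daily-2

# 2. Move the old template aside; clones keep working, because their
#    origin property tracks the dataset itself, not its name.
run rename tank/original tank/original-old

# 3. Recreate a fresh template under the original name.
run create tank/original
```

The dry-run `run()` wrapper just prints the commands, which makes the sequence easy to review before pointing it at a real pool.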
@greg-hydrogen in the end did you determine if
To comment on this, it would be nice if there were functionality to transform origin-clone relationships into a deduped kind: removing the logical links that keep the datasets from being individually destroyed at will, while maintaining only one copy of the still-shared data.
@behlendorf: it almost certainly doesn't meet the need. |
Here's what I'm trying to do conceptually:

```
user@backup:
user@test:
user@backup:
user@test:
```
In order for me to make progress, I actually have to run through sanitize, stop the thing that's using test, destroy test (completely), clone mirror as test, restart the thing using test, and then I can finally try to destroy the original snapshot. Or I can decide to take a pass, trigger a new snapshot on backup later, send its increment over, delete the snapshot that was never mirrored to test, and try again. Fwiw, to get a taste of this...:
(the numbers are really wrong, I actually have 1 volume with 4 contained volumes, hence the recursive flags...) Now, I understand that I can use
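The refresh cycle described two comments above can be sketched as a dry run. The dataset names (`tank/mirror`, `tank/test`) and the consumer start/stop steps are hypothetical placeholders:

```shell
#!/bin/sh
# Dry-run sketch of the clone refresh cycle described above; all names
# are hypothetical. run() only prints; swap it out to execute for real.
run() { echo "+ $*"; }

run systemctl stop myapp            # stop whatever is using test (placeholder unit)
run zfs destroy -r tank/test        # destroy test completely
run zfs clone tank/mirror@latest tank/test
run systemctl start myapp           # restart the consumer
run zfs destroy tank/mirror@old     # only NOW can the old snapshot go
```

The point of the sketch is the ordering: the old snapshot can only be destroyed after the clone that depended on it has been rebuilt from a newer one.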
@behlendorf Hi, any progress on this? Splitting a clone from its original filesystem would be really great for VM templates and/or big file-level restores. See the link @jsoref pasted above for a practical example.
@kpande: the goal is to pay (in space and data transfer) for what has changed (COW), not for the entire dataset (each time this operation happens). If I had a 10TB moving dataset, and a variation of the dataset that I want to establish, sure, I could copy the 10TB, apply the variation, and pay for 20TB (if I have 20TB available). But if my variation is really only 10MB different from the original 10TB, why shouldn't I be able to pay for 10TB+10MB? -- snapshots + clones give me that. Until the 10TB moves sufficiently that I'm now paying for 10TB (live) + 10TB (snapshot) + 10TB (diverged), and my 10MB variation moves so that it's now its own 10TB (diverged from both live and snapshot). In the interim, to "fix" my 30TB problem, I have to spend another 10TB (=40TB -- via your zfs send+zfs recv). That isn't ideal. Sure, it will "work", but it is neither "fast" nor remotely space efficient.

Redacted send/recv sounds interesting (since it more or less matches my use case) -- but while I can find it mentioned in a bunch of places, I can't find any useful explanation of what it's actually redacting.

Fwiw, for our system, I switched so that the sanitizing happens on the sending side (which is also better from a privacy perspective), which mostly got us out of the woods. There are instances where the data variation isn't "redacting" and where the system has the resources for zfs snapshot+zfs send but doesn't really want to allocate the resources to host a second database to do the "mutation" -- and doesn't want to have to pay to send the entire volume between primary and secondary (i.e. it would rather send an incremental snapshot to a system which already has the previous snapshot).
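The space accounting in the argument above can be made concrete with a toy calculation, using the figures from the comment (10TB base, 10MB variation):

```python
# Toy space-accounting model for the COW argument above.
# All sizes in TB; figures taken from the comment.
TB = 1.0
base = 10 * TB                 # the live 10TB dataset
variation = 10 / 1_000_000 * TB  # a 10MB variation, expressed in TB

# Snapshot + clone (COW): pay only for the divergence.
cow_cost = base + variation        # ~10.00001 TB

# Full copy instead of a clone: pay for the whole dataset twice.
copy_cost = 2 * base               # 20 TB

# Worst case once live, snapshot, and clone have all fully diverged:
diverged_cost = 3 * base           # 30 TB
# ...and "fixing" that via zfs send | zfs recv needs a fourth copy:
fix_cost = diverged_cost + base    # 40 TB

assert cow_cost < copy_cost < diverged_cost < fix_cost
```

This is exactly the comment's point: cloning starts near-free, but once the datasets diverge, escaping the parent/child link via send/recv costs yet another full copy.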
Yes, I'm aware I could use dedup. We're paying for our CPUs/RAM, so dedicating constant CPU+RAM to make a rare task (refreshing a mutated clone) fast felt like a poor tradeoff (I'd rather pay for a bit more disk space).
@kpande this link quite clearly shows the problem with current clones. After all, if a clone diverges so much from the base snapshot, the permanent parent->child relation between the two is a source of confusion. Splitting the clone would be a clear indication that they have diverged too much to be considered tied anymore. But let me give a more practical example. Let
At some point, something bad happens to a VM disk (ie: serious guest filesystem corruption), but in the meantime other users are actively storing new, important data on the other disks. You basically have some conflicting requirements: a) to revert to the old, uncorrupted data from yesterday; b) to preserve any new data uploaded, which is not found in any snapshot; and c) to cause minimal service interruption. Clones come to mind as a possible solution: you can clone
The affected VM then runs from the clone. Splitting a clone from its source would completely solve the problem above. It is not clear to me how redacted send/recv would help in this case.
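The recovery path in the scenario above could be sketched as a dry run. The pool, dataset, and snapshot names (`tank/vms`, `@yesterday`, `tank/vm-restore`) are hypothetical:

```shell
#!/bin/sh
# Dry-run sketch of clone-based recovery for the VM scenario above;
# names are hypothetical. run() only prints the commands.
run() { echo "zfs $*"; }

# Clone yesterday's snapshot to get the uncorrupted disk back quickly...
run clone tank/vms@yesterday tank/vm-restore

# ...then point the VM at the clone. Promoting the clone only reverses
# the parent/child relation; it does NOT make the two independent, which
# is precisely the gap this feature request is about.
run promote tank/vm-restore
```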
@kpande first, thanks for sharing your view and your solution (which is interesting!). I totally agree that a careful, and very specific, guest configuration (and host dataset tree) can avoid the problem exposed above. That said, libvirt (and its implementation of storage pools) does not play very well with this approach, especially when managing mixed environments with Windows virtual machines. Even more, this was a single example only. Splittable clones would be very useful, for example, when used to create a "gold master / base image", which can be instantiated at will to create "real" virtual machines. With the current state of affairs, doing that will tax you heavily in allocated space, as you will not be able to ever remove the original, potentially obsolete, snapshot.

What surprises me is that, ZFS being a CoW filesystem, this should be a relatively simple operation: when deleting the original snapshot, "simply" mark as free any non-referenced block and remove the parent/child relation. In other words, let the clone be a real filesystem, untangled from any source snapshot. Note that I used the word "simply" inside quotes: while it is indeed a simple logical operation, I am not sure if/how well it maps to the underlying ZFS filesystem.
@kpande ok, fair enough - if a real technical problem exists, I must accept it. But this is different from stating that a specific use case is invalid. If this view (ie: the impossibility of splitting a clone from its original parent snapshot without involving the "mythical" BPR) is shared by the ZFS developers, I think this FR can be closed. Thanks.
+1 on needing this feature. Yes, send/recv could be used, but that would require downtime of whatever is using that dataset to switch from the old (clone) to the new dataset. I've run into situations with LXD where a container is copied (cloned), but that causes problems with my separately managed snapshotting.
@kpande: again, my use case has the entire dataset being a database, and a couple of variations of the database. From what I've seen, it doesn't look like overlayfs plays nicely w/ zfs as the file system (it seems happy w/ zvols and ext4/xfs according to your notes). It sounds like this approach would cover most cases, in which case documentation explaining how to set up overlayfs w/ ext4/xfs would be welcome. That said, some of us are using zfs not just for the volume management but also for the acl/allow/snapshot behavior/browsing, and would like to be able to use overlayfs w/ zfs instead of ext4/xfs. If that isn't possible, is there a bug for that? If there is, it'd be good if it was highlighted (from here); if not, and you're endorsing the overlayfs approach, maybe you could file it (if you insist, I could probably write it, but I don't know anything about overlayfs, and that seems like a key technology in the writeup).
@ptx0 this only works if the guest OS supports
Folder redirection, while existing since NT, doesn't always work reliably, as software exists that (for obscure reasons) doesn't handle redirected folders correctly and simply fails when confronted with a redirected Desktop or Documents. Apart from that, clones of Windows installations diverge massively and quite quickly from the origin, all by themselves, courtesy of Windows Update - having different users logging on and off only speeds this up.
+1 on this. I just realized today that I had the same misconception that zfs promote does this. Found out it doesn't!
Does block cloning (#13392) get us closer to a solution here? I'm envisioning an operation to split the parent snapshot of a clone, such that a copy of that snapshot is created, owned by the clone. The parent of the clone changes to the parent of the split snapshot. For example, consider the following setup:

```mermaid
graph BT;
    a1["fs@1"];
    a2["fs@2"]-->a1;
    a3["fs@3"]-->a2;
    b3["clone@3"]-.->a2;
    a["fs"]-->a3;
    b["clone"]-->b3;
```
We run something like the proposed split operation, ending up with:

```mermaid
graph BT;
    a1["fs@1"];
    a2["fs@2"]-->a1;
    a3["fs@3"]-->a2;
    b2["clone@2"]-.->a1;
    b3["clone@3"]-->b2;
    a["fs"]-->a3;
    b["clone"]-->b3;
```
Of course, we can't just copy a snapshot. If we did that and then deleted one of the snapshots, any freed blocks would leave the other snapshot corrupted. We need some way to mark the blocks born in that snapshot as being referenced twice, so that the blocks are not freed until both snapshots are deleted. Isn't that exactly what block cloning (#13392) can do for us? "ZFS clones: Probably not what you really want" describes a scenario where parent and clone datasets have completely diverged from their parent snapshot. Suppose we copy that snapshot, adding its blocks to the block reference table. At this point, we are still using just as much space on disk. But now, we can delete
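The reference-counting idea above can be illustrated with a toy model. This is purely illustrative Python, not the real block reference table's on-disk format or API; the snapshot and block names are hypothetical:

```python
from collections import Counter

class BlockRefTable:
    """Toy model: a block is freed only when its refcount drops to zero."""
    def __init__(self):
        self.refs = Counter()
        self.freed = set()

    def add_ref(self, block):
        self.refs[block] += 1

    def release(self, block):
        self.refs[block] -= 1
        if self.refs[block] == 0:
            self.freed.add(block)   # now safe to free on disk

brt = BlockRefTable()

# fs@2 and its copy clone@2 both reference the same blocks A and B.
for snap in ("fs@2", "clone@2"):
    for block in ("A", "B"):
        brt.add_ref(block)

# Deleting fs@2 releases its references...
for block in ("A", "B"):
    brt.release(block)

# ...but nothing is freed yet: clone@2 still holds a reference,
# so neither snapshot can corrupt the other.
assert brt.freed == set()

# Only after clone@2 is also deleted do the blocks become free.
for block in ("A", "B"):
    brt.release(block)
assert brt.freed == {"A", "B"}
```

This is exactly the property the comment asks for: either copy of the snapshot can be deleted first, and shared blocks survive until the last reference is gone.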
Hello Everyone,
I wasn't sure where the correct place to post a request is, as this is not an issue, so feel free to close this if it is not correct.
I have a feature request that might be useful to others. I am looking for the capability to split a clone while online, pretty much the same as NetApp's vol clone split.
There are certain times when a clone has completely diverged from the parent and it doesn't make sense to keep the two filesystems linked. The only way I can think of to do this today is to perform a zfs send/recv, but this will likely require some downtime to ensure consistency.
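The send/recv workaround amounts to something like this dry run. All dataset names (`tank/clone`, `tank/clone-standalone`) are hypothetical, and the downtime falls between stopping the consumer and the final rename:

```shell
#!/bin/sh
# Dry-run sketch of splitting a clone today via send/recv; names are
# hypothetical. run() only prints; swap it out to execute for real.
run() { echo "+ $*"; }

# Copy the clone's data into an independent dataset.
run zfs snapshot tank/clone@split
run sh -c 'zfs send tank/clone@split | zfs recv tank/clone-standalone'

# Downtime window: stop whatever uses tank/clone, send a final
# incremental to catch up, then swap the names.
run zfs rename tank/clone tank/clone-old
run zfs rename tank/clone-standalone tank/clone
```

This works, but it duplicates the clone's entire contents and requires the consistency downtime noted above, which is why an online split is being requested.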
What I am proposing is that, since ZFS knows the blocks that are associated with the parent filesystem, there is a possibility to copy those blocks to a new area and repoint the clone to use those blocks instead (hopefully I have explained that properly). The end state would be a split clone while the filesystem is online and active...