
Sometimes raw send on encrypted datasets does not work when copying snapshots back #12594

Closed
digitalsignalperson opened this issue Sep 29, 2021 · 77 comments · Fixed by #12981
Labels
Component: Encryption ("native encryption" feature), Component: Send/Recv ("zfs send/recv" feature), Type: Defect (Incorrect behavior, e.g. crash, hang)


@digitalsignalperson

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version rolling
Kernel Version 5.14.8-arch1-1
Architecture x86_64
OpenZFS Version zfs-2.1.1-1

Describe the problem you're observing

I am able to send raw encrypted snapshots (incremental and replication streams) back and forth between file systems only a limited number of times before getting cannot mount 'rpool/mydataset': Input/output error and errors in zpool status.

I have tried many sequences of sends/receives with raw encrypted snapshots; sometimes I can pass them back and forth only once, other times more. Below I will share two repeatable examples.

This seems like a manifestation of the issue in "Raw send on encrypted datasets does not work when copying snapshots back #10523", which was previously resolved.

Describe how to reproduce the problem

Example 1 - fails on first send back

zfs create rpool/test_000 -o encryption=on -o keyformat=passphrase

# create some data and snapshots
touch /mnt/test_000/1.txt
zfs snapshot rpool/test_000@1
touch /mnt/test_000/2.txt
zfs umount rpool/test_000
zfs snapshot rpool/test_000@2

# send to a new encryption root
zfs send -Rw rpool/test_000@2 | zfs receive -u rpool/test_001

# modify data, snapshot, and send back
zfs mount -l rpool/test_001
touch /mnt/test_001/3.txt
zfs umount rpool/test_001
zfs snapshot rpool/test_001@3
zfs send -i @2 -w rpool/test_001@3 | zfs receive -u rpool/test_000

# try to mount
zfs mount rpool/test_000
# cannot mount 'rpool/test_000': Input/output error

Example 2 - more convoluted, but fails after a few passes back and forth

zfs create rpool/test_002 -o encryption=on -o keyformat=passphrase

# create some data and snapshots
touch /mnt/test_002/1.txt
zfs snapshot rpool/test_002@1
touch /mnt/test_002/2.txt
zfs snapshot rpool/test_002@2
touch /mnt/test_002/3.txt
zfs umount rpool/test_002
zfs snapshot rpool/test_002@3

# send to new encryption root (same steps as Example 1 so far)
zfs send -Rw rpool/test_002@3 | zfs recv -u rpool/test_003

# send to another new encryption root
zfs load-key rpool/test_003
zfs send -Rw rpool/test_003@3 | zfs receive -u rpool/test_004

# modify data, snapshot, and send back
zfs load-key rpool/test_004
zfs mount rpool/test_004
touch /mnt/test_004/4.txt
zfs snapshot rpool/test_004@4
zfs send -w -i @3 rpool/test_004@4 | zfs receive -u rpool/test_003

# try to mount - succeeds where Example 1 failed; the only difference is the extra send in between
zfs mount rpool/test_003
ls /mnt/test_003

# modify data again and send back
touch /mnt/test_003/5.txt
zfs umount rpool/test_003
zfs snapshot rpool/test_003@5
zfs send -w -i @4 rpool/test_003@5 | zfs receive -u rpool/test_004
ls /mnt/test_004/

# modify data and send back
touch /mnt/test_004/6.txt
zfs snapshot rpool/test_004@6
zfs send -w -i @5 rpool/test_004@6 | zfs receive -u rpool/test_003
zfs mount rpool/test_003
# cannot mount 'rpool/test_003': Input/output error

At this point the output of zpool status -v includes

errors: Permanent errors have been detected in the following files:

        rpool/test_000:<0x0>
        rpool/test_003:<0x0>

If I roll back the last snapshots in question, then scrub once

zfs rollback -r rpool/test_000@2
zfs rollback -r rpool/test_003@5
zpool scrub rpool
zpool status -v

status still shows

errors: Permanent errors have been detected in the following files:

        rpool/test_000:<0x0>
        rpool/test_003:<0x0>

but if I scrub a second time

zpool scrub rpool
zpool status -v

I end up with

errors: No known data errors

and if I repeat the last operation in question, I get the same IO error again.

The steps are repeatable for me. I don't know if every step matters (e.g. extraneous load-key when I don't mount). I also have some other examples that fail at different points, but I figured these were simple enough to share.

digitalsignalperson added the Type: Defect (Incorrect behavior, e.g. crash, hang) label Sep 29, 2021
@rincebrain
Contributor

I recommend not using native encryption until it gets a fair bit more polish in the future (I'm only so hopeful).

@gamanakis
Contributor

gamanakis commented Sep 30, 2021

This happens because when sending raw encrypted datasets the userspace accounting is present when it's not expected to be. This leads to the subsequent mount failure due to a checksum error when verifying the local mac.
I tried unsuccessfully to tackle this in #11300.
See also: #10523, #11221, #11294.

Edit: If you have lost critical data due to this issue, I could help you recover it.

@putnam

putnam commented Oct 17, 2021

I am able to reproduce this.

At a high level I wanted to send an unencrypted dataset to a new pool with encryption enabled, wipe the old pool, and raw send this encrypted dataset back to a fresh pool. But snapshots end up being sent back the other direction at times since it's on an active system. In this process I discovered this bug for myself.

I've made a repro script here which just makes a file-based pool and puts it into a broken state:

#!/bin/bash
truncate -s 64M /tmp/test.pool
echo "12345678" > /tmp/test.pool.key
zpool create testpool /tmp/test.pool
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=file:///tmp/test.pool.key testpool/test-source
echo "honk" > /testpool/test-source/honk
zfs snapshot testpool/test-source@before
zfs send -w testpool/test-source@before | zfs recv testpool/test-dest

# key is not currently loaded for test-dest; load it to check and confirm files
zfs load-key -L file:///tmp/test.pool.key testpool/test-dest
zfs mount testpool/test-dest
# ls /testpool/test-dest
# honk

# now edit the dataset on test-dest, snapshot it, and send it back
echo "honk2" > /testpool/test-dest/honk2
zfs snapshot testpool/test-dest@after
zfs send -w -I testpool/test-dest@before testpool/test-dest@after | zfs recv testpool/test-source

# both files now exist in test-source; looks good (snapshots match between them too)
# ls /testpool/test-source
# honk honk2

# but as soon as you unmount and unload the key, then reload the key and mount it again...
zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source
# cannot mount 'testpool/test-source': Input/output error
# zpool status -v testpool will show permanent errors
zpool status -v testpool

echo "to clean up:"
echo " zpool destroy testpool && rm /tmp/test.pool && rm /tmp/test.pool.key"

Worse, I originally poked around testing this on my personal pool and it faulted the pool in such a way that even destroying those affected datasets didn't help me:

# zpool status tank -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A

<snip>

errors: Permanent errors have been detected in the following files:

        <0xec80>:<0x0>

@aerusso
Contributor

aerusso commented Oct 17, 2021

@putnam Hey! Thanks a ton for getting a local reproducer working!

I, however, cannot get this to work (i.e., bug out) on my test platform: an Intel laptop (though I have unfortunately never managed to reproduce the issue on that device). I don't have time right now, but I will try this on my production machine (which does have the problem).

I (therefore) think there may be a hardware component to this bug/these bugs. In the meantime, can you check this on 0.8.6? (I'm low-key hoping you'll be willing to bisect this.)

@digitalsignalperson
Author

Ran @putnam's script on my own setup (system info at the top) and it did not result in any errors. The final zfs mount testpool/test-source was successful and no errors on the pool.

I just added a few more rounds of sending data back and forth and that made it break for me. @aerusso I suspect if you pass snapshots back and forth a few more times (even if it varies per hardware) it will break eventually. I was thinking it would also be easy to write a fuzzing script that randomly sends raw snapshots back and forth, unmounting, remounting, etc., which should be able to generate many failing cases; not sure if that would be of help.
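
For illustration, a minimal sketch of such a loop might look like the following. It uses a hypothetical throwaway pool called fuzzpool and simply alternates direction each round rather than choosing operations at random:

#!/bin/bash
# Ping-pong sketch: keep raw-sending incrementals between two encrypted
# datasets and stop as soon as a mount fails. Throwaway file-backed pool.
set -e
truncate -s 256M /tmp/fuzz.pool
echo "12345678" > /tmp/fuzz.pool.key
zpool create fuzzpool /tmp/fuzz.pool
zfs create -o encryption=on -o keyformat=passphrase \
    -o keylocation=file:///tmp/fuzz.pool.key fuzzpool/a
zfs umount fuzzpool/a
zfs snapshot fuzzpool/a@0
zfs send -w fuzzpool/a@0 | zfs recv -u fuzzpool/b

src=b; dst=a            # "src" is whichever side received the last snapshot
for i in $(seq 1 50); do
    zfs load-key -L file:///tmp/fuzz.pool.key "fuzzpool/$src" 2>/dev/null || true
    if ! zfs mount "fuzzpool/$src"; then
        echo "mount of fuzzpool/$src failed after $i round(s)"
        exit 1
    fi
    touch "/fuzzpool/$src/file_$i"
    zfs umount "fuzzpool/$src"
    zfs snapshot "fuzzpool/$src@$i"
    prev=$((i - 1))
    zfs send -w -i "@$prev" "fuzzpool/$src@$i" | zfs recv -u "fuzzpool/$dst"
    tmp=$src; src=$dst; dst=$tmp    # swap direction for the next round
done
echo "no failure after 50 rounds"
# clean up: zpool destroy fuzzpool && rm /tmp/fuzz.pool /tmp/fuzz.pool.key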

My mods to the script are below:

#!/bin/bash
truncate -s 64M /tmp/test.pool
echo "12345678" > /tmp/test.pool.key
zpool create testpool /tmp/test.pool
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=file:///tmp/test.pool.key testpool/test-source
echo "honk" > /testpool/test-source/honk
zfs snapshot testpool/test-source@before
zfs send -w testpool/test-source@before | zfs recv testpool/test-dest

# key is not currently loaded for test-dest; load it to check and confirm files
zfs load-key -L file:///tmp/test.pool.key testpool/test-dest
zfs mount testpool/test-dest
# ls /testpool/test-dest
# honk

# now edit the dataset on test-dest, snapshot it, and send it back
echo "honk2" > /testpool/test-dest/honk2
zfs snapshot testpool/test-dest@after
zfs send -w -I testpool/test-dest@before testpool/test-dest@after | zfs recv testpool/test-source

# both files now exist in test-source; looks good (snapshots match between them too)
# ls /testpool/test-source
# honk honk2

# but as soon as you unmount and unload the key, then reload the key and mount it again...
zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source

# ------------------------------
# Not an issue for me yet - my modifications below

# modify source, snapshot, send to dest
zfs rollback testpool/test-dest@after
touch /testpool/test-source/1
zfs snapshot testpool/test-source@1
zfs send -w -i @after testpool/test-source@1 | zfs recv testpool/test-dest

# modify dest, snapshot, send to source
zfs rollback testpool/test-source@1
touch /testpool/test-dest/2
zfs snapshot testpool/test-dest@2
zfs send -w -i @1 testpool/test-dest@2 | zfs recv testpool/test-source

# Everything looks ok with things still mounted.
# try reloading

zfs unmount testpool/test-source
zfs unload-key testpool/test-source
zfs load-key -L file:///tmp/test.pool.key testpool/test-source
zfs mount testpool/test-source

#cannot mount 'testpool/test-source': Input/output error

@putnam for me, two scrubs got rid of those errors, if you want to try that to fix your personal pool.

@putnam

putnam commented Oct 17, 2021

Thanks @aerusso and @digitalsignalperson for the feedback and updates. I wonder what is different between our setups. For anyone running that script please post your kernel and ZFS versions at the time you ran it. (uname -a; cat /sys/module/zfs/version)

My kernel at the time of test: Debian 5.14.0-1-amd64
ZFS version: Debian 2.0.6-1

I did also find someone else wrote up a similar script (#11983) to attempt a reliable repro. This bug has been reported in several places and probably needs consolidation. It's also clear some efforts have already been made and maybe the root cause is already well-understood. See #11300 which has not been updated in ~4 months.

The situation seems kind of bad. I don't know all the possible use cases where it might occur (probably many) but my situation is:

  • I want to make an encrypted offsite replica for my pool and use raw sends to avoid loading keys
  • My current pool is unencrypted and I want to encrypt it and, as a bonus, rebalance data on my vdevs.
  • I want to send everything to the new backup pool, promote that backup pool to master temporarily (so it receives writes etc.) then destroy my original pool and recreate it, sending the now-updated contents of the backup pool back to the original pool. Then I demote the backup pool back to being a backup, and go back to using my original pool as the master. Then I offsite the backup pool. I realize this is slightly convoluted, but the reasoning is that the backup pool is a denser configuration that takes up fewer RUs and the offsite is more limited in available space.
  • The problem is that I end up changing stuff on the 2nd pool and want to send it back, and I want to keep using raw sends. I did a dry-run with a test pool and ran into this bug.

@digitalsignalperson I will do two scrubs (this is a large pool so it'll take ~3 days) and report back if it fixes the pool error. Thanks!

@pepsinio

@putnam I have the same problem. Did you find any solution?

@aerusso
Contributor

aerusso commented Oct 19, 2021

Thanks! The modified version "works" (breaks) reliably on my test platform.

@putnam

putnam commented Oct 19, 2021

@digitalsignalperson

Confirming that two back-to-back scrubs cleared the corruption error. Not sure the technical reason why it took two scrubs, but glad it's cleared.

For what it's worth, my system is an Epyc 7402P with 128GB of ECC RAM.

@digitalsignalperson
Author

Not sure either about the two scrubs, but I saw it suggested/reported in one or more of the other similar encryption issues.

@rincebrain
Contributor

ZFS remembers the errors from the last completed scrub too, which is why it takes two scrubs with the errors gone before they disappear, AIUI.
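
For reference, the clearing sequence described above would look roughly like this (zpool wait is assumed to be available, as in OpenZFS 2.x; the dataset and snapshot names are from the earlier example):

# after removing whatever triggered the errors, e.g. rolling back the offending snapshot
zfs rollback -r rpool/test_000@2
zpool scrub rpool
zpool wait -t scrub rpool    # wait for the first scrub to finish
zpool scrub rpool
zpool wait -t scrub rpool
zpool status -v rpool        # should now report: No known data errors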

@bghira

bghira commented Dec 19, 2021

this bug has likely existed since the introduction of the encryption feature.

@marker5a

Having the same issue on this end... had no idea that my encrypted backups were getting hosed until I went to restore some of my datasets from my backup.

I do encrypted sends w/ syncoid (--sendoptions="w") to back up to the backup pool. The only problem is that I tried the double scrub but I'm still getting the same input/output error. Is there any other hope of recovering the data from the backup pool?

It sounded from other comments like you need to first get the error to go away from zpool status, and then do two scrubs... is my thinking correct? I'm going to spool up a bunch of scrubs sequentially as a last hail mary, but any other pointers would be useful.

@rincebrain
Contributor

Usually, one would go "[remove whatever is causing errors]" "[scrub twice]" and then zpool status would no longer list those errors.

@marker5a

Yeah, that makes sense... the confusing thing is that doing the scrub "cleared the error", at least as far as zfs was concerned. After the two sequential scrubs zfs reports no errors, so I would have thought that mounting the dataset after two error-less scrubs would work, but I'm still having issues.

Does clearing the error in this case refer to doing more than an initial scrub to make zfs think the error went away?

Also, side note: is there any plausible way to forensically recover the dataset by manually decrypting it? I unfortunately know too little about the internals to know if this is even possible... or how one would go about it.

@rincebrain
Contributor

I believe the reason for the counterintuitive behavior is that the error is coming from trying to decrypt things, which zpool scrub very notably does not do.

From my understanding of the problem based on @gamanakis's patch and replies, I would assume it would be possible to write a patch to just ignore the failing bits and let you extract your data. (The existing reverted fix for this might even do that, I'm not sure.)

@marker5a

I believe the reason for the counterintuitive behavior is that the error is coming from trying to decrypt things, which zpool scrub very notably does not do.

From my understanding of the problem based on @gamanakis's patch and replies, I would assume it would be possible to write a patch to just ignore the failing bits and let you extract your data. (The existing reverted fix for this might even do that, I'm not sure.)

Ok, well yeah, that does make a bit more sense in terms of scrub being unaware.

I'll see if @gamanakis responds here with any helpful info... also trying to figure out if zdb can be useful in getting the data out without patching zfs

@digitalsignalperson
Author

I'd be interested to hear any solution. I wouldn't mind starting to use raw encrypted sends for offsite backup if there was a hacky workable recovery method.

@gamanakis
Contributor

gamanakis commented Dec 22, 2021

@marker5a You could cherry-pick the commit here: gamanakis@c379a3c on top of zfs-2.1.0, zfs-2.1.1, or zfs-2.1.2.

That should resolve your problem. That commit just introduces a flag that marks the useraccounting metadata as invalid when being received, and so it forces their recalculation upon first mounting of the received dataset and avoids the error encountered otherwise.
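
For anyone who wants to try it, a rough sketch of applying that cherry-pick on top of a release tag follows (assuming a standard from-source build; packaging and module-reload steps vary by distro):

git clone https://github.com/openzfs/zfs.git
cd zfs
git checkout zfs-2.1.2                        # or zfs-2.1.0 / zfs-2.1.1
git remote add gamanakis https://github.com/gamanakis/zfs.git
git fetch gamanakis
git cherry-pick c379a3c                       # the flag/recalculation commit mentioned above
sh autogen.sh && ./configure && make -s -j"$(nproc)"
sudo make install                             # then reload the zfs modules (or reboot) before testing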

@rincebrain
Contributor

What's the reason not to try and get that approach merged in general?

@marker5a

@marker5a You could cherry-pick the commit here: gamanakis@c379a3c on top of zfs-2.1.0, zfs-2.1.1, or zfs-2.1.2.

That should resolve your problem. That commit just introduces a flag that marks the useraccounting metadata as invalid when being received, and so it forces their recalculation upon first mounting of the received dataset and avoids the error encountered otherwise.

@gamanakis Thanks for the speedy reply!!! I was about to go down that route but decided to sit on my hands and wait, lol. I'll give that a try and report back my findings... thanks!

@digitalsignalperson
Author

The justification for the reversion in 6217656 was that the original fix

could lead to failure mounting encrypted datasets created with intermediate versions of ZFS encryption available in master between major releases.

Seems odd; it reads like choosing to break one thing (failure to mount raw encrypted sends in general) over another (failure to mount encrypted datasets created in between releases using git master?). If we stick to releases, is there any harm in the original patch?

@psy0rz

psy0rz commented Jan 7, 2022

Because we can't always 100% trust zfs, I'm trying to create an intelligent zfs-compare tool that will compare the latest common snapshots in two pools. It will shasum the actual zvols and files, instead of relying on zfs. It will also transfer a remote dataset that's encrypted and only has a local key, so that the encryption key isn't needed remotely. Does anyone want this tool as well?
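
Not the tool itself, but a rough sketch of the kind of file-level check it could perform, assuming the latest common snapshot is reachable through the .zfs/snapshot directory on both sides (mountpoints and names here are hypothetical):

# hash every regular file inside the same snapshot on two pools and diff the results
snap=common-snap
hash_snapshot() {
    (cd "$1/.zfs/snapshot/$snap" && find . -type f -print0 | sort -z | xargs -0 sha256sum)
}
hash_snapshot /poolA/mydata > /tmp/a.sums
hash_snapshot /poolB/mydata > /tmp/b.sums
diff /tmp/a.sums /tmp/b.sums && echo "file contents match"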

@digitalsignalperson
Author

@psy0rz I'd be curious to see how it works.

@Blackclaws

Blackclaws commented Jan 12, 2022

Seems to be the same issue as #10523. Might it make sense to somehow merge all those issues into one and get rid of some of the duplicates? There appear to be a number of issues open on this subject and so far no fix for more than a year.

@gamanakis
Contributor

gamanakis commented Jan 13, 2022

What's the reason not to try and get that approach merged in general?

@rincebrain the person who wrote the encryption (@tcaputi) suggested in PR #11300 that instead of introducing the new flag we should zero out the dnodes holding user accounting on the receiving side.

However, in the absence of a loaded key I fail to see how this is possible. If those dnodes are freed on the receiving side it starts searching for the key. I have pinged Tom again in this regard.

@Blackclaws

@gamanakis Looking at the pull request, there are a couple of things happening that apparently break when you do this. Wouldn't one solution be to just flag the raw received dataset so that this is done on first mount, when the key has already been loaded? That way you don't run into missing-key issues and avoid a failing mount at the same time.

It seems from a cursory glance at the issue and pull request that the data is all there and it's just some metadata that is corrupted. I guess a perfect solution would be to also correctly send the metadata; if that isn't possible, flagging it for cleanup on next mount seems to be the next most sensible thing to do.

@gamanakis
Contributor

to just flag the raw received dataset so that this is done on first mount when the key has already been loaded?

Right, that was my initial approach (draft gamanakis/zfs@c379a3c).

a perfect solution would be to also correctly send the metadata

This was the latter approach which I have trouble implementing.

@Blackclaws

I think that this sort of flag is needed in any case and might be cleared by checking whether metadata is valid if in future releases valid metadata can be sent/received.

The problem is that if you look at forwards/backwards compatibility, there are already versions with raw send out there that will send wrong metadata. This will have to be handled anyway unless you want a major version break for this. So preemptively flagging the received sets and then deciding on first mount whether the data you received was good or needs to be fixed is, I think, the best way to go forward.

@gamanakis
Contributor

gamanakis commented Jan 13, 2022

There is not a problem with older versions. The flags are stored in a uint64_t and the placeholder for the new flag defaults to 0, i.e. inactive by default.

@Blackclaws

I think I wasn't clear in what I meant.
I meant that while the idea of sending correct metadata is certainly the better long-term solution, it's insufficient for solving the issues when sending from an old zfs version to a fixed one. Maybe I also didn't understand whether sending or receiving was actually the culprit here.

In the case where the sending side has to be upgraded to fix the metadata issue, we need to add the Dirty flag anyway so that we can check a received dataset, because we can't reliably know if it's from a new enough version, right? I'm unfortunately not very knowledgeable about zfs internals.

@gamanakis
Contributor

I misunderstood what you said; you are right, I think. I will do some cleanup and open a new PR introducing the flag.

@rincebrain
Contributor

I wonder how pathological it could be to add a case for "if I fail to decrypt specifically the userobj_accounting metadata, just throw it out", possibly guarded by a tunable. (That, or a zhack command to go forcibly set the "clear on next open" flag.)

It'd be a shame for people to have to throw out pre-patch recvs.

@gamanakis
Contributor

gamanakis commented Jan 20, 2022

I suspect PR #12981 (update of #11300) resolves this. Anyone interested, feel free to try it out.
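
For anyone testing from source, one way to fetch the PR branch is via GitHub's pull request refs (a sketch; build and install steps are the usual from-source ones and vary by distro):

git clone https://github.com/openzfs/zfs.git
cd zfs
git fetch origin pull/12981/head:pr-12981     # GitHub exposes each PR head as a fetchable ref
git checkout pr-12981
sh autogen.sh && ./configure && make -s -j"$(nproc)"
# install, reload the modules, then re-run one of the reproducers above against a scratch pool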

@gamanakis
Contributor

Both examples (1 and 2) in OP from @digitalsignalperson and the reproducer from @putnam complete without any errors with #12981 applied.

behlendorf pushed a commit that referenced this issue Jan 21, 2022
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.

One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12981 
Closes #10523
Closes #11221
Closes #11294
Closes #12594
Issue #11300
@digitalsignalperson
Author

Awesome, thanks @gamanakis! I tested it here in my vagrant box and it seems to work. Looking forward to using raw sends.

@putnam

putnam commented Jan 21, 2022

Thanks @gamanakis !!

I hope after this, #12720 can get some attention. I am still not able to use raw sends as the generated send stream contains odd/damaged objects that break the receive.

@ipaqmaster

ipaqmaster commented Feb 4, 2022

I experienced this last night reformatting a desktop with the expectation that I could zfs send -w my root, home and other encrypted datasets back to the machine from my nas. Only the ones using encryption (aes-256-gcm) gave "Input/Output error" responses when trying to mount them on the local machine after receiving. Unfortunately I didn't have the opportunity to try #12981 and just rsync'd everything to new encrypted datasets made locally overnight.

It's fun. The nas that these encrypted datasets were sent to could mount them just fine, given the nature of the bug. So could a laptop which I zfs send -w'd them to from that nas; the laptop was running openzfs v2.1.1 on an Arch Linux 5.15.5 kernel. It is only the desktop, which received the encrypted datasets onto a new zpool, that could not mount them and hit the issue described here; it was running the same kernel version and was downgraded to openzfs 2.1.1 to match the laptop that mounted them successfully.

At this point my first guess is that the version of zfs your zpool was created on plays a part. But I'll have some more fun playing with it in a VM today now that I'm back online.

At least with the nas mounting the data, I was able to rsync it to the desktop as a one-off. I had a thread on reddit/zfs here but for now I've worked around the issue for myself.
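
Roughly, that workaround amounted to something like this (dataset names and paths are hypothetical):

# on the desktop: create a fresh locally-encrypted dataset, then copy the files
# over from the nas, which can still mount the affected dataset
zfs create -o encryption=on -o keyformat=passphrase rpool/home_new
rsync -aHAX --numeric-ids nas:/tank/home/ /rpool/home_new/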

Thank you for the PR @gamanakis.

@ipaqmaster

ipaqmaster commented Feb 5, 2022

Just adding my own tests, which are consistent with the above comments: with ashift=9 things work, while ashift=12 causes the problem.

The same archlinux usb stick was able to zfs recv the "broken encrypted dataset" and mount it perfectly fine using default zpool settings. It was only when I used ashift=12 in the zpool creation that the "Input/Output error" issue became apparent.

Testing conditions:

Linux 5.15.5 and zfs 2.1.1, then zfs 2.1.2

  1. I booted the same archlinux usb stick I used to rebuild my desktop but in a qemu VM
  2. I created a 100GB qcow2 image for it to use and attached that using VirtIO which presented it to the VM as a 512b sectored "disk".
  3. I created a zpool 'zfstest' with no arguments (new zpool of default settings) on this virtual drive in the VM. Without specifying an ashift, zfs picked ashift=9 by default, which matches what the encrypted dataset was stored on originally.
  4. I sent the same failing dataset from my nas to this VM's qcow2 zpool.
  5. It loaded the key and mounted just fine.

Then I did those tests again but for step 3 I included -o ashift=12 and got the IO error when trying to mount the received encrypted dataset.

Setting ashift=12 on zfs 2.1.1 and 2.1.2 causes the issue for me in a VM where /dev/vda was presented as a device using 512b sectors. Seems consistent enough. My laptop was able to read my desktop's encrypted root dataset because its zpool still used ashift=9, like the encrypted root dataset it received and could mount.
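
A condensed sketch of that test (dataset names, sizes and the snapshot are placeholders; the source dataset comes from a pool created with ashift=9):

# inside the VM: back a pool with a disk/file the guest sees as 512b sectors
truncate -s 10G /tmp/ashift12.img
zpool create -o ashift=12 zfstest /tmp/ashift12.img
# raw-receive the encrypted dataset that originated on the ashift=9 pool
ssh nas 'zfs send -w tank/encrypted@snap' | zfs recv -u zfstest/encrypted
zfs load-key zfstest/encrypted
zfs mount zfstest/encrypted                   # reportedly fails here: Input/output error
# the same steps without -o ashift=12 (ashift=9 on the 512b device) mounted fine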

tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Feb 5, 2022
@gamanakis
Contributor

In my tests it also happens with ashift=9 (default, checked with zdb) when raw sending back to the originating pool.

@ipaqmaster

In my tests it also happens with ashift=9 (default, checked with zdb) when raw sending back to the originating pool.

I am curious, was your pool originally ashift=12 when it sent the snapshot away initially and is receiving it back as ashift=9 to experience the issue?

@gamanakis
Contributor

gamanakis commented Feb 5, 2022

Without PR #12981 I cannot raw receive into the originating pool, regardless of the ashift; it throws an Input/output error when mounting.

Let me try your case with 12981 applied.

@gamanakis
Contributor

gamanakis commented Feb 5, 2022

Ok, this seems to be a different issue. Raw sending from pool1/encrypted with ashift=9 to pool2/encrypted with ashift=12 results in failure when mounting pool2/encrypted (Input/output error).

I think you should open a new issue. I am not sure raw sending between pools with different ashift is possible, will take a look.

@gamanakis
Contributor

I opened #13067 for this matter, did some debugging there too.

nicman23 pushed a commit to nicman23/zfs that referenced this issue Aug 22, 2022