
Encrypted dataset created under current SmartOS/Illumos cannot be mounted on Linux, gives Input/output error #13301

Open
lhuedepohl opened this issue Apr 5, 2022 · 12 comments
Labels
Component: Encryption "native encryption" feature Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@lhuedepohl
Contributor

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | openSUSE Tumbleweed, as well as Debian 11 (Proxmox) |
| Kernel Version | 5.17.1, 5.15.19-2-pve (respectively) |
| Architecture | x86_64 |
| OpenZFS Version | 2.1.4, 2.1.2 (respectively) |

Observed problem

An encrypted filesystem, created (along with its ZFS pool) under SmartOS, cannot be mounted under Linux; the attempt fails with an `Input/output error`:

#> zfs mount zones/secrets
cannot mount 'zones/secrets': Input/output error

Curiously, a `zfs send --raw zones/secrets | zfs receive zones/secrets2` seems to work around this; the resulting `zones/secrets2` can then be mounted.

How to reproduce the problem

Fortunately, I was now able to recreate this with a VM installation of SmartOS (I originally experienced the issue on actual hardware). I downloaded the latest ISO from

https://us-east.manta.joyent.com/Joyent_Dev/public/SmartOS/20220324T002253Z/smartos-20220324T002253Z.iso

and installed normally on a 20 GB disk image provided to the VM (I also tried this with much earlier versions, back to 20200520T174734Z, with the same results). I then created an encrypted filesystem in there (and a snapshot, though I am not sure that is relevant!), like this:

SmartOS (build: 20200520T174734Z)
[root@smartos ~]# zfs create -o encryption=aes-256-gcm -o keyformat=passphrase -o keylocation=prompt zones/secrets
Enter passphrase: 
Re-enter passphrase: 
[root@smartos ~]# zfs snapshot zones/secrets@test
[root@smartos ~]# poweroff

After the shutdown finished cleanly, I imported the pool on the host:

host> qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/vm1.qcow2
host> zpool import zones -o altroot=/mnt -f
host> zpool status
  pool: zones
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
	The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
config:

	NAME        STATE     READ WRITE CKSUM
	zones       ONLINE       0     0     0
	  nbd0      ONLINE       0     0     0

errors: No known data errors

and then tried to mount the test filesystem, which failed:

host> zfs load-key zones/secrets
Enter passphrase for 'zones/secrets':
host> zfs mount zones/secrets
cannot mount 'zones/secrets': Input/output error

Afterwards, there were errors shown in zpool status:

host> zpool status -v
  pool: zones
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

	NAME        STATE     READ WRITE CKSUM
	zones       ONLINE       0     0     0
	  nbd0      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        zones/secrets:<0x0>

But as mentioned, the data actually still seems to be there and intact, as it could be revived by a raw send/receive:

host> zfs send --raw -v zones/secrets@test | zfs receive -v dpool/test
full send of zones/secrets@test estimated size is 50.1K
total estimated size is 50.1K
receiving full stream of zones/secrets@test into dpool/test@test
received 18.4K stream in 1 seconds (18.4K/sec)
host> zfs load-key dpool/test   
Enter passphrase for 'dpool/test':
host> zfs mount dpool/test

No warnings (other than from zpool status) were visible in any logs/dmesg. The errors shown in zpool status vanish after a scrub.

Remarks

I am actually not sure what guarantees you would like to give here: is it expected that (encrypted?) datasets from other ZFS flavors can be mounted? In any case, I hope this report helps someone out. I am happy to answer questions or run experiments!

I am also not sure whether SmartOS might be at fault here, but since I see the problems on the Linux side, I am starting by reporting the issue here.

If the reproduction procedure is too cumbersome I can also provide my test image directly (~200 MB).

Thanks for the good work on ZoL!

@lhuedepohl lhuedepohl added the Type: Defect Incorrect behavior (e.g. crash, hang) label Apr 5, 2022
@ikozhukhov
Contributor

I think it is an issue on the illumos side - they have not ported all OpenZFS updates and features.
We have ported many updates to DilOS, but we will try to reproduce your issue for confirmation.
I think illumos ZFS and OpenZFS can be incompatible in many places right now.

@rincebrain
Contributor

@lhuedepohl
Contributor Author

> I know of at least one reason this might break.

I see! So this is simply not something that is expected to work, right? If so, feel free to close this issue.

@rincebrain
Contributor

I'm not saying it shouldn't be made to work, just that this sounded like the case I cited that people intended to come up with a solution for but AFAIK nobody did.

@behlendorf behlendorf added the Component: Encryption "native encryption" feature label Apr 6, 2022
@gamanakis
Contributor

gamanakis commented Apr 10, 2022

@lhuedepohl Could you perhaps apply "#12981" in Linux and check if you can mount it then?

Edit: I just saw you tried 2.1.4, where that fix is included.

@rincebrain
Contributor

> @lhuedepohl Could you perhaps apply "#12981" in Linux and check if you can mount it then?

They also said a zfs send -w | zfs recv made it mountable, so presumably yes, that does indeed work, and the problem is that the ones from illumos aren't coming with that pre-set?
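To illustrate the suspected mechanism, here is a purely conceptual sketch (hypothetical names in Python, not the real `zio_crypt.c` interfaces): if the writing platform leaves the project dnode out of the objset MAC while the reading platform includes it, verification fails and mount returns EIO, whereas a raw receive recomputes the local MAC with the receiving platform's layout.

```python
import hashlib
import hmac

def objset_mac(key, meta_dnode, project_dnode, include_projdn):
    # Hypothetical stand-in for the local objset MAC; the real
    # computation lives in ZFS's zio_crypt code.
    h = hmac.new(key, meta_dnode, hashlib.sha512)
    if include_projdn:
        h.update(project_dnode)
    return h.digest()[:32]

key = b"\x01" * 32
meta, proj = b"meta-dnode", b"proj-dnode"

# Writer (illumos-style): project dnode is not part of the MAC.
stored = objset_mac(key, meta, proj, include_projdn=False)

# Reader (Linux-style): includes the project dnode -> mismatch -> EIO on mount.
assert objset_mac(key, meta, proj, include_projdn=True) != stored

# A raw receive keeps the ciphertext but recomputes the local MAC with the
# receiving platform's layout, so the received copy verifies fine afterwards.
recomputed = objset_mac(key, meta, proj, include_projdn=True)
assert objset_mac(key, meta, proj, include_projdn=True) == recomputed
```

This would also explain why the received dataset mounts while the original does not, and why a flag recording the layout is needed for true two-way compatibility.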

@lhuedepohl
Contributor Author

In case someone wants to have a look without going to the trouble of the SmartOS install, you can now download the
vm1.qcow2.tar.gz (200 MB). Careful: it expands into a 20 GB (sparse) file.

@lhuedepohl
Contributor Author

You will need the password: 12345678

@stale
Copy link

stale bot commented Jun 10, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Jun 10, 2023
@lundman
Contributor

lundman commented Jun 10, 2023

The stale bot brought this to my attention; we had similar issues on macOS and added a bit of extra code. I (or someone) should check if macOS can handle it (and afterwards it should work with Linux... but it's a one-way move).

@stale stale bot removed the Status: Stale No recent activity for issue label Jun 10, 2023
@jasonbking
Contributor

I haven't had a chance to finish it yet, but the work on illumos is here: https://github.com/jasonbking/illumos-gate/tree/zfs-crypto-dnode and shouldn't be too bad to apply to openzfs.

It was designed to be backwards compatible. Unmodified pools will try either approach (I put in a bit to default to the platform's historic behavior) and then set a flag to denote whether the project dnode is included in the hash. That way older systems that are compatible today stay that way, but systems with the fix can handle either. The remaining bit (which I hadn't done yet) was to add a zpool flag that controls which format new datasets use. Basically the admin stays in control and can keep things backwards compatible if needed.
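A rough sketch of that scheme (hypothetical names and a toy HMAC in Python, just to illustrate the try-both-then-remember logic, not the actual illumos code):

```python
import hashlib
import hmac

def objset_mac(key, meta_dnode, project_dnode=None):
    # Toy stand-in for the objset MAC; optionally mixes in the project dnode.
    h = hmac.new(key, meta_dnode, hashlib.sha512)
    if project_dnode is not None:
        h.update(project_dnode)
    return h.digest()[:32]

def verify_and_remember(key, meta_dnode, project_dnode, stored_mac, flags):
    """On an unmodified dataset, try both layouts, then record which one
    matched so later verifications (and rewrites) stay consistent."""
    if "mac_includes_projdn" not in flags:
        if hmac.compare_digest(objset_mac(key, meta_dnode, project_dnode), stored_mac):
            flags["mac_includes_projdn"] = True      # e.g. a Linux-written dataset
        elif hmac.compare_digest(objset_mac(key, meta_dnode), stored_mac):
            flags["mac_includes_projdn"] = False     # e.g. an illumos-written dataset
        else:
            raise IOError("objset MAC mismatch")     # surfaces as EIO on mount
    projdn = project_dnode if flags["mac_includes_projdn"] else None
    return hmac.compare_digest(objset_mac(key, meta_dnode, projdn), stored_mac)
```

With a per-dataset flag like this persisted, an admin-controlled pool property could then decide which layout newly created datasets use, as described above.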

If someone wanted to finish up the bits before I can get back to it, feel free.

@lundman
Contributor

lundman commented Jun 10, 2023

Ah yeah, on macOS we try the 3-tuple hash first; if that fails, we check the 2-tuple, and if that is OK, we carry on. It will then write the 3-tuple when done, hence the one-way move. If you have a patch that can remember 2 or 3, that would be good for compatibility.

7 participants