Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot zfs receive unencrypted dataset as a child of encrypted dataset ('cannot receive new filesystem stream: inherited key must be loaded') #13033

Closed
rsplaul opened this issue Jan 28, 2022 · 11 comments · Fixed by #13076
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@rsplaul
Copy link

rsplaul commented Jan 28, 2022

System information

Type Version/Name
Distribution Name Debian
Distribution Version bookworm/sid
Kernel Version 5.15.0-2-amd64
Architecture amd64
OpenZFS Version zfs-2.1.2-1 / zfs-kmod-2.1.2-1

Describe the problem you're observing

When receiving an unencrypted dataset as a child of an encrypted dataset, zfs receive complains with 'cannot receive new filesystem stream: inherited key must be loaded'

Describe how to reproduce the problem

zpool create pool1 /path/to/dev1
zpool create pool2 /path/to/dev2
zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt pool1/enc
zfs create -o encryption=off pool1/enc/noenc
zfs snap pool1/enc@snap
zfs snap pool1/enc/noenc@snap
zfs send -w pool1/enc@snap | zfs receive pool2/enc
zfs send -w pool1/enc/noenc@snap | zfs receive pool2/enc/noenc

Include any warning/errors/backtraces from the system logs

n/a

Additional remarks

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259231 obviously describes the same problem. However, it seems that the unencrypted dataset must be a child of an encrypted dataset on both sides to produce this behaviour.

@rsplaul rsplaul added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 28, 2022
@AttilaFueloep
Copy link
Contributor

Yes, I can reproduce this. There are actually two issues here. The first one is with your sequence of commands. Since pool1/enc/noenc isn't encrypted zfs send -w pool1/enc/noenc@snap is equivalent to zfs send -Lec pool1/enc/noenc@snap. This will generate a send stream without properties and the zfs receive pool2/enc/noenc will inherit all properties, including encryption, from pool2/enc, meaning the receive will encrypt pool2/enc/noenc on the fly. You also want to add -u to the receive since the parent pool2/enc isn't mounted. So you should either do

zfs snap -r pool1/enc@snap
zfs send -w pool1/enc@snap | zfs receive pool2/enc
zfs send -w pool1/enc/noenc@snap | zfs receive -u -o encryption=off pool2/enc/noenc

or

zfs snap -r pool1/enc@snap
zfs send -w pool1/enc@snap | zfs receive pool2/enc
zfs send -pw pool1/enc/noenc@snap | zfs receive -u pool2/enc/noenc

or even

zfs snap -r pool1/enc@snap
zfs send -Rw pool1/enc@snap | zfs receive -u pool2/enc 

Here comes the second issue into play, the above commands also fail with the same error message. This seems like a bug to me. The initial implementation of native encryption disallowed unencrypted datasets below encrypted ones. This restriction was removed in #8870 and it looks like zfs receive doesn't account for the case where an unencrypted dataset is received unencrypted below an encrypted one while doing the check for the loaded key. I'll put this on my list of raw encrypted send/recv issues I'm looking into.

As a side note, I think that nesting unencrypted datasets below encrypted ones is generally not a good idea, and I would try to avoid such a setup if possible. But I can see that here are setups that benefit from that.

As a workaround you can do

zfs send -w pool1/enc@snap | zfs receive pool2/enc
zfs load-key pool2/enc
zfs send -w pool1/enc/noenc@snap | zfs receive -o encryption=off pool2/enc/noenc

that works as intended but is less than ideal and may not be possible in certain scenarios at all.

@rsplaul
Copy link
Author

rsplaul commented Jan 29, 2022

Thanks for the explanations and your testing. Actually, the scenario in my case is a backup with syncoid on a potentially untrusted low-performance backup-host, hence the idea to do a raw send (--sendoptions=w).

I agree, that nesting unencrypted datasets under encrypted datasets is probably not a good idea in the first place, but since such configurations are possible, zfs receive should somehow support them without breaking the "zero-knowledge principle" that allows for backups of encrypted datasets on untrusted hosts.

@AttilaFueloep
Copy link
Contributor

the scenario in my case is a backup with syncoid on a potentially untrusted low-performance backup-host, hence the idea to do a raw send

Unfortunately that's one of the scenarios where loading the key isn't an option, yes.

but since such configurations are possible, zfs receive should somehow support them without breaking the "zero-knowledge principle" that allows for backups of encrypted datasets on untrusted hosts.

That's right, it should work seamlessly. Spotting the place where things fail is the hard part though ;-)

@gamanakis
Copy link
Contributor

gamanakis commented Feb 4, 2022

By setting the module parameter zfs_flags=512, one of the meaningful places that returns an EACCES (the "inherited key..." error message is because of an EACCES) is:
spa_keystore_dsl_key_hold_dd().

Instrumenting that function with spl_dumpstack() upon failure (ret!=0) we get:

[  +0.000001] Call Trace:
[  +0.000002]  <TASK>
[  +0.000001]  dump_stack_lvl+0x46/0x62
[  +0.000005]  spa_keystore_create_mapping+0x154/0x230 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000078]  dsl_dataset_hold_obj_flags+0x52/0x90 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000059]  dsl_dataset_hold_flags+0xec/0x310 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000057]  dmu_recv_begin_check+0x1ec/0x8b0 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000052]  dsl_sync_task_common+0x155/0x2a0 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000062]  ? recv_check_large_blocks+0x70/0x70 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000051]  ? recv_begin_check_existing_impl+0x5c0/0x5c0 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000050]  ? recv_begin_check_existing_impl+0x5c0/0x5c0 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000049]  ? recv_check_large_blocks+0x70/0x70 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000049]  dsl_sync_task+0x16/0x30 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000062]  dmu_recv_begin+0x1f6/0x3c0 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000051]  zfs_ioc_recv_impl.constprop.0+0x139/0xae0 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000079]  ? _raw_spin_unlock_irqrestore+0xa/0x30
[  +0.000003]  ? spl_kmem_alloc_track+0xf4/0x160 [spl 838f76d197c801d04c49382ba3515ffbe92396af]
[  +0.000007]  zfs_ioc_recv+0x1b0/0x360 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000081]  zfsdev_ioctl_common+0x6a3/0x780 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000079]  ? spl_kmem_alloc_impl+0xb2/0xd0 [spl 838f76d197c801d04c49382ba3515ffbe92396af]
[  +0.000006]  zfsdev_ioctl+0x53/0xf0 [zfs adb46dd4b500cde4a6849fce3ce6b5cdaaf580f7]
[  +0.000082]  __x64_sys_ioctl+0x8e/0xd0
[  +0.000003]  do_syscall_64+0x5c/0x90
[  +0.000002]  ? exc_page_fault+0x72/0x160
[  +0.000001]  entry_SYSCALL_64_after_hwframe+0x44/0xae

This means that the error starts in this part:

1201         if (drc->drc_featureflags & DMU_BACKUP_FEATURE_RESUMING) {
1202                 err = dsl_sync_task(tofs,
1203                     dmu_recv_resume_begin_check, dmu_recv_resume_begin_sync,
1204                     &drba, 5, ZFS_SPACE_CHECK_NORMAL);
1205         } else {
1206
1207                 /*
1208                  * For non-raw, non-incremental, non-resuming receives the
1209                  * user can specify encryption parameters on the command line
1210                  * with "zfs recv -o". For these receives we create a dcp and
1211                  * pass it to the sync task. Creating the dcp will implicitly
1212                  * remove the encryption params from the localprops nvlist,
1213                  * which avoids errors when trying to set these normally
1214                  * read-only properties. Any other kind of receive that
1215                  * attempts to set these properties will fail as a result.
1216                  */
1217                 if ((DMU_GET_FEATUREFLAGS(drc->drc_drrb->drr_versioninfo) &
1218                     DMU_BACKUP_FEATURE_RAW) == 0 &&
1219                     origin == NULL && drc->drc_drrb->drr_fromguid == 0) {
1220                         err = dsl_crypto_params_create_nvlist(DCP_CMD_NONE,
1221                             localprops, hidden_args, &drba.drba_dcp);
1222                 }
1223
1224                 if (err == 0) {
1225                         err = dsl_sync_task(tofs,
1226                             dmu_recv_begin_check, dmu_recv_begin_sync,
1227                             &drba, 5, ZFS_SPACE_CHECK_NORMAL);
1228                         dsl_crypto_params_free(drba.drba_dcp, !!err);
1229                 }

That dmu_recv_begin_check() at the end is a problem because it does:

 587         if (featureflags & DMU_BACKUP_FEATURE_RAW) {
 588                 /* raw receives require the encryption feature */
 589                 if (!spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_ENCRYPTION))
 590                         return (SET_ERROR(ENOTSUP));
 591
 592                 /* embedded data is incompatible with encryption and raw recv */
 593                 if (featureflags & DMU_BACKUP_FEATURE_EMBED_DATA)
 594                         return (SET_ERROR(EINVAL));
 595
 596                 /* raw receives require spill block allocation flag */
 597                 if (!(flags & DRR_FLAG_SPILL_BLOCK))
 598                         return (SET_ERROR(ZFS_ERR_SPILL_BLOCK_FLAG_MISSING));
 599         } else {
 600                 dsflags |= DS_HOLD_FLAG_DECRYPT;
 601         }

and sets dsflags |= DS_HOLD_FLAG_DECRYPT;, and then:

 603         error = dsl_dataset_hold_flags(dp, tofs, dsflags, FTAG, &ds);

goes on to retrieve the key, which is not loaded, so it errors out.

@gamanakis
Copy link
Contributor

I can confirm that removing the dsflags |= DS_HOLD_FLAG_DECRYPT; resolves this.
Now we only have to find a proper conditional.

@AttilaFueloep
Copy link
Contributor

AttilaFueloep commented Feb 4, 2022

Well, in dmu_recv_begin_check() we could test if the drba->drba_dcp nvlist contains cp_crypt = ZIO_CRYPT_OFF. I think that would imply that we are receiving with -o encryption=off or sending with -p an unencrypted dataset.

Since I'm new to this code please take everything I say with a grain of salt though.

Edit: BTW I've come to this conclusion by dumping the nvlist with gdb.

@gamanakis
Copy link
Contributor

Nice, I think that should work! Do you want to go ahead and file a PR?

@AttilaFueloep
Copy link
Contributor

Yes, I can do that, but before doing so I'd like to trace this a bit more to get a better understanding of the inner workings. That way I'd feel more comfortable changing the code.

@AttilaFueloep
Copy link
Contributor

@gamanakis Feeling sufficiently sure that this is the right approach now, I'm preparing a PR and would like to give you credit for the analysis by adding you as a co-author. Is that OK with you?

@gamanakis
Copy link
Contributor

Sure, thank you!

@AttilaFueloep
Copy link
Contributor

Opened PR #13076 with a fix.

behlendorf pushed a commit that referenced this issue Feb 9, 2022
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13033 
Closes #13076
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Feb 15, 2022
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes openzfs#13033 
Closes openzfs#13076
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Feb 16, 2022
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes openzfs#13033 
Closes openzfs#13076
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Feb 17, 2022
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes openzfs#13033 
Closes openzfs#13076
nicman23 pushed a commit to nicman23/zfs that referenced this issue Aug 22, 2022
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes openzfs#13033 
Closes openzfs#13076
nicman23 pushed a commit to nicman23/zfs that referenced this issue Aug 22, 2022
dmu_recv_begin_check() unconditionally sets the DS_HOLD_FLAG_DECRYPT
flag before calling dsl_dataset_hold_flags(). If the key on the
receiving side isn't loaded or the send stream contains embedded
blocks, the receive check fails for a stream which is perfectly
valid and could be received without any problem. This seems like
a remnant of the initial design, where unencrypted datasets below
encrypted ones weren't allowed.

Add a condition to set `DS_HOLD_FLAG_DECRYPT` only for encrypted
datasets, modify an existing test to detect this regression and add
a test for raw replication streams.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes openzfs#13033 
Closes openzfs#13076
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants