Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Reset user accounting when raw receiving a snapshot #11300

Closed
wants to merge 1 commit into from

Conversation

gamanakis
Copy link
Contributor

@gamanakis gamanakis commented Dec 7, 2020

Motivation and Context

See #10523, #11221, #11294.

Description

When raw receiving a snapshot we should reset user accounting (to be recalculated upon loading the key and mounting it). When raw receiving a snapshot the local_mac is set to 0, as it's not portable. By resetting the dnode_phys_structures related to user accounting we now force zeroing out the local_mac upon mounting. Otherwise we will calculate a local_mac != 0 which prevents the dataset from being mounted.

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@gamanakis
Copy link
Contributor Author

@tcaputi, @behlendorf would you mind looking over this?

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Dec 7, 2020
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks makes sense to me. Thanks for tackling this so quickly.

module/zfs/dsl_crypt.c Outdated Show resolved Hide resolved
@gamanakis gamanakis force-pushed the raw_sends2 branch 2 times, most recently from 493e4c6 to 6ebe413 Compare December 8, 2020 20:15
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Dec 9, 2020
@gamanakis
Copy link
Contributor Author

I had a brief discussion with @tcaputi . He pointed out that we would like to compare the local_mac also in corner cases where user accounting but not userobj accounting is enabled. We decided that following a different strategy may be more appropriate: start from the code base of 2.0.0 (ie revert d1d4769) and make sure that whenever we raw-receive an encrypted dataset we zero-out user accounting and userobj accouting. That way we should cover all corner-cases.

@behlendorf behlendorf added Status: Work in Progress Not yet ready for general review and removed Status: Accepted Ready to integrate (reviewed, tested) labels Dec 15, 2020
@gamanakis
Copy link
Contributor Author

I am working on how to implement the new approach. In zio_crypt.c in 2.0.0 we have:

	if ((datalen >= OBJSET_PHYS_SIZE_V3 &&
	    osp->os_userused_dnode.dn_type == DMU_OT_NONE &&
	    osp->os_groupused_dnode.dn_type == DMU_OT_NONE &&
	    osp->os_projectused_dnode.dn_type == DMU_OT_NONE) ||
	    (datalen >= OBJSET_PHYS_SIZE_V2 &&
	    osp->os_userused_dnode.dn_type == DMU_OT_NONE &&
	    osp->os_groupused_dnode.dn_type == DMU_OT_NONE) ||
	    (datalen <= OBJSET_PHYS_SIZE_V1)) {
		bzero(local_mac, ZIO_OBJSET_MAC_LEN);
		return (0);
	}

This essentially means we need to zero-out the dnode_phys_t in the objset of the dataset which is being received (along with any os_flags signaling that user accounting is present). We can do this in dsl_crypt.c right after the local_mac of the received dataset is set to zero (https://github.com/openzfs/zfs/blob/zfs-2.0-release/module/zfs/dsl_crypt.c#L2108). This approach actually works and the raw-received dataset can now be mounted.

However I fail to see why zfs userspace on the received dataset does not return any output, which essentially means that userspace metadata are not updated, not even after a re-mount of the dataset. Any suggestions?

@gamanakis gamanakis marked this pull request as draft December 24, 2020 07:57
@gamanakis gamanakis changed the title Do not compare the local_mac if user accounting has not been completed [WIP] Do not compare the local_mac if user accounting has not been completed Dec 24, 2020
@gamanakis
Copy link
Contributor Author

gamanakis commented Jan 6, 2021

Due to the previous problem I followed a new approach:
I just defined a new os_flag which is set when we raw receive a dataset (right after the local_mac of that dataset is set to 0) and results in skipping comparing the local_mac when mounting. After the first mount this new flag is cleared when user(obj)accounting is updated and thus it should not interfere with any corner case.

Issue #11294 should be resolved that way as the newly defined flag defaults to 0 (unset) and previously encrypted datasets using intermediate encryption versions would be normally mounted by taking into account the local_mac.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely prefer this approach, it's nice that it only applies to the raw recv case and relies on all of the existing logic. @tcaputi would you mind taking another look at this.

@gamanakis
Copy link
Contributor Author

gamanakis commented Feb 1, 2021

@behlendorf thank you for taking a look! I had a brief discussion with @tcaputi today. He would prefer to avoid introducing a new flag and instead zero out the dnode_phys_t structures. The problem is that although the dataset can be mounted this way, the userspace metadata on the receiving dataset are not updated upon remounting. I am looking into the code to see why this is happening and how we could make it work.

@gamanakis
Copy link
Contributor Author

gamanakis commented Feb 3, 2021

@behlendorf @tcaputi Zero-ing out the dnode_phys_t this way poses 2 problems:

  1. Upon destroying the receiving dataset (zfs destroy -r test/source) we have a panic:
[  271.714041] VERIFY3(0 == ba.ba_phys->bt_bytes) failed (0 == 3072)
[  271.715414] PANIC at bptree.c:295:bptree_iterate()
[  271.716503] Showing stack for process 3756
[  271.716510] CPU: 9 PID: 3756 Comm: txg_sync Tainted: P           OE     5.4.30-1-vfio-debug #1
[  271.716512] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ELITE/Z390 AORUS ELITE-CF, BIOS F10g 09/16/202
0
[  271.716514] Call Trace:
[  271.716528]  dump_stack+0x66/0x90
[  271.716560]  spl_panic+0xef/0x117 [spl]
[  271.716818]  ? queued_spin_unlock+0x5/0x10 [zfs]
[  271.717040]  ? __raw_spin_unlock+0x5/0x10 [zfs]
[  271.717260]  bptree_iterate+0x423/0x4f0 [zfs]
[  271.717525]  ? dsl_scan_obsolete_block_cb+0xc0/0xc0 [zfs]
[  271.717781]  dsl_process_async_destroys+0xaa/0x730 [zfs]
[  271.718021]  dsl_scan_sync+0x1e1/0xb00 [zfs]
[  271.718307]  spa_sync_iterate_to_convergence+0x139/0x310 [zfs]
[  271.718572]  spa_sync+0x38d/0x870 [zfs]
[  271.718848]  txg_sync_thread+0x2f7/0x450 [zfs]
[  271.719113]  ? txg_dispatch_callbacks+0xf0/0xf0 [zfs]
[  271.719153]  thread_generic_wrapper+0x78/0xb0 [spl]
[  271.719166]  kthread+0xfb/0x130
[  271.719190]  ? IS_ERR+0x10/0x10 [spl]
[  271.719195]  ? kthread_park+0x90/0x90
[  271.719201]  ret_from_fork+0x35/0x40
  1. zfs userspace on receiving dataset returns no output.
% sudo zfs userspace test/source
%

Is there a recommended way to zero-out the dnode_phys_t?

@tcaputi
Copy link
Contributor

tcaputi commented Feb 3, 2021

Ah ok. so the problem here is that you haven't frreed the blocks that the dnode is pointing to. The assert is telling you that it detected that the destroy didn't destroy everything it was supposed to. @behlendorf im a little rusty on the exact function, but he should just have to free the dnode's data before zeroing it out correct?

For the second problem, did you mount the dataset before checking the space?

@gamanakis
Copy link
Contributor Author

gamanakis commented Feb 3, 2021

For the second problem, did you mount the dataset before checking the space?

Yes, exported the whole pool, imported again, reentered keys, and mounted the dataset.

@gamanakis
Copy link
Contributor Author

gamanakis commented Feb 13, 2021

3c3e60c: The above panic is unresolved, zfs userspace test/source after receiving back the raw snapshot returns no output, even after remounting the dataset. There is still a leak of dnode_t structures upon unloading the zfs module.

@gamanakis
Copy link
Contributor Author

6b51856: managed to reset the dnode_phys_t, now fails when issuing a zio_encrypt() and cannot lookup the key mapping:

[ +10.506741] VERIFY3(0 == spa_do_crypt_objset_mac_abd(B_TRUE, spa, dsobj, zio->io_abd, psize, BP_SHOULD_BYTESWAP(bp))) failed (0 == 2)
[  +0.000999] PANIC at zio.c:4127:zio_encrypt()
[  +0.000355] Showing stack for process 953
[  +0.000002] CPU: 9 PID: 953 Comm: z_wr_iss Tainted: P           OE     5.4.30-1-vfio-debug #1
[  +0.000001] Call Trace:
[  +0.000006]  dump_stack+0x66/0x90
[  +0.000010]  spl_panic+0xef/0x117 [spl]
[  +0.000105]  zio_encrypt+0x711/0x830 [zfs]
[  +0.000007]  ? spl_kmem_cache_free+0x11b/0x1b0 [spl]
[  +0.000078]  zio_execute+0xea/0x220 [zfs]
[  +0.000007]  taskq_thread+0x254/0x4b0 [spl]
[  +0.000005]  ? wake_up_q+0x60/0x60
[  +0.000077]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[  +0.000002]  kthread+0xfb/0x130
[  +0.000007]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
[  +0.000001]  ? kthread_park+0x90/0x90
[  +0.000003]  ret_from_fork+0x35/0x40

@gamanakis
Copy link
Contributor Author

gamanakis commented Feb 22, 2021

Printing out the address of the failing zio and inspecting with sdb we get:

sdb> echo 0xffff95d5b881a220 | cast zio_t * | deref
(zio_t){
        .io_bookmark = (zbookmark_phys_t){
                .zb_objset = (uint64_t)538,
                .zb_object = (uint64_t)0,
                .zb_level = (int64_t)-1,
                .zb_blkid = (uint64_t)0,
        },
        .io_prop = (zio_prop_t){
                .zp_checksum = (enum zio_checksum)ZIO_CHECKSUM_FLETCHER_4,
                .zp_compress = (enum zio_compress)ZIO_COMPRESS_EMPTY,
                .zp_type = (dmu_object_type_t)DMU_OT_NONE,
                .zp_level = (uint8_t)11,
                .zp_copies = (uint8_t)0,
                .zp_dedup = (boolean_t)512,
                .zp_dedup_verify = (boolean_t)B_FALSE,
                .zp_nopwrite = (boolean_t)B_FALSE,
                .zp_encrypt = (boolean_t)B_FALSE,
                .zp_byteorder = (boolean_t)B_TRUE,
                .zp_salt = (uint8_t [8]){ 1 },
                .zp_iv = (uint8_t [12]){},
                .zp_mac = (uint8_t [16]){},
                .zp_zpl_smallblk = (uint32_t)0,
        },

I suspect this may be due to one of the objects (DMU_USERUSED_OBJECT and DMU_GROUPUSED_OBJECT) we zeroed out leading to the failure of looking up its key mapping.

@gamanakis
Copy link
Contributor Author

gamanakis commented Mar 2, 2021

511ca43: Solved the problem looking up the key by generating a new key mapping. Now the panic when destroying the source dataset is resolved. Looking to see why zfs userspace test/source still doesn't generate output.

9132142: cleanup

@gamanakis
Copy link
Contributor Author

gamanakis commented Mar 6, 2021

efba406: added some debugging functions, upon writing new files in the dataset (that raw received the incremental snapshot) zfs userspace returns a meaningful output. Meaning the corresponding objects are not created/updated in the right order after mounting the dataset.

@gamanakis
Copy link
Contributor Author

2d70177: If we print out the dn_type of os_userused_dnode we get:

[  +2.136665] NOTICE: create obj
[  +0.076103] NOTICE: upgrade, dn_type_uu: 39
[  +4.300333] NOTICE: create obj
[  +0.052383] NOTICE: upgrade, dn_type_uu: 39
[  +0.551302] --SENDING BACK--
[  +0.138830] ---MOUNTING---
[  +0.039028] NOTICE: upgrade, dn_type_uu: 0
[  +0.003418] NOTICE: create obj
[  +0.000614] ---ZFS USERSPACE---

Meaning that in the case of the dataset which raw-received the snapshot the zeroed out dnodes are upgraded before they are populated/created.

@gamanakis
Copy link
Contributor Author

I thought that we could zero out the os_flags of the dataset receiving the raw-snapshot in dmu_recv_end_sync() to force recalculation of user accounting. Zeroing out those flags can be validated by missing objects (-1, -2 and -3) with zdb -dd. After mounting the dataset user accounting is recalculated and can be seen by zdb -dd as present objects (-1, -2 and -3).

However despite all this, zfs userspace still returns no output in this PR. Notably, if we create a new file in the dataset, only the new file appears in zfs userspace.

Signed-off-by: George Amanakis <gamanakis@gmail.com>
@gamanakis gamanakis changed the title [WIP] Do not compare the local_mac if user accounting has not been completed [WIP] Reset user accounting when raw receiving a snapshot Jun 24, 2021
@gamanakis gamanakis closed this Jul 23, 2021
@jumbi77
Copy link
Contributor

jumbi77 commented Jul 23, 2021

@gamanakis Was it intented to close this? If there is a lack of review maybe just ping behlendorf and mmaybee politely again?

behlendorf pushed a commit that referenced this pull request Jan 21, 2022
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.

One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #12981 
Closes #10523
Closes #11221
Closes #11294
Closes #12594
Issue #11300
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Feb 5, 2022
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.

One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#12981 
Closes openzfs#10523
Closes openzfs#11221
Closes openzfs#11294
Closes openzfs#12594
Issue openzfs#11300
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.

One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#12981 
Closes openzfs#10523
Closes openzfs#11221
Closes openzfs#11294
Closes openzfs#12594
Issue openzfs#11300
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
Raw receiving a snapshot back to the originating dataset is currently
impossible because of user accounting being present in the originating
dataset.

One solution would be resetting user accounting when raw receiving on
the receiving dataset. However, to recalculate it we would have to dirty
all dnodes, which may not be preferable on big datasets.

Instead, we rely on the os_phys flag
OBJSET_FLAG_USERACCOUNTING_COMPLETE to indicate that user accounting is
incomplete when raw receiving. Thus, on the next mount of the receiving
dataset the local mac protecting user accounting is zeroed out.
The flag is then cleared when user accounting of the raw received
snapshot is calculated.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#12981 
Closes openzfs#10523
Closes openzfs#11221
Closes openzfs#11294
Closes openzfs#12594
Issue openzfs#11300
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Work in Progress Not yet ready for general review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants