Fix message for sending snapshots from ... about explicitly disabled datasets, and further in-vivo research #623

Merged (8 commits) on Feb 7, 2024

@jimklimov commented Jan 15, 2024

I am debugging my setup with partially-disabled trees of datasets (home dir with setup of a build agent should be backed up, but scratch working areas and caches should not).

Currently znapzend logs every dataset as "sending snapshots from...", including those that are explicitly disabled, which is a bit misleading.

With this PR I externalize listDisabledSourceDescendants() so it applies not only to createSnapshot() but also to e.g. sendRecvCleanup(). In fact, after some back and forth, the array of known-disabled local (source) descendant dataset names is queried from ZFS once and attached to $backupSet as @{$backupSet->{srcDisabledDescendants}} so it can be re-used quickly and in different places.
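The "query once, re-use everywhere" idea can be illustrated with a minimal sketch. This is not the Perl code from the PR: the function names, the `src` key and the injected `list_disabled` callable are assumptions made for illustration, and only the `srcDisabledDescendants` key and the message wording come from the description above.

```python
from typing import Callable, Dict, List


def refresh_backup_set(backup_set: Dict, list_disabled: Callable[[str], List[str]]) -> Dict:
    """Query the disabled descendants once and cache them on the backup set.

    `list_disabled` stands in for whatever actually asks ZFS (hypothetical here).
    """
    backup_set["srcDisabledDescendants"] = list_disabled(backup_set["src"])
    return backup_set


def is_disabled(backup_set: Dict, dataset: str) -> bool:
    """Cheap lookup against the cached list; no further ZFS query is needed."""
    return dataset in backup_set.get("srcDisabledDescendants", [])


def send_message(backup_set: Dict, src: str, dst: str) -> str:
    """Build the log message, flagging datasets that are known to be disabled."""
    message = f"sending snapshots from {src} to {dst}"
    if is_disabled(backup_set, src):
        message += ": not enabled, should be skipped"
    return message
```

Any later step (snapshot creation, cleanup, reporting) can then call `is_disabled()` without repeating the query.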

So now the report goes like:

[2024-01-15 10:08:05.52188] [2405200] [info] refreshing backup plans for dataset "rpool/home/abuild" ...
[2024-01-15 10:08:05.87011] [2405200] [info] checking for explicitly excluded ZFS dependent datasets under 'rpool/home/abuild'
[2024-01-15 10:08:06.29950] [2405200] [info] Found disabled sub: rpool/home/abuild/.ccache
[2024-01-15 10:08:06.29973] [2405200] [info] Found disabled sub: rpool/home/abuild/jenkins-nut
[2024-01-15 10:08:06.29977] [2405200] [info] Found disabled sub: rpool/home/abuild/jenkins-nut-altroots
[2024-01-15 10:08:06.29981] [2405200] [info] Found disabled sub: rpool/home/abuild/jenkins-nut-doc
[2024-01-15 10:08:06.30024] [2405200] [info] found a valid backup plan for rpool/home/abuild...
...
[2024-01-15 10:13:25.93202] [2410053] [debug] sending snapshots from rpool/home/abuild to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
[2024-01-15 10:13:42.82858] [2410053] [debug] sending snapshots from rpool/home/abuild/.ccache to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache: not enabled, should be skipped
...

Barring any bugs, this PR should not change the znapzend end-user behavior beyond such cosmetics.
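To check from a log which datasets the new message flags, the lines can be filtered mechanically. Below is a minimal sketch that assumes the message format shown in the report above; the function name and the result keys are hypothetical and not part of znapzend.

```python
import re
from typing import Dict, Iterable, List

# Matches the debug message shown in the report; the suffix is optional.
SEND_LINE = re.compile(
    r"sending snapshots from (?P<src>\S+) to (?P<dst>\S+?)"
    r"(?P<skipped>: not enabled, should be skipped)?$"
)


def classify_send_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Split source datasets into those flagged as not enabled and the rest."""
    result: Dict[str, List[str]] = {"enabled": [], "disabled": []}
    for line in lines:
        match = SEND_LINE.search(line.strip())
        if match is None:
            continue  # not a "sending snapshots from" line
        key = "disabled" if match.group("skipped") else "enabled"
        result[key].append(match.group("src"))
    return result
```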

Looking at some further work ahead, I see a couple of issues with the existing logic (screenshot below):

  • Currently each dataset not slated for retention must have an explicit org.znapzend:enabled=off, and this is not inherited by its descendants (well, the descendants do see the attribute with an "inherited" source, but znapzend only honors a "local" source, so it is ignored). This is by design so far, but it is cumbersome for large setups where I'd want a whole tree pruned, along with whatever datasets appear there over time. So I propose to add handling for datasets that can optionally declare both enabled='on|off' and recursion=on for this purpose (currently there is special handling for datasets that declare only one property, and that is enabled).

  • The logic for these disablements is such that a recursive snapshot of the backupSet (the one with a full znapzendzetup schedule) is made atomically, data is sent, and then the snapshots of disabled datasets get removed locally and remotely.

    • I believe that in non-oracleMode this is handled as one recursive send, hence the trickery. With oracleMode in place, each dataset goes one by one, so it might be skipped cleanly, especially now that we have a way to know?..
    • Maybe for backupSets with recursion and some not-enabled descendants, we should fall back to oracleMode even if it is not asked for in config (and then exclusions quickly skipped from sending)? @oetiker : WDYT? :) UPDATE: A proposal about this quick skip is tackled in Handle not-sending of not-enabled datasets #626
    • In some but not all cases I see that it also tries to send the snapshots of disabled datasets: e.g. rpool/home/abuild/.ccache is sent below, rpool/home/abuild/jenkins-nut-altroots is not, and both are disabled. In fact it seems that the sending (with an attempt to unmount and redefine the dataset?) happens if there is no snapshot on the destination (backup pool), and does not happen if there are snapshots (maybe it also helps that they are compatible between the two hosts).
[2024-01-15 10:19:14.48192] [2414199] [info] starting work on backupSet rpool/home/abuild
[2024-01-15 10:19:14.51896] [2414199] [debug] sending snapshots from rpool/home/abuild to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
[2024-01-15 10:19:19.43557] [2414199] [debug] sending snapshots from rpool/home/abuild/.ccache to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache: not enabled, should be skipped
cannot unmount '/srv/libvirt/abuild/.ccache': permission denied
warning: cannot send 'rpool/home/abuild/.ccache@znapzend-auto-2024-01-15T10:11:47Z': signal received
[2024-01-15 10:19:20.09820] [2414199] [warn] ERROR: cannot send snapshots to pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache on znapzend
[2024-01-15 10:19:20.09856] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut: not enabled, should be skipped
[2024-01-15 10:19:21.43644] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots: not enabled, should be skipped
[2024-01-15 10:19:21.84354] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
[2024-01-15 10:19:42.23397] [2414199] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots/jenkins-archlinux-amd64 to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/jenkins-archlinux-amd64
...
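The difference between the current per-dataset rule and the proposed subtree pruning can be sketched as follows. This is only one possible reading of the proposal in the first bullet above, not an implemented znapzend feature: the data layout (a dict from dataset name to its locally set `enabled` value plus a flag for `recursion=on`) and both function names are assumptions for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# dataset name -> (locally set "enabled" value or None, True if it also declares recursion=on)
Props = Dict[str, Tuple[Optional[str], bool]]


def disabled_current(props: Props) -> Set[str]:
    """Current rule: only a locally set enabled=off disables a dataset."""
    return {name for name, (enabled, _) in props.items() if enabled == "off"}


def _recursive_ancestor_state(name: str, props: Props) -> Optional[str]:
    """Walk up the tree to the nearest ancestor declaring enabled plus recursion=on."""
    parent = name
    while "/" in parent:
        parent = parent.rsplit("/", 1)[0]
        enabled, recursive = props.get(parent, (None, False))
        if enabled is not None and recursive:
            return enabled
    return None


def disabled_proposed(props: Props) -> Set[str]:
    """Assumed proposal: a local value wins, else the nearest recursive ancestor decides."""
    disabled = set()
    for name, (enabled, _) in props.items():
        state = enabled if enabled is not None else _recursive_ancestor_state(name, props)
        if state == "off":
            disabled.add(name)
    return disabled
```

Under the current rule a child such as rpool/home/abuild/jenkins-nut-altroots/jenkins-archlinux-amd64 is still sent even though its parent is disabled, which matches the log above; under the assumed proposal it would be pruned once the parent also declares recursion=on.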

Or in more detail:

...
[2024-01-15 10:33:02.29285] [2423768] [info] starting work on backupSet rpool/home/abuild
# zfs list -H -r -o name -t filesystem,volume rpool/home/abuild

[2024-01-15 10:33:02.33838] [2423768] [debug] sending snapshots from rpool/home/abuild to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
[2024-01-15 10:33:02.33871] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild
# zfs send -Lce -I 'rpool/home/abuild@znapzend-auto-2024-01-15T10:19:14Z' 'rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z'|ssh -o batchMode=yes -o ConnectTimeout=30 znapzend 'zfs recv -u -F pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild'
# zfs list -H -o name -t snapshot rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z
# zfs set org.znapzend:dst_0=znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z
# zfs set org.znapzend:dst_0_synced=1 rpool/home/abuild@znapzend-auto-2024-01-15T10:33:01Z

[2024-01-15 10:33:07.86728] [2423768] [debug] sending snapshots from rpool/home/abuild/.ccache to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache: not enabled, should be skipped
[2024-01-15 10:33:07.86772] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/.ccache
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 'pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache'
# zfs send -Lce 'rpool/home/abuild/.ccache@znapzend-auto-2024-01-15T10:11:47Z'|ssh -o batchMode=yes -o ConnectTimeout=30 znapzend 'zfs recv -u -F '"'"'pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache'"'"''
cannot unmount '/srv/libvirt/abuild/.ccache': permission denied
warning: cannot send 'rpool/home/abuild/.ccache@znapzend-auto-2024-01-15T10:11:47Z': signal received
[2024-01-15 10:33:08.50054] [2423768] [warn] ERROR: cannot send snapshots to pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/.ccache on znapzend

[2024-01-15 10:33:08.50091] [2423768] [debug] sending snapshots from rpool/home/abuild/jenkins-nut to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut: not enabled, should be skipped
[2024-01-15 10:33:08.50104] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/jenkins-nut
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut
# zfs list -H -o name -t snapshot rpool/home/abuild/jenkins-nut@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0=znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut rpool/home/abuild/jenkins-nut@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0_synced=1 rpool/home/abuild/jenkins-nut@znapzend-auto-2024-01-15T10:11:47Z

[2024-01-15 10:33:09.36190] [2423768] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots: not enabled, should be skipped
[2024-01-15 10:33:09.36220] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/jenkins-nut-altroots
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots
# zfs list -H -o name -t snapshot rpool/home/abuild/jenkins-nut-altroots@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0_synced=1 rpool/home/abuild/jenkins-nut-altroots@znapzend-auto-2024-01-15T10:11:47Z
# zfs set org.znapzend:dst_0=znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots rpool/home/abuild/jenkins-nut-altroots@znapzend-auto-2024-01-15T10:11:47Z

[2024-01-15 10:33:09.80435] [2423768] [debug] sending snapshots from rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh to znapzend:pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
[2024-01-15 10:33:09.80467] [2423768] [debug] Are we sending "--since"? since=="0", skipIntermediates=="0", forbidDestRollback=="0", justCreated=="false"
# zfs list -H -o name -t snapshot -s creation -d 1 rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
# ssh -o batchMode=yes -o ConnectTimeout=30 znapzend zfs list -H -o name -t snapshot -s creation -d 1 pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh
# zfs send -Lce -I 'rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh@znapzend-auto-2024-01-15T10:19:14Z' 'rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh@znapzend-auto-2024-01-15T10:33:01Z'|ssh -o batchMode=yes -o ConnectTimeout=30 znapzend 'zfs recv -u -F pond/export/DUMP/NUTCI/znapzend/ci-deb/rpool/home/abuild/jenkins-nut-altroots/ci-debian-altroot--jenkins-debian11-s390x-ssh'
...
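The three outcomes visible in this trace (incremental send with -I, full send when the destination has no snapshots, and no send at all when the newest snapshot is already there) are consistent with a decision like the one sketched below. This is an interpretation of the trace, not code taken from znapzend; the function names are hypothetical, and only the `-Lce` and `-I` flags are copied from the commands shown above.

```python
from typing import List, Optional, Tuple


def plan_send(src_snaps: List[str], dst_snaps: List[str]) -> Tuple[str, Optional[str], Optional[str]]:
    """Decide what to send. Snapshot names are the part after '@', oldest first.

    Returns (mode, common, last), where mode is one of
    'nothing', 'up-to-date', 'full' or 'incremental'.
    """
    if not src_snaps:
        return ("nothing", None, None)
    last = src_snaps[-1]
    on_destination = set(dst_snaps)
    # Newest source snapshot that also exists on the destination, if any.
    common = next((s for s in reversed(src_snaps) if s in on_destination), None)
    if common == last:
        return ("up-to-date", common, last)
    if common is None:
        return ("full", None, last)
    return ("incremental", common, last)


def send_argv(dataset: str, src_snaps: List[str], dst_snaps: List[str]) -> Optional[List[str]]:
    """Build the `zfs send` argument list, or None when nothing needs sending."""
    mode, common, last = plan_send(src_snaps, dst_snaps)
    if mode == "incremental":
        return ["zfs", "send", "-Lce", "-I", f"{dataset}@{common}", f"{dataset}@{last}"]
    if mode == "full":
        return ["zfs", "send", "-Lce", f"{dataset}@{last}"]
    return None
```

With this reading, a disabled dataset whose destination has no snapshots falls into the full-send branch, which is where the unmount error appears, so a check against the known-disabled list before planning the send would avoid the attempt.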

…of org.znapzend:enabled" for sub-datasets

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
… cleanup of snapshots on enabled=off sub-datasets

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
…teSnapshot() so it can also be used in sendRecvCleanup()

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
… done in refreshBackupPlans() once

Track the list of names as @{$backupSet->{srcDisabledDescendants}}

Signed-off-by: Jim Klimov <jimklimov@gmail.com>
@oetiker merged commit 62394b2 into oetiker:master on Feb 7, 2024
4 checks passed
@jimklimov deleted the fix-msg-sending branch on February 7, 2024 20:48