Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incremental receive silently failing for recursive sends #14568

Merged
merged 5 commits into from
Mar 10, 2023

Conversation

pcd1193182
Copy link
Contributor

Motivation and Context

See #12661

Description

The problem occurs because dmu_recv_begin pulls in the payload and next header from the input stream in order to use the contents of the begin record's nvlist. However, the change to do that before the other checks in dmu_recv_begin occur caused a regression where an empty send stream in a recursive send could have its END record consumed by this, which broke the logic of recv_skip. A test is also included to protect against this case in the future.

How Has This Been Tested?

Using the reproducer script from the issue.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
module/zfs/dmu_recv.c Outdated Show resolved Hide resolved
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
@pcd1193182
Copy link
Contributor Author

In the latest commit I had to remove the "heal" option from the libzfs test, because the heal parameter only works with an existing dataset (and not having it requires not having an existing dataset), which the libzfs input test infrastructure isn't set up for.

module/zfs/dmu_recv.c Show resolved Hide resolved
module/zfs/dmu_recv.c Outdated Show resolved Hide resolved
tests/zfs-tests/cmd/libzfs_input_check.c Outdated Show resolved Hide resolved
tests/zfs-tests/tests/functional/rsend/rsend_031_pos.ksh Outdated Show resolved Hide resolved
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Mar 6, 2023
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
@behlendorf
Copy link
Contributor

There's always something, reported by the CI with this change.

http://build.zfsonlinux.org/builders/CentOS%208%20x86_64%20%28TEST%29/builds/11694/steps/shell_4/logs/summary

Test: /usr/share/zfs/zfs-tests/tests/functional/rsend/send-c_verify_contents (run as root) [00:02] [FAIL]
04:33:11.79 ASSERTION: zfs send -c -R send replication stream up to the named snap.
04:33:12.14 SUCCESS: eval zfs send -c -R testpool@final > /mnt/backdir-rsend/pool-final-R
04:33:13.75 SUCCESS: eval zfs receive -d -F testpool2 < /mnt/backdir-rsend/pool-final-R
04:33:13.81 11,15d10
04:33:13.81 < PREFIX/sendfs
04:33:13.81 < PREFIX/sendfs@A
04:33:13.81 < PREFIX/sendfs@B
04:33:13.81 < PREFIX/sendfs@D
04:33:13.81 < PREFIX/sendfs@E
04:33:13.81 ERROR: cmp_ds_subs testpool testpool2 exited 1
04:33:13.81 NOTE: Performing test-fail callback (/usr/share/zfs/zfs-tests/callbacks/zfs_dbgmsg.ksh)

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 10, 2023
@behlendorf behlendorf merged commit da19d91 into openzfs:master Mar 10, 2023
mcmilk pushed a commit to mcmilk/zfs that referenced this pull request Mar 13, 2023
The problem occurs because dmu_recv_begin pulls in the payload and 
next header from the input stream in order to use the contents of 
the begin record's nvlist. However, the change to do that before the 
other checks in dmu_recv_begin occur caused a regression where an 
empty send stream in a recursive send could have its END record 
consumed by this, which broke the logic of recv_skip. A test is 
also included to protect against this case in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#12661
Closes openzfs#14568
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 16, 2023
The problem occurs because dmu_recv_begin pulls in the payload and 
next header from the input stream in order to use the contents of 
the begin record's nvlist. However, the change to do that before the 
other checks in dmu_recv_begin occur caused a regression where an 
empty send stream in a recursive send could have its END record 
consumed by this, which broke the logic of recv_skip. A test is 
also included to protect against this case in the future.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#12661
Closes openzfs#14568
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants