
Wait for txg sync if the last DRR_FREEOBJECTS might result in a hole #14358

Merged
merged 1 commit into openzfs:master on Jan 23, 2023

Conversation

dhedberg
Contributor

@dhedberg commented Jan 7, 2023

Motivation, Context and Description

If we receive a DRR_FREEOBJECTS as the first entry in an object range, this might end up producing a hole if the freed objects were the only existing objects in the block.

If the txg starts syncing before we've processed any following DRR_OBJECT records, this leads to a possible race where the backing arc_buf_t gets its psize set to 0 in the arc_write_ready() callback while still being referenced from a dirty record in the open txg.

To prevent this, we insert a txg_wait_synced call if the first record in the range was a DRR_FREEOBJECTS that actually resulted in one or more freed objects.
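
For illustration only, a rough sketch of that approach; the flag name and exact hook points below are hypothetical, and the merged change to the receive path in module/zfs/dmu_recv.c may differ in detail:

     /* Hypothetical field in struct receive_writer_arg: set when the
      * first record seen in the current object range is a DRR_FREEOBJECTS
      * that actually freed one or more objects. */
     boolean_t or_need_sync;

     /* When handling such a DRR_FREEOBJECTS record (pseudo-condition): */
     if (first_record_in_range && freed_some_objects)
         rwa->or_need_sync = B_TRUE;

     /* In receive_object(), before dirtying the dnode block again: */
     if (rwa->or_need_sync) {
         txg_wait_synced(dmu_objset_pool(rwa->os), 0);
         rwa->or_need_sync = B_FALSE;
     }

The wait is only taken in this corner case, so receives that never hit the condition are unaffected.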

Sponsored by: Findity AB
Closes: #11893

How Has This Been Tested?

The patch has been tested in a VM running a cloud image of Ubuntu 22.04 with kernel 5.15.0-56. The issue can be triggered either simply by starting a few zfs receive/zfs rollback runs in a loop with an affected incremental stream and waiting, or by inserting a busy loop before dmu_tx_create() in receive_object():

     // Something like: busy-wait until the txg that was syncing when we
     // reached this record has completed, giving the preceding
     // DRR_FREEOBJECTS a chance to sync out (and create the hole) before
     // this DRR_OBJECT is processed.
     if (drro->drr_object == <some-affected-object>) {
         uint64_t wait_txg = dmu_objset_spa(rwa->os)->spa_syncing_txg;
         while (dmu_objset_spa(rwa->os)->spa_syncing_txg <= wait_txg) {}
     }

The patch has been tested both with the hacky trigger left in place and by letting the final patch run with a few receive threads looping for a while; no issues were detected in either case.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


@behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label on Jan 10, 2023
@behlendorf added the "Component: Send/Recv" ("zfs send/recv" feature) label on Jan 17, 2023
@behlendorf added the "Status: Accepted" (Ready to integrate; reviewed, tested) label and removed the "Status: Code Review Needed" label on Jan 21, 2023
@behlendorf
Contributor

If you can construct a small stream which triggers this, it would be nice to add a test case for it.

@dhedberg
Contributor Author

I have added a test with the potential of triggering the issue. This can be verified by running it without the fix applied and with the following hack in its place:

diff --git a/module/zfs/dmu_recv.c b/module/zfs/dmu_recv.c
index f2ff415aa..39d8ae45f 100644
--- a/module/zfs/dmu_recv.c
+++ b/module/zfs/dmu_recv.c
@@ -1959,6 +1959,11 @@ receive_object(struct receive_writer_arg *rwa, struct drr_object *drro,
                        txg_wait_synced(dmu_objset_pool(rwa->os), 0);
        }

+        if (drro->drr_object == 129) {
+                uint64_t wait_txg = dmu_objset_spa(rwa->os)->spa_syncing_txg;
+                while (dmu_objset_spa(rwa->os)->spa_syncing_txg <= wait_txg) {}
+        }
+
        tx = dmu_tx_create(rwa->os);
        dmu_tx_hold_bonus(tx, object_to_hold);
        dmu_tx_hold_write(tx, object_to_hold, 0, 0);

Given the nature of the issue, I'm not sure if there's a way to write a test that reliably (or perhaps even likely) triggers it under real-world conditions, but this will at least exercise the related code paths.

Does this seem OK?


@behlendorf left a comment


I think exercising the relevant code paths should be sufficient. It'd be nice to be able to hit it every time, but I agree it's probably not worth adding additional code to the kmod to be able to trigger exactly this issue. This looks good to me. Thanks.

(excerpt from the added test case)

    done
    if [[ $i -eq $tries ]]; then
        log_fail "Failed to create object with number $num"
    fi

I'm a little concerned this won't be entirely reliable, since it depends on some assumptions about how object numbers are allocated. But as you pointed out, we already do this same trick in send_freeobjects.ksh, so I'm okay with doing the same thing here.

@behlendorf merged commit 37a27b4 into openzfs:master Jan 23, 2023
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Mar 3, 2023
dhedberg added a commit to dhedberg/zfs that referenced this pull request May 9, 2023
behlendorf pushed a commit that referenced this pull request May 9, 2023
Labels
Component: Send/Recv ("zfs send/recv" feature); Status: Accepted (Ready to integrate; reviewed, tested)
Development

Successfully merging this pull request may close these issues.

PANIC at range_tree.c:368:range_tree_remove_impl()