Add delay between zpool add zvol and zpool destroy #14052

Merged (1 commit, Oct 21, 2022)

Conversation

youzhongyang (Contributor)

Signed-off-by: Youzhong Yang <yyang@mathworks.com>

Motivation and Context

As investigated in #14026, the zpool_add_004_pos test can reliably hang when the timing is not right. The hang is caused by a race condition between zed doing a zpool reopen (triggered by the zvol being added to the pool) and the zpool destroy command.

Description

This PR adds a delay between the zpool add of a zvol and the subsequent zpool destroy, as sketched below.
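
The exact merged change is not reproduced here; the following is only a minimal sketch of the idea, using placeholder pool and zvol names, in the same style as the reproduction steps later in this thread:

# Minimal sketch of the idea (placeholder names, not the merged diff)
$ zpool add tank /dev/zvol/otherpool/vol0   # adding a zvol vdev causes zed to issue a zpool reopen
$ sleep 1                                   # short delay so the reopen can finish
$ zpool destroy tank                        # destroy no longer races with zed's reopen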

How Has This Been Tested?

Manually ran the zpool_add_004_pos test case (see below for one way to invoke it). Without the fix it hangs easily; with the fix the hang is no longer reproducible.
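
For reference, a single test case can be run with the in-tree test runner roughly as follows; the script path and the exact -t argument are assumptions and may differ between releases and installed layouts:

# Assumed invocation of the ZFS test suite runner for one test case
$ ./scripts/zfs-tests.sh -t tests/functional/cli_root/zpool_add/zpool_add_004_pos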

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: Youzhong Yang <yyang@mathworks.com>
@behlendorf (Contributor) left a comment

We must never hit this in the CI because the ZED is always stopped when running the test suite, then started/stopped as needed by specific test cases. Out of curiosity, are there other test cases which are a problem? I'd think having multiple instances of the ZED running could potentially cause other unexpected failures.

I'm fine with applying this workaround, but let's go ahead and leave the original issue open until the root cause can also be resolved.

@behlendorf added the Status: Code Review Needed label on Oct 20, 2022
@youzhongyang (Contributor, Author)

I will run a full test suite and report back.

@youzhongyang (Contributor, Author)

I ran the full suite twice, once on a VM with root on ext4 and once on a VM with root on zfs; no hang was observed.

@behlendorf added the Status: Accepted label and removed the Status: Code Review Needed label on Oct 21, 2022
@behlendorf (Contributor)

@youzhongyang good to know, thanks.

@behlendorf merged commit c5a388a into openzfs:master on Oct 21, 2022
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Oct 22, 2022
As investigated by openzfs#14026, the zpool_add_004_pos test can reliably hang
if the timing is not right. This is caused by a race condition between
zed doing a zpool reopen (due to the zvol being added to the zpool)
and the command zpool destroy.

This change adds a delay between zpool add zvol and zpool destroy to
avoid this issue, but does not address the underlying problem.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Issue openzfs#14026
Closes openzfs#14052
@ryao (Contributor) commented Oct 23, 2022

I suspect that a less racy solution would be to start a scrub and use the zpool wait command to wait until the scrub has finished. That in theory should close the race without relying on a sleep.

I have not looked at the test to see whether that would be okay for its purposes. I am just sharing my initial thought for others who might have more time to pursue this.

@youzhongyang I might get to this at some point, but that would be at least a couple of weeks away. Feel free to consider the idea ahead of that and, if it is good, implement it in a follow-up patch (see the sketch below).
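
As a rough illustration of the scrub-and-wait idea (placeholder pool name; this is only an untested sketch, not a verified replacement for the delay):

# Hypothetical alternative to a fixed sleep: wait for a scrub instead
$ zpool scrub tank            # start a scrub on the pool that received the zvol vdev
$ zpool wait -t scrub tank    # block until the scrub completes rather than sleeping
$ zpool destroy tank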

@ixhamza (Contributor) commented Oct 25, 2022

@youzhongyang I investigated this and the issue appears to be related to zvol rather than zed. However, zed is now able to trigger it because we post a change event once a vdev is added (55c1272), which causes zed to call zpool_reopen_one() from zfsdle_vdev_online() when the autoexpand property is enabled. Before digging into the root cause, here is a reliable way to reproduce the issue without zed:

# Create a testpool and add zvol as a vdev
$ zpool create testpool loop0;
$ zfs create -V 512M testpool/testvol;
$ sleep 1;
$ zpool create testpool1 /dev/zvol/testpool/testvol
# Call zpool_reopen_one() and zpool status testpool1 in parallel to reproduce this issue
$ while true; do zpool reopen testpool1 & zpool status testpool1; done

The above while loop would reproduce the same issue in just 1-2 iterations with zed disabled.
The issue is caused by a lock-order inversion between one thread calling zfs_ioc_pool_reopen() and another calling zfs_ioc_pool_stats(). The following call hierarchies illustrate it:

zpool reopen -- thread-1: zpool_reopen_one() => zfs_ioc_pool_reopen() {acquires the SCL_STATE lock through spa_vdev_state_enter()} => vdev_reopen() => vdev_open() => vdev_disk_open() <=> zvol_open() {tries to acquire spa_namespace_lock, otherwise returns ERESTART and vdev_disk_open() retries}

zpool status -- thread-2: zpool_do_status() => zpool_refresh_stats() => zfs_ioc_pool_stats() => spa_get_stats() => spa_open_common() {acquires spa_namespace_lock} => spa_config_generate() => spa_config_enter() {waits for SCL_STATE to be released, which is held by zfs_ioc_pool_reopen() in the thread-1 context}

In the above scenario, zvol_open() waits for spa_namespace_lock to be freed and keeps returning ERESTART, and vdev_disk_open() retries until the timeout. However, spa_namespace_lock is already held by spa_open_common(), which in turn waits for the SCL_STATE lock to be released (through spa_config_generate() => spa_config_enter()), creating a deadlock-like situation. After the timeout, the vdev is forcibly removed from the pool and the pool goes into an unrecoverable state.

The correct fix would be to rework the locking scheme for zvols, but that is not straightforward due to the complex locking logic zvols use. In my opinion, until we have a proper fix, zed should ignore zvol events in zfsdle_vdev_online(), which would prevent a zvol-backed vdev from being reopened by zed in response to the udev change event. @behlendorf I can post a patch to ignore zvol events in zfsdle_vdev_online() if you agree. This should be fine since we are not supposed to use the autoexpand property on a zvol vdev.
@freqlabs

@youzhongyang (Contributor, Author)

@ixhamza Good to know, thanks. FYI, #14026 is still open.
