Reschedule processes on error. #6312

Merged Jul 6, 2017 (2 commits)


ab-oe
Contributor

@ab-oe ab-oe commented Jul 5, 2017

Description

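On a single-core machine the system may hang when the spa_namespace_lock acquisition fails in the zvol_first_open function. The function returns the ERESTARTSYS error, which causes an endless loop in the __blkdev_get function. Because that retry loop never yields the CPU, the task holding the lock never gets to run and release it.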

Motivation and Context

Fixes a system hang on single-core machines.

How Has This Been Tested?

The following test was run on a single-core KVM virtual machine:

zpool create p0 sdb
for i in `seq 1 100`; do zfs create -s -V10G p0/zvi${i}; done
zpool export p0
zpool import p0

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

@mention-bot

@ab-oe, thanks for your PR! By analyzing the history of the files in this pull request, we identified @behlendorf, @bprotopopov and @tuxoko to be potential reviewers.

On a single-core machine the system may hang when the
spa_namespace_lock acquisition fails in the zvol_first_open
function. It returns the ERESTARTSYS error, which causes an
endless loop in the __blkdev_get function.

Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>
@kernelOfTruth
Contributor

kernelOfTruth commented Jul 5, 2017

@behlendorf I've added this to 0.7.0-rc5 since this is a potential stability/reliability blocker for UP boxes (hang), hope that's okay

edit:
the issue (#6283) is there already anyway - so it's more or less redundant

@@ -1358,6 +1358,8 @@ zvol_open(struct block_device *bdev, fmode_t flag)
 	mutex_exit(&zv->zv_state_lock);
 	if (drop_suspend)
 		rw_exit(&zv->zv_suspend_lock);
+	if (error)
+		schedule();
Contributor


Let's restrict this to the -ERESTARTSYS case. That makes it clear exactly why the additional schedule() call is here; all other error values should simply be returned to sys_open().

if (error == -ERESTARTSYS)
       schedule();
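
For readers who want to see why yielding only on -ERESTARTSYS is enough, here is a rough userspace sketch of the single-CPU situation. It is not the zvol code: struct sim, run_lock_holder(), sim_first_open() and sim_open_retry() are hypothetical names invented for this illustration, the lock holder is modeled as a task that only makes progress when the opener gives up the CPU, and the attempt cap exists only so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel-internal ERESTARTSYS code. */
#define SIM_ERESTARTSYS 512

/* Hypothetical model of one CPU shared by the opener and a lock holder. */
struct sim {
	bool lock_held;   /* stands in for spa_namespace_lock being taken */
	int holder_work;  /* time slices the holder needs before unlocking */
};

/* The lock holder only runs when the opener yields (models schedule()). */
static void run_lock_holder(struct sim *s)
{
	if (s->lock_held && --s->holder_work <= 0)
		s->lock_held = false;
}

/* Models the failed lock acquisition: ask the caller to retry. */
static int sim_first_open(const struct sim *s)
{
	return s->lock_held ? -SIM_ERESTARTSYS : 0;
}

/*
 * Models the caller's retry loop. Returns the number of attempts needed
 * to open the device, or -1 if max_attempts was exhausted (the "hang").
 */
static int sim_open_retry(struct sim *s, bool yield_on_restart, int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		int error = sim_first_open(s);

		if (error == 0)
			return attempt;
		/* Yield only for the retry case, as suggested in the review. */
		if (error == -SIM_ERESTARTSYS && yield_on_restart)
			run_lock_holder(s);
	}
	return -1;
}
```

In this model, a holder that needs three time slices lets the opener succeed on its fourth attempt when yield_on_restart is true. With yield_on_restart false the holder never runs, so the loop exhausts max_attempts and returns -1, which is the modeled equivalent of the hang.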

Signed-off-by: Arkadiusz Bubała <arkadiusz.bubala@open-e.com>