ZFS stuck #1620

Closed
edillmann opened this issue Jul 31, 2013 · 16 comments
@edillmann
Contributor

Hi,

I pulled the following pull requests on top of current master:
#1610
#1612
#1614
#1496

ZFS builds fine, but after some time (several hours) ZFS gets stuck (no I/O), and I get the following kernel traces:

http://pastebin.com/jv3qMVv9

ps | grep zfs shows only a zfs snapshot command.

The kernel is 3.9.2.

@edillmann
Contributor Author

@dweeezil I did of course pull dweeezil/zfs@a0dc667

regards,
Eric

@dweeezil
Contributor

dweeezil commented Aug 1, 2013

@edillmann I've been meaning to try running the Illumos 2882/2883/2900 with some of those recent changes but haven't had a chance to yet. I'm leaving for vacation today so I'm going to be off the grid for a week. I've been trying to keep the patch rebased to current master as well as possible and maybe by the time I get back, some of those other things will have been committed.

Taking a quick look at your stacks, I'll take a shot in the dark and ask whether you've got all your devices using the noop scheduler? Presumably this hang is hit-or-miss to reproduce?
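For anyone wanting to answer the scheduler question on their own system, here is a minimal sketch of one way to check it. It assumes the usual sysfs layout, where /sys/block/<device>/queue/scheduler lists the available schedulers and marks the active one in square brackets (for example "noop deadline [cfq]"). The function names are made up for illustration and are not part of ZFS.

```python
import re
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def schedulers_by_device(sys_block="/sys/block"):
    """Map each block device name to its active I/O scheduler.

    Assumes the sysfs layout <sys_block>/<device>/queue/scheduler.
    Devices without that file are simply skipped.
    """
    result = {}
    for path in sorted(Path(sys_block).glob("*/queue/scheduler")):
        device = path.parent.parent.name  # .../<device>/queue/scheduler
        result[device] = active_scheduler(path.read_text())
    return result
```

Calling schedulers_by_device() and printing the result gives one line of information per disk, so any device not on noop stands out.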

@edillmann
Contributor Author

In fact it seems to be the deadline scheduler :-( !!!

@dweeezil
Contributor

dweeezil commented Aug 3, 2013

@edillmann De-cloaking on my vacation for a moment: The one thing I think is a pain in the butt when using partitions (I always use my own GPT partitions) is to set noop scheduler on the underlying disks. I have a feeling it would be very difficult or hack-ish to get ZFS to do that job automatically.

@casualfish
Contributor

@edillmann Does this bug also happen without pulling these changes? Since there are multiple commits involved perhaps you can use git bisect to locate the specific commit causing this issue.

@edillmann
Contributor Author

@casualfish Yes, in fact this bug first appeared on the latest master (without the patches applied).
I do not understand the relation between the noop I/O scheduler and ZFS, but it seems to be that way.
For now I force the I/O scheduler to noop by appending elevator=noop to the kernel command line.
I lost this parameter when I upgraded the kernel.
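Since the parameter can silently disappear on a kernel upgrade, a quick check of the running kernel's command line can help. The following is a rough sketch only: it assumes the command line is read from /proc/cmdline and that, if elevator= appears more than once, the last occurrence is the one that takes effect. The function name is hypothetical.

```python
def elevator_from_cmdline(cmdline):
    """Return the value of the elevator= parameter, or None if it is absent.

    Assumption: when the parameter is repeated, the last occurrence wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "elevator" and sep:
            value = val
    return value


def running_kernel_elevator(path="/proc/cmdline"):
    """Read the running kernel's command line and extract elevator=."""
    with open(path) as handle:
        return elevator_from_cmdline(handle.read())
```

If running_kernel_elevator() returns None after a reboot, the parameter was not carried over to the new boot entry.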

@casualfish
Contributor

@edillmann The interaction between vdevs and the Linux kernel block layer is in the file vdev_disk.c. You shouldn't need to set elevator=noop manually, since noop is the default scheduler set when opening a disk-type vdev (see https://github.com/zfsonlinux/zfs/blob/master/module/zfs/vdev_disk.c#L312). According to your kernel traces, it looks like a deadlock has occurred. Could you describe your workload and the exact steps to trigger this bug?

@atonkyra

atonkyra commented Aug 6, 2013

Out of curiosity, are you running a pre-emptible kernel?

@edillmann
Contributor Author

@atonkyra No, for now I am using the "No Forced Preemption (Server)" preemption model.
Should I use Voluntary Kernel Preemption?
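To confirm which preemption model a kernel was built with, the build configuration can be inspected. The sketch below assumes the distribution installs the config as /boot/config-<kernel release> (this varies; some systems expose /proc/config.gz instead) and maps the three classic options CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT to their menu labels. The function names are illustrative only.

```python
import platform
from pathlib import Path

# Kconfig symbols for the preemption models and their menu labels.
PREEMPT_MODELS = {
    "CONFIG_PREEMPT_NONE": "No Forced Preemption (Server)",
    "CONFIG_PREEMPT_VOLUNTARY": "Voluntary Kernel Preemption (Desktop)",
    "CONFIG_PREEMPT": "Preemptible Kernel (Low-Latency Desktop)",
}


def preemption_model(config_text):
    """Return the label of the first preemption option set to 'y', else None."""
    for line in config_text.splitlines():
        key, sep, value = line.strip().partition("=")
        # Lines such as "# CONFIG_PREEMPT is not set" contain no "=" and are skipped.
        if sep and value == "y" and key in PREEMPT_MODELS:
            return PREEMPT_MODELS[key]
    return None


def running_kernel_preemption_model(boot_dir="/boot"):
    """Look up the model for the running kernel (assumed config file location)."""
    path = Path(boot_dir) / f"config-{platform.release()}"
    return preemption_model(path.read_text()) if path.exists() else None
```

Switching between these models requires rebuilding the kernel with the other option selected, so this check is mainly useful before and after such a rebuild.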

@atonkyra

atonkyra commented Aug 8, 2013

@edillmann I think trying it out might be worth a shot.

@edillmann
Contributor Author

@atonkyra I will give it a try.

@atonkyra

atonkyra commented Aug 9, 2013

@edillmann Okay, please report back with any results you get :)

@atonkyra

Any updates @edillmann ?

@edillmann
Contributor Author

Hi, so far (5 days) changing the preemption model to Voluntary Kernel Preemption has done the trick, but for how long, who knows. Let's wait several more days.

@edillmann
Contributor Author

After 19 days of uptime, I'm closing this one.
Changing the preemption model to Voluntary Kernel Preemption definitely did the trick.

@behlendorf
Contributor

@edillmann Thanks for the update.
