0.6.0.93 causing soft-lockups with Ubuntu 12.04 #1227
Comments
OK, I can confirm I'm still having problems with 0.6.0.93 on kernel 3.5.0-22-generic on Ubuntu 12.10 as well. The symptom is that services like nginx and nfsd deadlock waiting in "lookup_slow" and "vfs_readdir". This was all working on the previous daily build, so there has been some kind of regression since then. I'm rolling back to the ZFS stable PPA to see if that fixes it. EDIT: This is all on a system with a 20-disk pool split across two RAID-Z3 volumes. Everything else seems to work properly, but nginx and the nfs-kernel-server both seemed to deadlock. I was also having earlier issues with a dataset that could not be mounted or seen (it kept claiming to be mounted in /proc/mounts, but was neither accessible nor unmountable).
@wrouesnel Yes, there were some regressions, and we'll get them nailed down. If you get a chance, can you please apply openzfs/spl@27d2633 to your spl source and set the new module option spl_kmem_cache_aging=0. This effectively disables the cache aging in favor of reclaiming the caches when there is memory pressure. I expect it will resolve the issues you're seeing with the stack traces, but I'd love to verify that.
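For reference, a minimal sketch of how one might set that option, assuming the patched spl module exposes spl_kmem_cache_aging as an ordinary module parameter; the modprobe.d filename and sysfs path are assumptions, not something specified in this thread:

```sh
# Set the option persistently so it is applied whenever the spl module loads
echo "options spl spl_kmem_cache_aging=0" | sudo tee /etc/modprobe.d/spl.conf

# Or pass it once when loading the module by hand
sudo modprobe spl spl_kmem_cache_aging=0

# If the parameter is writable at runtime, it can also be toggled via sysfs
echo 0 | sudo tee /sys/module/spl/parameters/spl_kmem_cache_aging
```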
I tried, but got: [ 5016.154628] zfs: Unknown symbol invalidate_inodes_fn (err 0)
You'll need to update ZFS to the latest code as well, or revert openzfs/spl@769528f.
Reverting didn't help; I get the same error. I tried ppa:zfs-native/daily, but the zfs.ko module cannot be loaded. [14562.389826] SPL: Loaded module v0.6.0.93-rc13
Cache aging was implemented because it was part of the default Solaris kmem_cache behavior. The idea is that per-CPU objects which haven't been accessed in several seconds should be returned to the cache. Linux slabs, on the other hand, never move objects back to the slab unless there is memory pressure on the system. This behavior is now configurable through the 'spl_kmem_cache_expire' module option. The value is a bit mask with the following meaning:
0x1 - Solaris style cache aging eviction is enabled.
0x2 - Linux style low memory eviction is enabled.
Both methods may be safely enabled simultaneously, but by default both are disabled. It has never been clear whether the kmem cache aging (which has been around from day one) actually does any good. It has, however, been the source of numerous bugs, so I wouldn't mind retiring it entirely. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs/zfs#1227 Closes #210
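To illustrate the bit mask described in that commit message, here is a minimal sketch of how the option could be set; the values follow the description above, but whether your installed module accepts this parameter depends on your build including the change:

```sh
# Enable only Linux-style low-memory eviction (bit 0x2)
echo "options spl spl_kmem_cache_expire=0x2" | sudo tee /etc/modprobe.d/spl.conf

# Enable both Solaris-style aging (0x1) and low-memory eviction (0x2): 0x1 | 0x2 = 0x3
sudo modprobe spl spl_kmem_cache_expire=0x3

# Leave the option at 0 (the new default) to disable both mechanisms
```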
@wrouesnel OK, the expected fixes have been merged into master. If you want to give it a try, you'll need to update both the spl and zfs trees to the latest master source.
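For anyone following along, a rough sketch of one way to build both trees from master; the repository URLs and autotools steps are assumptions based on the usual source build, not instructions given in this thread:

```sh
# Build and install the SPL from master first (URLs and steps assumed)
git clone https://github.com/openzfs/spl.git
cd spl && ./autogen.sh && ./configure && make && sudo make install && cd ..

# Then build ZFS against the freshly installed SPL
git clone https://github.com/openzfs/zfs.git
cd zfs && ./autogen.sh && ./configure && make && sudo make install
```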
Hello, I am new here and have a similar issue on my machine. Can you check if it is the same issue? NFS stops responding at random times (the maximum healthy time is about a day), and there is basically no I/O activity once the zpool stops responding. I have to reset the machine to bring everything back up. Thanks. Setup info: Apr 7 05:50:43 mirror kernel: [83348.432001] CPU 4
The original cause of these issues was resolved in the master source. This issue has also been muddled by a few different problem reports, so I'm closing it. If there are still issues in this area, let's open a new issue.
Just installed the newest ZoL daily package on my server running Ubuntu 12.04 (kernel 3.2) today, and got inundated with soft-lockup bugs. I've attached a 1-second sample from my logs (since I don't quite know what I'm looking at). This same machine was fine with the previous version of ZoL.
I'm trying an update to kernel 3.5 to see if that fixes it (since I have a desktop and laptop with ZoL which show no problems).