BUG: unable to handle kernel NULL pointer dereference #5226
Got another null pointer. This time with the following ZFS version:
@tuomari can you fairly consistently reproduce this? If so, can you describe the workload? The failure from the stack clearly involves the l2arc device, so if you need to resolve this issue immediately you could probably disable the l2arc device for the time being. However, if you're in a position where you could enable debugging on the system in question, that might help us identify the root cause. You'll just need to pass the --enable-debug flag when building the spl and zfs code. For future reference:
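A minimal sketch of those two suggestions (the pool and device names below are placeholders, and this is just the usual out-of-tree build procedure of that era, not anything specific to this report):

```sh
# Disable the L2ARC by removing the cache device from the pool:
zpool remove tank /dev/disk/by-id/ata-EXAMPLE-SSD

# Rebuild spl and zfs with debugging (ASSERTs) enabled:
cd spl && ./autogen.sh && ./configure --enable-debug && make && sudo make install
cd ../zfs && ./autogen.sh && ./configure --enable-debug --with-spl=../spl && make && sudo make install
```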
Likely a duplicate of #5071.
It seems to pop up every week or so. I will gladly enable debugging. My workload is mostly generated by Zoneminder, which creates a lot of small files. I have 11 cameras, and each camera generates around 1 GB and 6000 files per 10-minute period, so in total roughly 18 MB/s and 110 files/s. The system tries to keep disk usage at 95% at all times, so it generates about the same amount of deletes as writes. The system is also a netatalk server, stores Time Machine backups for ~10 machines, and receives offsite zfs snapshots nightly.
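(Sanity-checking those figures: 11 cameras × ~1 GB per 600 s ≈ 18 MB/s, and 11 × 6000 files per 600 s = 110 files/s, so the quoted totals are consistent.)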
Great, that should help. Given what you've reported thus far, I'd expect you to hit one of the ASSERTs.
You mean in terms of capacity, correct?
Yes. I hit this early this morning. I'll gladly help in any way I can to resolve this.
@tuomari Are you using large blocks? This trace and the fact that you're keeping the pool very full point to a problem with syncing gang blocks. As I mentioned in #5071, it sounds like these problems started after you upgraded from some older version of ZoL. If so, what was the older version? I'll try running some gang block stress tests.
No (I think): enabled but not active. Sorry, I don't have a specific version for when this started. I had some problems before that caused the system to crash almost daily, and I kept upgrading ZFS on almost every boot, so I can't pinpoint the version where this problem started.
@tuomari You're correct, "enabled" means you're not using any large blocks. The reason I asked is that large blocks would amplify the number of gang blocks needed to satisfy allocations on a fragmented and/or almost-full pool. Working under the assumption that this is the same as, or related to, #5071, the ingredients seem to be:
@tuomari No, I'd be curious to know how many gang blocks your pool has and, much more interestingly, how deeply they're nested. Gang blocks can theoretically be arbitrarily nested, and it turns out that causes recursion in the zio pipeline when running. I've added a few hacks to my dweeezil:zdb branch. It adds two features to the
I started running zdb, and will report after it has run for a while or if anything interesting shows up. I'm not sure whether these are related to this or not, but I got two asserts from zdb with @dweeezil's zdb branch. I first started
@tuomari Traversing a live pool, particularly one that's close to full and in use, is going to be hit or miss. I was hoping it would run long enough that you could get some data from the new gang block output in the per-second status reports.
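For context, a live-pool traversal of this kind is typically kicked off with something like the following sketch (the extra gang-block options from the patched zdb branch were elided above, so only a stock invocation is shown; "tank" is a placeholder pool name):

```sh
# Walk all block pointers in the pool and report block statistics;
# zdb prints periodic progress output while it traverses.
zdb -bb tank
```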
@dweeezil: zdb is still running, and I will keep it running for the time being. Here are the current results:
I will report back periodically, or if anything interesting pops up.
@tuomari It's traversing very slowly. No gang blocks so far, either. There definitely are some; however, it may be impractical to ever find them given how large your pool is. It might be a better idea to put the gang depth tracking patch into the kernel module and expose it as a per-pool kstat. In any case, we're looking for nonzero values where it says "0[0]". The first zero is the number of gang blocks and the second is the maximum nesting depth seen so far.
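As a sketch of what the kstat approach would look like on the reporting side (the gang_depth name is hypothetical since that patch doesn't exist yet; only the per-pool kstat directory itself is real):

```sh
# Per-pool ZFS kstats live under /proc/spl/kstat/zfs/<pool>/.
# If gang depth tracking were exposed there, polling it might look like:
watch -n 1 cat /proc/spl/kstat/zfs/tank/gang_depth   # hypothetical kstat name
```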
@dweeezil I hit another assert. Is this a known problem?
I also hit an assert today. zfs version: 0.7.0-rc1_73_g5cc78dc
@behlendorf Great! I'll see if I can get the patch running. I just hit this a couple of days ago, and it might be related:
I had a machine hit what looks like a similar panic as the original post last night. Running latest git of spl/zfs and kernel 4.9.31:

```
[52443.038324] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[52443.038345] IP: [] buf_hash_remove+0x65/0xb0 [zfs]
[52443.038368] PGD 0
[52443.038370]
[52443.038374] Oops: 0000 [#1] SMP
[52443.038376] Modules linked in: dm_crypt algif_skcipher af_alg dm_mod ext4 crc16 jbd2 fscrypto mbcache loop vhost_net vhost macvtap macvlan tun ebtable_filter ebtables ip6_tables iptable_filter netconsole 8021q mrp bonding bridge stp llc msr uinput nct6775 hwmon_vid snd_hda_codec_hdmi eeepc_wmi asus_wmi sparse_keymap iTCO_wdt iTCO_vendor_support rfkill mxm_wmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel uvcvideo nls_iso8859_1 aesni_intel videobuf2_vmalloc aes_x86_64 videobuf2_memops lrw gf128mul videobuf2_v4l2 glue_helper videobuf2_core ablk_helper cryptd nls_cp437 snd_usb_audio videodev i2c_i801 vfat snd_usbmidi_lib intel_cstate fat pcspkr psmouse i2c_smbus mousedev media snd_rawmidi input_leds intel_rapl_perf snd_hda_intel led_class snd_seq_device evdev joydev snd_hda_codec mac_hid snd_hda_core snd_hwdep r8169 snd_pcm snd_timer mii snd soundcore e1000e lpc_ich ie31200_edac mei_me mei edac_core ptp shpchp fan pps_core thermal battery tpm_tis fjes tpm_tis_core wmi tpm video button sch_fq_codel usbip_host usbip_core sg nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables xfs libcrc32c crc32c_generic hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid sr_mod cdrom sd_mod uas usb_storage serio_raw atkbd libps2 ahci xhci_pci libahci ehci_pci xhci_hcd ehci_hcd libata crc32c_intel scsi_mod usbcore usb_common i8042 serio nvidia_drm(PO) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm nvidia_uvm(PO) nvidia_modeset(PO) nvidia(PO) zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O)
[52443.038685] CPU: 2 PID: 75 Comm: kswapd0 Tainted: P O 4.9.31-mgm #1
[52443.038688] Hardware name: System manufacturer System Product Name/P8Z77-V DELUXE, BIOS 2104 08/13/2013
[52443.038692] task: ffff880602bfce00 task.stack: ffffc900037e8000
[52443.038694] RIP: 0010:[] [] buf_hash_remove+0x65/0xb0 [zfs]
[52443.038708] RSP: 0018:ffffc900037ebb78 EFLAGS: 00010207
[52443.038711] RAX: 00000000003e0776 RBX: ffff880600004000 RCX: 0000000000000000
[52443.038714] RDX: ffffc90009f04bb0 RSI: ffffffffa0260e40 RDI: ffff880600004000
[52443.038717] RBP: ffffffffa02e1a40 R08: ffff880600004010 R09: c82a532345e3bf21
[52443.038719] R10: 0000000000046ab6 R11: ffff88061ec97f20 R12: 0000000000000000
[52443.038721] R13: ffffffffa02e19c0 R14: 0000000000000001 R15: 0000000000000001
[52443.038724] FS: 0000000000000000(0000) GS:ffff88061ec80000(0000) knlGS:0000000000000000
[52443.038726] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[52443.038729] CR2: 0000000000000020 CR3: 0000000002807000 CR4: 00000000001426e0
[52443.038732] Stack:
[52443.038735] ffffffffa00ffc85 0000000201314148 000000000001dd80 ffff8805ff1ca980
[52443.038748] 0000000000004000 ffff880600004000 0000000000000001 ffff88028c6a87b0
[52443.038758] ffffffffa01040ef 00000000005f2a70 0000000000000000 0000000500000003
[52443.038767] Call Trace:
[52443.038777] [] ? arc_change_state.isra.13+0x195/0x3d0 [zfs]
[52443.038786] [] ? arc_evict_state+0x60f/0x7c0 [zfs]
[52443.038799] [] ? spa_get_random+0x34/0x60 [zfs]
[52443.038804] [] ? mutex_lock+0x9/0x30
[52443.038813] [] ? arc_adjust+0x398/0x520 [zfs]
[52443.038822] [] ? __arc_shrinker_func.isra.26+0xd2/0x140 [zfs]
[52443.038832] [] ? arc_shrinker_func_scan_objects+0x1f/0x30 [zfs]
[52443.038836] [] ? shrink_slab.part.6+0x1b4/0x2b0
[52443.038840] [] ? shrink_node+0x2d9/0x2f0
[52443.038843] [] ? kswapd+0x2b2/0x680
[52443.038846] [] ? mem_cgroup_shrink_node+0xb0/0xb0
[52443.038850] [] ? kthread+0xd2/0xf0
[52443.038853] [] ? kthread_park+0x50/0x50
[52443.038857] [] ? ret_from_fork+0x25/0x30
[52443.038860] Code: 89 ca 48 c1 ea 08 4c 31 d2 48 31 d0 48 8b 15 9b 3a 16 00 48 23 05 8c 3a 16 00 48 8d 14 c2 48 8b 0a 48 39 f9 75 05 eb 10 48 89 d1 <48> 8b 51 20 48 39 d7 75 f4 48 8d 51 20 48 8b 4f 20 48 89 0a 83
[52443.038972] RIP [] buf_hash_remove+0x65/0xb0 [zfs]
[52443.038982] RSP
[52443.038984] CR2: 0000000000000020
[52443.038989] ---[ end trace 51076e126836969e ]---
[52443.038992] Kernel panic - not syncing: Fatal exception
```
I hit the previously mentioned BUG again. The system had almost two months of uptime after I hit #6220.
I believe I'm seeing this with:
The culprit appears to have been the move from kernel 4.4.14 to 4.11.12 (Fedora 22 to 25). I rebuilt the zfs modules after I did the system upgrade, but if git advanced at all it was very minor, and I see this bug dates back to October. I believe I should have the #5071 fix in my pull. I notice @tuomari was on 4.4.18, though it would be surprising if 4.4.14 to 4.4.18 had a big regression nobody noticed.

I only have cache and log devices on my backups zpool, but corruption has spread to other pools with neither log nor cache (one SAS SSD on LUKS, one SATA spinning-rust pool on raw devices), so I don't know if this is really an L2ARC problem or a symptom of some underlying problem that's corrupting all the pools. For now I removed the cache and log devices and am saturating the pool with parallel rsyncs (100% utilization on the underlying devices) to see what happens (and to get a good backup in case I need to move the main drives).

I did a big downtime last night to run extensive memory tests (there never were any mcelog events), controller buffer tests, drive pattern tests, etc., and everything checks out hardware-wise, so while it could be a hardware coincidence, the OS upgrade does appear to have been the trigger. I've literally had more ZoL corruption in the past few weeks (so far, always on zvols on raidz2, not files on mirrors) than in the previous five years.
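For what it's worth, the stress load described above amounts to something like this sketch (paths, pool mountpoint, and job count are placeholders, not the reporter's actual setup):

```sh
# Saturate the pool with several rsyncs running in parallel.
for i in 1 2 3 4; do
    rsync -a "/source/set$i/" "/backup/pool/set$i/" &
done
wait
```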
We just encountered this issue with

We have dozens of other machines (all on Google Cloud) running the exact same setup; this is the first occurrence:
Related? https://lkml.org/lkml/2017/3/9/401 [PATCH 1/2] blk-mq: don't complete un-started request in timeout handler
Looks like a similar call trace to what I experienced on 0.7.1-1 with kernels 3.10 and 4.4. On 3.10.103:
On 4.4.45:
I think I have a full scenario that leads to this issue, so if any debugging is necessary we can collaborate.
As I mentioned, I have a full scenario that leads to this issue (connected with buf_hash_remove), so I enabled debugging and got the call trace below:
So it seems that it is the same as in #6238. I tried fix f763c3d and the issue did not occur again.
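For anyone wanting to try the same thing, applying a single fix on top of a ZoL git checkout would look roughly like this (a sketch; f763c3d is the commit referenced above, and the rest is the standard build procedure):

```sh
git cherry-pick f763c3d                       # apply the fix to the current branch
./autogen.sh && ./configure --enable-debug    # rebuild, keeping debugging enabled
make -j"$(nproc)" && sudo make install
```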
Hi there, I am looking at a backtrace very similar to several of those listed above, in a build off zfs-0.7.2 on CentOS 7:
The pool configuration is as follows:
It is not full:
The pool backing disks are Samsung MZ7LM1T9HMJP0D3 1.9T SSDs; the L2ARC is an NVMe S39YNX0J701360 Dell Express Flash PM1725a 800GB SFF. I see this fairly regularly with a VM running on KVM/qemu with two zvols attached as raw disks with the 'none' caching policy:
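For reference, that disk attachment corresponds to something like this qemu invocation (a sketch with placeholder zvol paths and memory size, not the reporter's actual configuration):

```sh
# Two zvols attached as raw virtio disks with the 'none' caching policy.
qemu-system-x86_64 -enable-kvm -m 4096 \
  -drive file=/dev/zvol/tank/vm-disk0,format=raw,cache=none,if=virtio \
  -drive file=/dev/zvol/tank/vm-disk1,format=raw,cache=none,if=virtio
```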
The memory is multi-bit ECC. I'll enable debugging and post more info. Does anyone know when this problem was introduced?
P.S. The fix f763c3d is in this build (it was included in 0.7.2).
Actually, with more testing, it looks like f763c3d has addressed the issue.
@bprotopopov thanks for confirming this. Shall we close this then?
@behlendorf my stack trace looks very similar to some of the above stack traces and to the top part of the traces at illumos https://www.illumos.org/issues/8648. The test provided by @loli10K for https://www.illumos.org/issues/8648, which reproduces the issue, passes, as does my workload, so I feel it's reasonable to say this is addressed.
Running Linux 4.4.19 with the following ZFS versions:
I am currently replacing a faulty drive and my pool is set up as follows: