Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vdev_disk_io_start NULL pointer dereference #9546

Closed
tonyhutter opened this issue Nov 1, 2019 · 5 comments · Fixed by #9577
Closed

vdev_disk_io_start NULL pointer dereference #9546

tonyhutter opened this issue Nov 1, 2019 · 5 comments · Fixed by #9577
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@tonyhutter
Copy link
Contributor

System information

Type Version/Name
Distribution Name RHEL
Distribution Version 8
Linux Kernel 4.18.0-80.11.2.el8_0.x86_64
Architecture x86-64
ZFS Version 0.8.2
SPL Version 0.8.2

Describe the problem you're observing

I'm hitting a NULL pointer dereference while testing Centos 8 0.8.2 packages on RHEL 8 (for #9287). I noticed my build slave unexpectedly lost contact, so I re-ran the test and it lost contact again. I examined the system logs in AWS and saw the NULL pointer dereference. Since I hit this twice, I'm guessing

Describe how to reproduce the problem

Run test suite using experimental Centos 8 RPMs on RHEL 8.

Include any warning/errors/backtraces from the system logs

[ 7358.980661] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7362.954494] scsi 2:0:0:0: Direct-Access     Linux    scsi_debug       0188 PQ: 0 ANSI: 7
[ 7362.961477] sd 2:0:0:0: Power-on or device reset occurred
[ 7362.967940] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 7362.970196] sd 2:0:0:0: [sda] 196608 512-byte logical blocks: (101 MB/96.0 MiB)
[ 7362.984747] sd 2:0:0:0: [sda] Write Protect is off
[ 7362.993955] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 7363.027492]  sda: sda1 sda9
[ 7363.053953] sd 2:0:0:0: [sda] Attached SCSI disk
[ 7377.950301]  sda:
[ 7378.171287]  sda: sda1 sda9
[ 7382.765523] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7383.255367] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 7383.263660] PGD 0 P4D 0 
[ 7383.267867] Oops: 0000 [#1] SMP PTI
[ 7383.274420] CPU: 0 PID: 15212 Comm: z_wr_iss Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-80.11.2.el8_0.x86_64 #1
[ 7383.288669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 7383.295487] RIP: 0010:vdev_disk_io_start+0x2cc/0xad0 [zfs]
[ 7383.301742] Code: 74 24 48 48 89 50 08 49 8b 44 24 18 0f b6 56 6c 88 50 1b 49 8b 4c 24 18 48 8b 41 08 48 8b b0 20 05 00 00 48 8b be 78 04 00 00 <48> 8b 47 48 a8 03 0f 85 99 04 00 00 65 48 ff 00 48 8b 86 78 04 00
[ 7383.319804] RSP: 0000:ffffaa3784f6b9f0 EFLAGS: 00010246
[ 7383.325769] RAX: ffff997b244c4000 RBX: 0000000000000600 RCX: ffff997b21e95900
[ 7383.333116] RDX: 0000000000000001 RSI: ffff997af9115580 RDI: 0000000000000000
[ 7383.340480] RBP: ffffaa3784f6bab8 R08: ffff997b21e95900 R09: ffff9979fd9df180
[ 7383.347551] R10: ffff997a47c07100 R11: 0000000001700000 R12: ffff9979fd9df180
[ 7383.355097] R13: 0000000000000600 R14: ffff9979fd9df180 R15: ffff997a944c36a8
[ 7383.362516] FS:  0000000000000000(0000) GS:ffff997b28000000(0000) knlGS:0000000000000000
[ 7383.370545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7383.377107] CR2: 0000000000000048 CR3: 00000000b580a002 CR4: 00000000001606f0
[ 7383.384677] Call Trace:
[ 7383.388727]  ? vdev_queue_io+0x18f/0x230 [zfs]
[ 7383.394184]  zio_vdev_io_start+0xf4/0x2f0 [zfs]
[ 7383.399852]  zio_nowait+0xc3/0x160 [zfs]
[ 7383.405170]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.410584]  vdev_mirror_io_start+0x97/0x180 [zfs]
[ 7383.415514]  zio_vdev_io_start+0xf4/0x2f0 [zfs]
[ 7383.420306]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.425318]  zio_nowait+0xc3/0x160 [zfs]
[ 7383.429817]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.434840]  vdev_mirror_io_start+0x97/0x180 [zfs]
[ 7383.439783]  zio_vdev_io_start+0xf4/0x2f0 [zfs]
[ 7383.444451]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.449324]  zio_nowait+0xc3/0x160 [zfs]
[ 7383.453697]  ? vdev_config_sync+0x230/0x230 [zfs]
[ 7383.458556]  vdev_mirror_io_start+0x97/0x180 [zfs]
[ 7383.463369]  ? spa_config_enter+0xba/0x100 [zfs]
[ 7383.468166]  zio_vdev_io_start+0x269/0x2f0 [zfs]
[ 7383.472960]  ? tsd_get_by_thread+0x2a/0x40 [spl]
[ 7383.478109]  zio_execute+0x90/0xf0 [zfs]
[ 7383.482700]  taskq_thread+0x2e5/0x530 [spl]
[ 7383.487192]  ? wake_up_q+0x70/0x70
[ 7383.491081]  ? zio_taskq_member.isra.8.constprop.13+0x70/0x70 [zfs]
[ 7383.496832]  ? taskq_destroy+0x180/0x180 [spl]
[ 7383.501566]  kthread+0x112/0x130
[ 7383.505558]  ? kthread_bind+0x30/0x30
[ 7383.509556]  ret_from_fork+0x35/0x40
[ 7383.513529] Modules linked in: scsi_debug sd_mod sg loop zfs(POE) icp(POE) zlua(POE) zcommon(POE) zunicode(POE) znvpair(POE) zavl(POE) spl(OE) ext4 mbcache jbd2 cirrus ttm drm_kms_helper intel_rapl syscopyarea sysfillrect sb_edac sysimgblt fb_sys_fops drm intel_rapl_perf pcspkr i2c_piix4 xfs libcrc32c ata_generic ata_piix crct10dif_pclmul crc32_pclmul libata crc32c_intel xen_netfront xen_blkfront ghash_clmulni_intel serio_raw dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
[ 7383.554663] CR2: 0000000000000048

Here's the last couple lines of the test log:

Test: /usr/share/zfs/zfs-tests/tests/functional/exec/exec_002_neg (run as root) [00:00] [PASS]
23:11:42.35 ASSERTION: Setting exec=off on a filesystem, processes can not be executed  from this file system.
23:11:42.35 SUCCESS: cp /var/tmp/constrained_path.PFXF/ls /mnt/testdir/myls
23:11:42.38 SUCCESS: zfs set exec=off testpool/testfs
23:11:42.39 SUCCESS: exec_n_check 126 /mnt/testdir/myls
23:11:42.39 SUCCESS: exec_n_check 1 mmap_exec /mnt/testdir/myls
23:11:42.39 Setting exec=off on filesystem testing passed.
23:11:42.39 NOTE: Performing local cleanup via log_onexit (cleanup)
23:11:42.40 SUCCESS: rm /mnt/testdir/myls
Test: /usr/share/zfs/zfs-tests/tests/functional/exec/cleanup (run as root) [00:00] [PASS]
23:11:42.55 SUCCESS: zpool destroy -f testpool
23:11:42.57 SUCCESS: rm -rf /mnt/testdir
Test: /usr/share/zfs/zfs-tests/tests/functional/fault/setup (run as root) [00:00] [PASS]
23:11:42.69 SUCCESS: mkdir /var/tmp/zed
23:11:42.70 SUCCESS: touch /var/tmp/zed/vdev_id.conf
23:11:42.70 SUCCESS: ln -s /var/tmp/zed/vdev_id.conf /etc/zfs/vdev_id.conf
23:11:42.71 SUCCESS: cp /etc/zfs/zed.d/zed.rc /var/tmp/zed
23:11:42.71 SUCCESS: cp /etc/zfs/zed.d/zed-functions.sh /var/tmp/zed
23:11:42.72 SUCCESS: umask 0022
23:11:42.72 SUCCESS: cp /usr/libexec/zfs/zed.d/resilver_finish-start-scrub.sh /var/tmp/zed
23:11:42.73 SUCCESS: umask 0022
23:11:42.73 SUCCESS: sed -i /\#ZED_DEBUG_LOG=.*/d /var/tmp/zed/zed.rc
23:11:42.74 NOTE: Starting ZED
23:11:42.74 SUCCESS: truncate -s 0 /var/tmp/zed/zed.debug.log
23:11:42.75 SUCCESS: eval zed -vF -d /var/tmp/zed -p /var/tmp/zed/zed.pid -P /var/tmp/constrained_path.PFXF -s /var/tmp/zed/state 2>/var/tmp/zed/zed.log &
@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 2, 2019
@loli10K
Copy link
Contributor

loli10K commented Nov 3, 2019

This commit suggests where to find the root cause: c12ea77

Not all block devices, notably scsi_debug, set a root_blkg on the request queue.

Running "fault/auto_offline_001_pos" under linux-4.18.0-80.11.2.el8_0.x86_64 / zfs-0.8.2:

Thread 171 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 15054]
blkg_tryget (blkg=0x0 <irq_stack_union>) at ./include/linux/blk-cgroup.h:511
511	./include/linux/blk-cgroup.h: No such file or directory.
(gdb) bt
#0  blkg_tryget (blkg=0x0 <irq_stack_union>) at ./include/linux/blk-cgroup.h:511
#1  0xffffffffc0936dff in vdev_bio_associate_blkg (bio=0xffff909452123c00) at /usr/src/zfs/module/zfs/vdev_disk.c:548
#2  0xffffffffc0937171 in __vdev_disk_physio (bdev=0xffff90936877a1c0, zio=0xffff909359ce31b0, io_size=1536, io_offset=4271616, rw=1, flags=1792) at /usr/src/zfs/module/zfs/vdev_disk.c:652
#3  0xffffffffc0937b8f in vdev_disk_io_start (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/vdev_disk.c:846
#4  0xffffffffc09b4e21 in zio_vdev_io_start (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/zio.c:3755
#5  0xffffffffc09af579 in __zio_execute (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/zio.c:2087
#6  zio_nowait (zio=0xffff909359ce31b0) at /usr/src/zfs/module/zfs/zio.c:2161
#7  0xffffffffc0944772 in vdev_mirror_io_start (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/vdev_mirror.c:618
#8  0xffffffffc09b4e21 in zio_vdev_io_start (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/zio.c:3755
#9  0xffffffffc09af579 in __zio_execute (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/zio.c:2087
#10 zio_nowait (zio=0xffff909359ce5970) at /usr/src/zfs/module/zfs/zio.c:2161
#11 0xffffffffc0944772 in vdev_mirror_io_start (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/vdev_mirror.c:618
#12 0xffffffffc09b4e21 in zio_vdev_io_start (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/zio.c:3755
#13 0xffffffffc09af579 in __zio_execute (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/zio.c:2087
#14 zio_nowait (zio=0xffff909359ce7740) at /usr/src/zfs/module/zfs/zio.c:2161
#15 0xffffffffc0944772 in vdev_mirror_io_start (zio=0xffff909359ce09f0) at /usr/src/zfs/module/zfs/vdev_mirror.c:618
#16 0xffffffffc09b50a2 in zio_vdev_io_start (zio=0xffff909359ce09f0) at /usr/src/zfs/module/zfs/zio.c:3645
#17 0xffffffffc09aedd4 in __zio_execute (zio=0xffff909359ce09f0) at /usr/src/zfs/module/zfs/zio.c:2087
#18 zio_execute (zio=0x0 <irq_stack_union>) at /usr/src/zfs/module/zfs/zio.c:2000
#19 0xffffffffc04e7492 in taskq_thread (args=0xffff909446255780) at /usr/src/zfs/module/spl/spl-taskq.c:927
#20 0xffffffffa0ece683 in ?? ()
#21 0xffff90935b519c98 in ?? ()
#22 0xffffffffa0ece570 in ?? ()
#23 0x0000000000000000 in ?? ()
(gdb) up
#1  0xffffffffc0936dff in vdev_bio_associate_blkg (bio=0xffff909452123c00) at /usr/src/zfs/module/zfs/vdev_disk.c:548
548		if (blkg_tryget(q->root_blkg))
(gdb) p q->root_blkg
$1 = (struct blkcg_gq *) 0x0 <irq_stack_union>

blkg_tryget() source according to the el8 kernel src.rpm:

/**
 * blkg_tryget - try and get a blkg reference
 * @blkg: blkg to get
 *
 * This is for use when doing an RCU lookup of the blkg.  We may be in the midst
 * of freeing this blkg, so we can only use it if the refcnt is not zero.
 */
static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
        return percpu_ref_tryget(&blkg->refcnt);
}

Which is different from the mainline version of blkg_tryget():

/**
 * blkg_tryget - try and get a blkg reference
 * @blkg: blkg to get
 *
 * This is for use when doing an RCU lookup of the blkg.  We may be in the midst
 * of freeing this blkg, so we can only use it if the refcnt is not zero.
 */
static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
	return blkg && percpu_ref_tryget(&blkg->refcnt);
}

The following change seems to avoid the issue:

root@linux:/usr/src/zfs# git diff
diff --git a/module/zfs/vdev_disk.c b/module/zfs/vdev_disk.c
index 46437f21f..0877db1b2 100644
--- a/module/zfs/vdev_disk.c
+++ b/module/zfs/vdev_disk.c
@@ -545,7 +545,7 @@ vdev_bio_associate_blkg(struct bio *bio)
        ASSERT3P(q, !=, NULL);
        ASSERT3P(bio->bi_blkg, ==, NULL);
 
-       if (blkg_tryget(q->root_blkg))
+       if (q->root_blkg && blkg_tryget(q->root_blkg))
                bio->bi_blkg = q->root_blkg;
 }
 #define        bio_associate_blkg vdev_bio_associate_blkg

@tonyhutter
Copy link
Contributor Author

@loli10K very nice work! Hopefully the Centos 8 folks will backport the kernel change so 0.8.2 will work on it. No matter what though, we should include your fix in 0.8.3.

@behlendorf
Copy link
Contributor

Absolutely, @loli10K would you mind opening a PR with the proposed change so we can get it merged to master.

@loli10K
Copy link
Contributor

loli10K commented Nov 12, 2019

Will open PR later today

behlendorf pushed a commit that referenced this issue Nov 13, 2019
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted.  To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9546 
Closes #9577
@tonyhutter
Copy link
Contributor Author

FYI - I can confirm that blkg_tryget() is NOT fixed in the RHEL 8.1 kernel (4.18.0-147.0.3.el8_1)

tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Dec 26, 2019
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted.  To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes openzfs#9546
Closes openzfs#9577
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Dec 27, 2019
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted.  To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes openzfs#9546
Closes openzfs#9577
tonyhutter pushed a commit that referenced this issue Jan 23, 2020
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL
@blkg as input; this is different from its mainline counterpart where
NULL is accepted.  To prevent dereferencing a NULL pointer when dealing
with block devices which do not set a root_blkg on the request queue
perform the NULL check in vdev_bio_associate_blkg().

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #9546
Closes #9577
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants