-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vdev_disk_io_start NULL pointer dereference #9546
Comments
This commit suggests where to find the root cause: c12ea77
Running "fault/auto_offline_001_pos" under linux-4.18.0-80.11.2.el8_0.x86_64 / zfs-0.8.2:
/**
* blkg_tryget - try and get a blkg reference
* @blkg: blkg to get
*
* This is for use when doing an RCU lookup of the blkg. We may be in the midst
* of freeing this blkg, so we can only use it if the refcnt is not zero.
*/
static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
return percpu_ref_tryget(&blkg->refcnt);
} Which is different from the mainline version of /**
* blkg_tryget - try and get a blkg reference
* @blkg: blkg to get
*
* This is for use when doing an RCU lookup of the blkg. We may be in the midst
* of freeing this blkg, so we can only use it if the refcnt is not zero.
*/
static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
return blkg && percpu_ref_tryget(&blkg->refcnt);
} The following change seems to avoid the issue: root@linux:/usr/src/zfs# git diff
diff --git a/module/zfs/vdev_disk.c b/module/zfs/vdev_disk.c
index 46437f21f..0877db1b2 100644
--- a/module/zfs/vdev_disk.c
+++ b/module/zfs/vdev_disk.c
@@ -545,7 +545,7 @@ vdev_bio_associate_blkg(struct bio *bio)
ASSERT3P(q, !=, NULL);
ASSERT3P(bio->bi_blkg, ==, NULL);
- if (blkg_tryget(q->root_blkg))
+ if (q->root_blkg && blkg_tryget(q->root_blkg))
bio->bi_blkg = q->root_blkg;
}
#define bio_associate_blkg vdev_bio_associate_blkg |
@loli10K very nice work! Hopefully the Centos 8 folks will backport the kernel change so 0.8.2 will work on it. No matter what though, we should include your fix in 0.8.3. |
Absolutely, @loli10K would you mind opening a PR with the proposed change so we can get it merged to master. |
Will open PR later today |
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL @blkg as input; this is different from its mainline counterpart where NULL is accepted. To prevent dereferencing a NULL pointer when dealing with block devices which do not set a root_blkg on the request queue perform the NULL check in vdev_bio_associate_blkg(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9546 Closes #9577
FYI - I can confirm that |
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL @blkg as input; this is different from its mainline counterpart where NULL is accepted. To prevent dereferencing a NULL pointer when dealing with block devices which do not set a root_blkg on the request queue perform the NULL check in vdev_bio_associate_blkg(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#9546 Closes openzfs#9577
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL @blkg as input; this is different from its mainline counterpart where NULL is accepted. To prevent dereferencing a NULL pointer when dealing with block devices which do not set a root_blkg on the request queue perform the NULL check in vdev_bio_associate_blkg(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#9546 Closes openzfs#9577
blkg_tryget() as shipped in EL8 kernels does not seem to handle NULL @blkg as input; this is different from its mainline counterpart where NULL is accepted. To prevent dereferencing a NULL pointer when dealing with block devices which do not set a root_blkg on the request queue perform the NULL check in vdev_bio_associate_blkg(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes #9546 Closes #9577
System information
Describe the problem you're observing
I'm hitting a NULL pointer dereference while testing Centos 8 0.8.2 packages on RHEL 8 (for #9287). I noticed my build slave unexpectedly lost contact, so I re-ran the test and it lost contact again. I examined the system logs in AWS and saw the NULL pointer dereference. Since I hit this twice, I'm guessing
Describe how to reproduce the problem
Run test suite using experimental Centos 8 RPMs on RHEL 8.
Include any warning/errors/backtraces from the system logs
Here's the last couple lines of the test log:
The text was updated successfully, but these errors were encountered: