Panic in spa_update_space #12380

Closed
jwk404 opened this issue Jul 16, 2021 · 6 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@jwk404
Contributor

jwk404 commented Jul 16, 2021

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 18.04 |
| Kernel Version | 5.4.0-77-dx2021071302-6908b31aa-generic |
| Architecture | x86_64 |
| OpenZFS Version | zfs-2.1.99-delphix+2021.07.13.22-83d8d85 |

Describe the problem you're observing

Panic during the device removal tests.

Describe how to reproduce the problem

The panic is intermittent, but we've seen it twice since yesterday.

Include any warning/errors/backtraces from the system logs

[12381.984418] BUG: kernel NULL pointer dereference, address: 0000000000000088
[12381.985588] #PF: supervisor read access in kernel mode
[12381.986529] #PF: error_code(0x0000) - not-present page
[12381.987379] PGD 0 P4D 0
[12381.988022] Oops: 0000 [#1] SMP NOPTI
[12381.988872] CPU: 1 PID: 86206 Comm: vdev_load Kdump: loaded Tainted: P           OE     5.4.0-77-dx2021071302-6908b31aa-generic #86~18.04.1
[12381.991152] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[12381.993416] RIP: 0010:spa_update_dspace+0x79/0xc0 [zfs]
[12381.994635] Code: e8 5c eb ff ff 48 8b 83 58 0b 00 00 48 89 df 48 8b 30 e8 1a a3 00 00 48 89 df 49 89 c4 e8 cf e7 ff ff 49 8b 94 24 80 2b 00 00 <48> 39 82 88 00 00 00 74 19 48 c7 c2 50 90 89 c0 be 40 00 00 00 48
[12381.998136] RSP: 0018:ffffbac20197bbd0 EFLAGS: 00010206
[12381.999058] RAX: ffff9c6f57beac00 RBX: ffff9c6fd9a28000 RCX: 0000000000000000
[12382.000237] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9c6fd9a28000
[12382.001474] RBP: ffffbac20197bbe0 R08: 0000000000000000 R09: ffff9c7024c06840
[12382.002665] R10: ffff9c6f71df8000 R11: 0000000000000008 R12: ffff9c700f530000
[12382.003890] R13: ffff9c6fd9a28000 R14: ffff9c700e575800 R15: 0000000000000000
[12382.005038] FS:  0000000000000000(0000) GS:ffff9c7025040000(0000) knlGS:0000000000000000
[12382.006416] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12382.007343] CR2: 0000000000000088 CR3: 0000000181420005 CR4: 00000000001606e0
[12382.009046] Call Trace:
[12382.009696]  spa_get_slop_space+0x8e/0x90 [zfs]
[12382.011061]  metaslab_sync_done+0x8a/0x430 [zfs]
[12382.012429]  metaslab_init+0x36e/0x3c0 [zfs]
[12382.013631]  vdev_metaslab_init+0x112/0x3d0 [zfs]
[12382.014883]  vdev_load+0x25c/0x500 [zfs]
[12382.015988]  ? __schedule+0x2ed/0x6f0
[12382.017064]  vdev_load_child+0x12/0x20 [zfs]
[12382.018238]  taskq_thread+0x23b/0x430 [spl]
[12382.019353]  ? wake_up_q+0x80/0x80
[12382.020333]  kthread+0x120/0x140
[12382.021278]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
[12382.022473]  ? kthread_park+0x90/0x90
[12382.023488]  ret_from_fork+0x1f/0x40
[12382.024480] Modules linked in: iscsi_target_mod target_core_mod vsock_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag aufs overlay vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel binfmt_misc crypto_simd cryptd glue_helper rapl joydev input_leds serio_raw vmw_vmci mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core nf_tables nfnetlink iscsi_tcp libiscsi_tcp libiscsi nfsd auth_rpcgss nfs_acl lockd scsi_transport_iscsi grace sunrpc ip_tables x_tables autofs4 zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear vmwgfx psmouse ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops vmxnet3 drm vmw_pvscsi i2c_piix4 pata_acpi [last unloaded: scsi_debug]

I believe it's in this section of spa_update_dspace():

                if (vd->vdev_mg->mg_class == spa_normal_class(spa)) {
                        spa->spa_dspace -= spa_deflate(spa) ?
                            vd->vdev_stat.vs_dspace : vd->vdev_stat.vs_space;
                } 

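when we dereference vd->vdev_mg to read mg_class at the start of the if block. vdev_mg appears to be NULL there, which would match the faulting address 0x88 being a small member offset.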
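
The oops above is consistent with a NULL vd->vdev_mg: CR2 holds 0x88, RDX is 0, and the bytes at the fault marker in the Code: line (48 39 82 88 00 00 00) decode to a compare against memory at 0x88(%rdx). Reading a member through a NULL struct pointer faults at that member's offset, so 0x88 would be the offset of mg_class in this build. A minimal sketch of that arithmetic, assuming a hypothetical stand-in struct whose padding is chosen only to reproduce the offset (it is not the real metaslab_group_t layout, and member_fault_address is an illustrative helper, not an OpenZFS function):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in, NOT the real metaslab_group_t: the padding is
 * chosen only so that mg_class sits at offset 0x88, matching CR2.
 */
struct mg_model {
	char	pad[0x88];
	void	*mg_class;
};

/* Address the CPU touches when reading a member through a base pointer. */
static uintptr_t
member_fault_address(uintptr_t base, size_t member_offset)
{
	return (base + member_offset);
}
```

With a NULL base (RDX = 0), member_fault_address(0, offsetof(struct mg_model, mg_class)) evaluates to 0x88, the address reported in the oops.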

@jwk404 added the "Type: Defect" label Jul 16, 2021
@behlendorf
Contributor

@rincebrain it looks like commit 1325434, PR #12271, introduced this panic. Mind taking a look? If we can sort it out fairly quickly that's great, otherwise we should consider reverting it for now.

@rincebrain
Contributor

Ruh-roh, I'll see about reproducing on my 18.04 box.

I don't recognize that kernel version - is that a periodic snapshot of some git tree?

@jwk404
Contributor Author

jwk404 commented Jul 22, 2021

That kernel version is an internal representation used at Delphix. It was based on 1325434 running on a modified version of 18.04.

@rincebrain
Contributor

I have a fix, let me just run it through some more testing and I'll cut a PR...

@behlendorf
Contributor

@rincebrain any update on this fix? Not to rush you, but this issue is causing a fair bit of heartburn for the CI so it'd be nice to either fix it, or revert 1325434 until we do have a complete fix.

@rincebrain
Contributor

Oh sorry, I thought the CI wasn't hitting it, which rather surprised me given how often I hit it on the VM I set up for it.

My fix is the trivial one: a NULL check before the dereference. I was just trying to confirm that the check doesn't leave anything exciting lurking, either in the "a different crash" sense or in the "nonfatal unwanted behavior" sense. I haven't seen it do anything obviously interesting yet in loops of just the removal tests or all of -T functional, so I'll just cut the PR.
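
A minimal sketch of what a NULL check of this kind could look like, written against simplified stand-in types rather than the real OpenZFS structures. All model_* names are hypothetical, and the spa_normal_class()/spa_deflate() calls from the quoted snippet are replaced by plain parameters; only the field names vdev_mg, mg_class, vs_space, and vs_dspace come from the snippet quoted earlier in this issue.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the real OpenZFS structures. */
typedef struct model_class {
	int	unused;
} model_class_t;

typedef struct model_group {
	model_class_t	*mg_class;
} model_group_t;

typedef struct model_vdev {
	model_group_t	*vdev_mg;	/* assumed NULL-able during removal */
	uint64_t	vs_space;
	uint64_t	vs_dspace;
} model_vdev_t;

/*
 * Subtract the removing vdev's space from dspace only when its metaslab
 * group still exists and belongs to the normal class. If vdev_mg is
 * NULL, skip the adjustment instead of dereferencing it.
 */
static uint64_t
model_adjust_dspace(uint64_t dspace, const model_vdev_t *vd,
    const model_class_t *normal_class, int deflate)
{
	if (vd->vdev_mg != NULL && vd->vdev_mg->mg_class == normal_class)
		dspace -= deflate ? vd->vs_dspace : vd->vs_space;
	return (dspace);
}
```

Because && short-circuits, the mg_class read only happens when vdev_mg is non-NULL; with a NULL vdev_mg the function returns dspace unchanged.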

rincebrain added a commit to rincebrain/zfs that referenced this issue Jul 24, 2021
After 1325434, we can in certain circumstances end up calling
spa_update_dspace with vd->vdev_mg NULL, which ends poorly during
vdev removal.

So let's not do that further space adjustment when we can't.

Closes openzfs#12380

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
rincebrain added a commit to rincebrain/zfs that referenced this issue Jul 26, 2021
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Aug 23, 2021
After 1325434, we can in certain circumstances end up calling
spa_update_dspace with vd->vdev_mg NULL, which ends poorly during
vdev removal.

So let's not do that further space adjustment when we can't.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12380 
Closes openzfs#12428
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Aug 24, 2021
rincebrain added a commit to rincebrain/zfs that referenced this issue Aug 26, 2021
behlendorf pushed a commit that referenced this issue Aug 30, 2021
behlendorf pushed a commit that referenced this issue Aug 31, 2021
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Sep 15, 2021
tonyhutter pushed a commit that referenced this issue Sep 22, 2021
datacore-rm pushed a commit to DataCoreSoftware/openzfs that referenced this issue Oct 4, 2022
datacore-rm pushed a commit to DataCoreSoftware/openzfs that referenced this issue Oct 13, 2022