panic reporting checksum error from zio_done #11872

Closed
PaulZ-98 opened this issue Apr 9, 2021 · 1 comment · Fixed by #11896
Labels
Status: Triage Needed (new issue which needs to be triaged)
Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

PaulZ-98 (Contributor) commented Apr 9, 2021

System information

| Type                 | Version/Name      |
| -------------------- | ----------------- |
| Distribution Name    | Ubuntu            |
| Distribution Version | 16.04             |
| Linux Kernel         | 4.15.0-54-generic |
| Architecture         | x86_64            |
| ZFS Version          | 0.8.4             |
| SPL Version          |                   |

Describe the problem you're observing

The pool has multiple mirrored vdevs. In zio_done, for a ZIO_TYPE_FREE of a gang block with io_error == 0, the code that reports a checksum error is reached. Because the zio is the free of a gang block, io_abd is NULL, and the abd_copy call made for reporting purposes dereferences the NULL pointer.

Describe how to reproduce the problem

Hard to reproduce. We have a server with checksum errors in gang blocks, where the zio still succeeds because the other half of the mirror holds good copies.

Include any warning/errors/backtraces from the system logs

I instrumented ZFS to print information about the zio.

gbe2-bfyii-359 ntpd[5208]: Soliciting pool server 206.201.138.15
gbe2-bfyii-359 kernel: [43100.888915] NOTICE: zio: io_type 3 io_orig_abd           (null) io_size 130560 io_orig_size 130560
gbe2-bfyii-359 kernel: [43100.888915] 
gbe2-bfyii-359 kernel: [43100.888919] NOTICE: zio_done: io_abd           (null) adata 0000000080760d63 asize 131072 psize 130560 align 4096
gbe2-bfyii-359 kernel: [43100.888919] 
gbe2-bfyii-359 kernel: [43100.888922] NOTICE: zio: io_bookmark objset 0 object 0 level 0 blkid 0
gbe2-bfyii-359 kernel: [43100.888922] 
gbe2-bfyii-359 kernel: [43100.888934] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
gbe2-bfyii-359 kernel: [43100.896990] IP: abd_verify+0x9/0x270 [zfs]
gbe2-bfyii-359 kernel: [43100.901112] PGD 0 P4D 0 
gbe2-bfyii-359 kernel: [43100.903669] Oops: 0000 [#1] SMP PTI
gbe2-bfyii-359 kernel: [43100.907186] Modules linked in: ipt_REJECT nf_reject_ipv4 kcare(OE) dm_crypt algif_skcipher af_alg mptctl mptbase aufs overlay xt_set ip_set_hash_net ip_set_hash_ip ip_set nfnetlink xt_multiport xt_conntrack iptable_filter bonding ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_CHECKSUM xt_tcpudp binfmt_misc iptable_mangle ip_tables x_tables nls_iso8859_1 zfs(POE) zunicode(POE) zlua(POE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc joydev input_leds aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ipmi_ssif acpi_pad zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) ipmi_si ipmi_devintf ipmi_msghandler autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
2021-04-02T06:33:52.191407+00:00 kern.warning gbe2-bfyii-359 kernel: [43100.978684]  libcrc32c raid0 multipath linear raid1 hid_generic usbhid hid ast drm_kms_helper syscopyarea igb sysfillrect sysimgblt fb_sys_fops ttm dca mpt3sas ahci ptp drm raid_class libahci pps_core scsi_transport_sas i2c_algo_bit
gbe2-bfyii-359 kernel: [43100.999404] CPU: 26 PID: 27265 Comm: z_fr_iss Tainted: P           OE    4.15.0-54-generic #58~16.04.1-Ubuntu
gbe2-bfyii-359 kernel: [43101.009361] Hardware name: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
gbe2-bfyii-359 kernel: [43101.016946] RIP: 0010:abd_verify+0x9/0x270 [zfs]
gbe2-bfyii-359 kernel: [43101.021596] RSP: 0018:ffffac438ed03c10 EFLAGS: 00010286
gbe2-bfyii-359 kernel: [43101.026854] RAX: 0000000000000001 RBX: ffffffffc1012290 RCX: 0000000000000000
gbe2-bfyii-359 kernel: [43101.034031] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
gbe2-bfyii-359 kernel: [43101.041206] RBP: ffffac438ed03c28 R08: 0000000000000027 R09: 0000000000000027
gbe2-bfyii-359 kernel: [43101.048380] R10: 0000000000001000 R11: 0000000000000961 R12: 0000000000000000
gbe2-bfyii-359 kernel: [43101.055557] R13: 0000000000000000 R14: ffff981a72159068 R15: 000000000001fe00
gbe2-bfyii-359 kernel: [43101.062735] FS:  0000000000000000(0000) GS:ffff98277f480000(0000) knlGS:0000000000000000
gbe2-bfyii-359 kernel: [43101.070867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
gbe2-bfyii-359 kernel: [43101.076643] CR2: 0000000000000004 CR3: 00000036d980a005 CR4: 00000000003606e0
gbe2-bfyii-359 kernel: [43101.083818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
gbe2-bfyii-359 kernel: [43101.090994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
gbe2-bfyii-359 kernel: [43101.098171] Call Trace:
gbe2-bfyii-359 kernel: [43101.100704]  ? abd_copy_from_buf_off_cb+0x30/0x30 [zfs]
gbe2-bfyii-359 kernel: [43101.106027]  abd_iterate_func2+0x50/0x210 [zfs]
gbe2-bfyii-359 kernel: [43101.110650]  ? abd_alloc_pages.constprop.10+0x322/0x390 [zfs]
gbe2-bfyii-359 kernel: [43101.116453]  ? cmn_err+0x4e/0x70 [spl]
gbe2-bfyii-359 kernel: [43101.120295]  abd_copy_off+0x17/0x20 [zfs]
gbe2-bfyii-359 kernel: [43101.124459]  abd_copy+0x10/0x20 [zfs]
gbe2-bfyii-359 kernel: [43101.128266]  zio_done+0x25d/0x1920 [zfs]
gbe2-bfyii-359 kernel: [43101.132332]  ? __raw_spin_unlock+0x9/0x10 [zfs]
gbe2-bfyii-359 kernel: [43101.138709]  ? zio_ready+0x24f/0x670 [zfs]
gbe2-bfyii-359 kernel: [43101.144651]  zio_execute+0xeb/0x230 [zfs]
gbe2-bfyii-359 kernel: [43101.150376]  taskq_thread+0x24f/0x4c0 [spl]
gbe2-bfyii-359 kernel: [43101.156244]  ? wake_up_q+0x70/0x70
gbe2-bfyii-359 kernel: [43101.161420]  ? zio_execute_stack_check+0x10/0x10 [zfs]
gbe2-bfyii-359 kernel: [43101.168206]  kthread+0x105/0x140
gbe2-bfyii-359 kernel: [43101.173054]  ? taskq_thread_spawn+0x50/0x50 [spl]
gbe2-bfyii-359 kernel: [43101.179361]  ? kthread_destroy_worker+0x50/0x50
gbe2-bfyii-359 kernel: [43101.185493]  ret_from_fork+0x35/0x40
gbe2-bfyii-359 kernel: [43101.190639] Code: 29 e0 4c 39 e8 4c 0f 46 e8 4c 89 6b 08 48 8b 3a e8 3d fd ff ff 48 89 c7 e8 25 fa ff ff e9 73 ff ff ff 55 48 89 e5 41 55 41 54 53 <44> 8b 47 04 49 89 fc 45 85 c0 0f 84 2a 01 00 00 49 81 f8 00 00 
gbe2-bfyii-359 kernel: [43101.212812] RIP: abd_verify+0x9/0x270 [zfs] RSP: ffffac438ed03c10
gbe2-bfyii-359 kernel: [43101.220505] CR2: 0000000000000004
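
The fault address in this trace (CR2 = 0x4, with RDI = 0) matches the marked instruction bytes `44 8b 47 04`, which decode to a 32-bit load from offset 4 of the pointer in RDI. In other words, abd_verify appears to read a field 4 bytes into a NULL abd_t. A minimal sketch of that arithmetic follows. `model_abd_t` and `model_fault_addr` are hypothetical names, and the layout is an assumption made only to illustrate the offset; the real abd_t in 0.8.4 may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for abd_t; only the leading members matter here.
 * ASSUMPTION: two 4-byte fields lead the struct, as the fault offset suggests.
 */
typedef struct model_abd {
	uint32_t ma_type;	/* offset 0 */
	uint32_t ma_flags;	/* offset 4 */
	uint64_t ma_size;	/* offset 8 */
} model_abd_t;

/* The layout assumption is checked at compile time. */
static_assert(offsetof(model_abd_t, ma_flags) == 4,
    "second field is expected at offset 4");

/* Address that a load of ma_flags touches for a given base pointer value. */
uintptr_t
model_fault_addr(uintptr_t base)
{
	return (base + offsetof(model_abd_t, ma_flags));
}
```

With a base of 0 (the NULL io_abd), the computed address is 0x4, the same value the kernel reports in CR2.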

In another instance I instrumented the code to print the io_bp.

2021-04-07T23:16:34.025280+00:00 kern.notice gbe2-bfyii-359 kernel: [120372.854702] NOTICE: zio: io_type 3 io_orig_abd           (null) io_size 127488 io_orig_size 127488
2021-04-07T23:16:34.025324+00:00 kern.notice gbe2-bfyii-359 kernel: [120372.854702] DVA[0]=<3:aa77743b000:22000> DVA[1]=<4:30bcce60000:1000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE gang unique single size=1f200L/1f200P birth=8206136L/8206136P
2021-04-07T23:16:34.025328+00:00 kern.notice gbe2-bfyii-359 kernel: [120372.854707] NOTICE: zio_done: io_abd           (null) adata 000000003539ef6b asize 131072 psize 127488 align 4096
2021-04-07T23:16:34.025330+00:00 kern.notice gbe2-bfyii-359 kernel: [120372.854707] 
2021-04-07T23:16:34.025333+00:00 kern.notice gbe2-bfyii-359 kernel: [120372.854710] NOTICE: zio: io_bookmark objset 0 object 0 level 0 blkid 0
2021-04-07T23:16:34.025358+00:00 kern.notice gbe2-bfyii-359 kernel: [120372.854710] 
2021-04-07T23:16:34.025362+00:00 kern.alert gbe2-bfyii-359 kernel: [120372.854721] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
2021-04-07T23:16:34.025364+00:00 kern.alert gbe2-bfyii-359 kernel: [120372.862835] IP: abd_verify+0x9/0x270 [zfs]
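
In both logs psize is not a multiple of the 4096-byte alignment, so asize differs from psize and the padded-copy path in zio_done is taken. A small sketch of that arithmetic follows. `model_p2roundup` and `model_needs_padded_copy` are hypothetical helpers written for illustration; the first is assumed to behave like P2ROUNDUP for power-of-two alignments and is not the ZFS macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
uint64_t
model_p2roundup(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}

/* Nonzero when the reporting code would build a zero-padded copy. */
int
model_needs_padded_copy(uint64_t psize, uint64_t align)
{
	return (model_p2roundup(psize, align) != psize);
}
```

For the logged values, 130560 and 127488 both round up to 131072 with an alignment of 4096, which agrees with the asize printed by the instrumentation.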
@PaulZ-98 PaulZ-98 added Type: Defect Incorrect behavior (e.g. crash, hang) Status: Triage Needed New issue which needs to be triaged labels Apr 9, 2021
PaulZ-98 (Contributor, Author) commented Apr 9, 2021

For now I have changed zio_done and I'm letting it run.

        /*
         * If the I/O on the transformed data was successful, generate any
         * checksum reports now while we still have the transformed data.
         */
        if (zio->io_error == 0) {
                while (zio->io_cksum_report != NULL) {
                        zio_cksum_report_t *zcr = zio->io_cksum_report;
                        uint64_t align = zcr->zcr_align;
                        uint64_t asize = P2ROUNDUP(psize, align);
                        /* possible NULL io_abd for gang block */
                        abd_t *adata = zio->io_abd;

                        /* changed: only build the padded copy if io_abd is set */
                        if (adata != NULL && asize != psize) {
                                adata = abd_alloc(asize, B_TRUE);
                                abd_copy(adata, zio->io_abd, psize);
                                abd_zero_off(adata, psize, asize - psize);
                        }

                        zio->io_cksum_report = zcr->zcr_next;
                        zcr->zcr_next = NULL;
                        zcr->zcr_finish(zcr, adata);
                        zfs_ereport_free_checksum(zcr);

                        /* changed: only free the copy if one was allocated */
                        if (adata != NULL && asize != psize)
                                abd_free(adata);
                }
        }
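
The effect of the added NULL check can be modeled in userspace. The sketch below is not ZFS code: `model_buf_t`, `model_report_data`, and `model_release` are hypothetical stand-ins for abd_t and for the abd_alloc / abd_copy / abd_zero_off / abd_free sequence. It assumes only that the source buffer may be NULL, as it is for the free of a gang block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for abd_t: a flat heap buffer. */
typedef struct model_buf {
	uint8_t *mb_data;
	size_t mb_size;
} model_buf_t;

/*
 * Pick the buffer to hand to the report callback:
 *   src == NULL     -> NULL (nothing to copy, the guarded case)
 *   asize == psize  -> src itself (no padding needed)
 *   otherwise       -> a new copy of psize bytes, zero-padded to asize
 * This model also returns NULL if allocation fails.
 */
model_buf_t *
model_report_data(model_buf_t *src, size_t psize, size_t asize)
{
	if (src == NULL || asize == psize)
		return (src);

	assert(psize <= asize && psize <= src->mb_size);

	model_buf_t *pad = malloc(sizeof (*pad));
	if (pad == NULL)
		return (NULL);
	/* calloc zero-fills, so the tail beyond psize is already zero. */
	pad->mb_data = calloc(1, asize);
	if (pad->mb_data == NULL) {
		free(pad);
		return (NULL);
	}
	pad->mb_size = asize;
	memcpy(pad->mb_data, src->mb_data, psize);
	return (pad);
}

/* Free adata only when it is a padded copy rather than src or NULL. */
void
model_release(model_buf_t *src, model_buf_t *adata)
{
	if (adata != NULL && adata != src) {
		free(adata->mb_data);
		free(adata);
	}
}
```

The model decides whether to free by comparing pointers, whereas the change above repeats the `adata != NULL && asize != psize` condition; both approaches skip the free when no copy was made.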

PaulZ-98 added a commit to datto/zfs that referenced this issue Apr 13, 2021
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.

Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Fixes openzfs#11872
behlendorf pushed a commit that referenced this issue Apr 16, 2021
Fix NULL pointer dereference when reporting
checksum error for gang block in zio_done.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #11872
Closes #11896
The same commit was later pushed to other repositories and branches that referenced this issue:
behlendorf pushed a commit to behlendorf/zfs, Apr 21, 2021
ghost pushed commits to truenas/zfs, May 6 (three times), May 7, May 10 (three times), May 13, and May 17, 2021
behlendorf pushed a commit, May 20, 2021
sempervictus pushed a commit to sempervictus/zfs, May 31, 2021
tonyhutter pushed a commit, Jun 23, 2021