Eliminate two duplicated lines codes to enhance the xt_hashlimit.c of ne... #128

gfreewind · 2014-10-24T07:20:55Z

There are two duplicated lines of hashlimit_mt. Now I enhance it to eliminate the duplicated codes.

… netfilter

st_kim_ref() does not take care of the fact that platform_get_drvdata() might return NULL. On AM437x EVM, this causes the platform to stop booting as soon as the module is inserted. This patch fixes the issue by checking for NULL return value. Oops log follows. I have not tested BT functionality after this patch. But at least the platform boots now. [ 12.675697] Unable to handle kernel NULL pointer dereference at virtual address 0000005 [ 12.684310] pgd = c0004000 [ 12.687157] [0000005] *pgd=00000000 [ 12.690927] Internal error: Oops: 17 [#1] SMP ARM [ 12.695873] Modules linked in: btwilink bluetooth ti_vpfe dwc3(+) ov2659 videobuf2_core v4l2_common videodev ti_am335x_adc 6lowpan_iphc matrix_keypad panel_dpi kfifo_buf pixcir_i2c_ts media industrialio videobuf2_dma_contig c_can_platform videobuf2_memops dwc3_omap c_can can_dev [ 12.721969] CPU: 0 PID: 1235 Comm: kworker/u3:0 Not tainted 3.14.25-02445-g9036ac6daed6 torvalds#128 [ 12.730937] Workqueue: hci0 hci_power_on [bluetooth] [ 12.736165] task: ebd93b40 ti: ecd7c000 task.ti: ecd7c000 [ 12.741856] PC is at st_kim_ref+0x30/0x40 [ 12.746071] LR is at st_kim_ref+0x30/0x40 [ 12.750289] pc : [<c03caf58>] lr : [<c03caf58>] psr: a0000013 [ 12.750289] sp : ecd7de08 ip : ecd7de08 fp : ecd7de1c [ 12.762365] r10: bf1e710c r9 : bf1e70ec r8 : bf1e7964 [ 12.767858] r7 : ebd2fd50 r6 : bf1e7964 r5 : 00000000 r4 : ecd7de24 [ 12.774723] r3 : c0957208 r2 : 00000000 r1 : c0957208 r0 : 00000000 [ 12.781589] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 12.789274] Control: 10c5387d Table: abde4059 DAC: 00000015 [ 12.795315] Process kworker/u3:0 (pid: 1235, stack limit = 0xecd7c248) Signed-off-by: Sekhar Nori <nsekhar@ti.com>

Elizafox · 2015-01-08T13:50:04Z

Linus doesn't accept pull requests from GitHub. Consult https://github.com/torvalds/linux/blob/master/Documentation/HOWTO instead.

Tested with the Debian snapshot #128 and the binary user-side drivers available at the public download page: http://malideveloper.arm.com/develop-for-mali/features/mali-t6xx-gpu-user-space-drivers/

Unlike scan_async_group(), soc_of_bind() doesn't allocate its soc_camera_async_client structure using devm_kzalloc(), but has it embedded inside the soc_of_info structure. Hence on failure, it must free the whole soc_of_info structure, and not just the embedded soc_camera_async_client structure, as the latter causes a warning, and may cause slab corruption: soc-camera-pdrv soc-camera-pdrv.0: Probing soc-camera-pdrv.0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at drivers/base/devres.c:887 devm_kfree+0x30/0x40() CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-shmobile-08386-g37feb0d093cb2d8e #128 Hardware name: Generic R8A7791 (Flattened Device Tree) Backtrace: [<c0011e7c>] (dump_backtrace) from [<c0012024>] (show_stack+0x18/0x1c) r6:c05a923b r5:00000009 r4:00000000 r3:00204140 [<c001200c>] (show_stack) from [<c048ed30>] (dump_stack+0x78/0x94) [<c048ecb8>] (dump_stack) from [<c002687c>] (warn_slowpath_common+0x8c/0xb8) r4:00000000 r3:00000000 [<c00267f0>] (warn_slowpath_common) from [<c0026980>] (warn_slowpath_null+0x24/0x2c) r8:ee7d8214 r7:ed83b810 r6:ed83bc20 r5:fffffffa r4:ed83e510 [<c002695c>] (warn_slowpath_null) from [<c025e0cc>] (devm_kfree+0x30/0x40) [<c025e09c>] (devm_kfree) from [<c032bbf4>] (soc_of_bind.isra.14+0x194/0x1d4) [<c032ba60>] (soc_of_bind.isra.14) from [<c032c6b8>] (soc_camera_host_register+0x208/0x31c) r9:00000070 r8:ee7e05d0 r7:ee153210 r6:00000000 r5:ee7e0218 r4:ed83bc20 [<c032c4b0>] (soc_camera_host_register) from [<c032e80c>] (rcar_vin_probe+0x1f4/0x238) r8:ee153200 r7:00000008 r6:ee153210 r5:ed83bc10 r4:c066319c r3:000000c0 [<c032e618>] (rcar_vin_probe) from [<c025c334>] (platform_drv_probe+0x50/0xa0) r10:00000000 r9:c0662fa8 r8:00000000 r7:c06a3700 r6:c0662fa8 r5:ee153210 r4:00000000 [<c025c2e4>] (platform_drv_probe) from [<c025af08>] (driver_probe_device+0xc4/0x208) r6:c06a36f4 r5:00000000 r4:ee153210 r3:c025c2e4 [<c025ae44>] (driver_probe_device) from [<c025b108>] (__driver_attach+0x70/0x94) r9:c066f9c0 r8:c0624a98 r7:c065b790 r6:c0662fa8 r5:ee153244 r4:ee153210 [<c025b098>] (__driver_attach) from [<c025984c>] (bus_for_each_dev+0x74/0x98) r6:c025b098 r5:c0662fa8 r4:00000000 r3:00000001 [<c02597d8>] (bus_for_each_dev) from [<c025b1dc>] (driver_attach+0x20/0x28) r6:ed83c200 r5:00000000 r4:c0662fa8 [<c025b1bc>] (driver_attach) from [<c025a00c>] (bus_add_driver+0xdc/0x1c4) [<c0259f30>] (bus_add_driver) from [<c025b8f4>] (driver_register+0xa4/0xe8) r7:c0624a98 r6:00000000 r5:c060b010 r4:c0662fa8 [<c025b850>] (driver_register) from [<c025ccd0>] (__platform_driver_register+0x50/0x64) r5:c060b010 r4:ed8394c0 [<c025cc80>] (__platform_driver_register) from [<c060b028>] (rcar_vin_driver_init+0x18/0x20) [<c060b010>] (rcar_vin_driver_init) from [<c05edde8>] (do_one_initcall+0x108/0x1b8) [<c05edce0>] (do_one_initcall) from [<c05edfb4>] (kernel_init_freeable+0x11c/0x1e4) r9:c066f9c0 r8:c066f9c0 r7:c062eab0 r6:c06252c4 r5:000000ad r4:00000006 [<c05ede98>] (kernel_init_freeable) from [<c048c3d0>] (kernel_init+0x10/0xec) r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c048c3c0 r4:00000000 [<c048c3c0>] (kernel_init) from [<c000eba0>] (ret_from_fork+0x14/0x34) r4:00000000 r3:ee04e000 ---[ end trace e3a984cc0335c8a0 ]--- rcar_vin e6ef1000.video: group probe failed: -6 Fixes: 1ddc6a6 ("[media] soc_camera: add support for dt binding soc_camera drivers") Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

commit 8e48a2d upstream. Unlike scan_async_group(), soc_of_bind() doesn't allocate its soc_camera_async_client structure using devm_kzalloc(), but has it embedded inside the soc_of_info structure. Hence on failure, it must free the whole soc_of_info structure, and not just the embedded soc_camera_async_client structure, as the latter causes a warning, and may cause slab corruption: soc-camera-pdrv soc-camera-pdrv.0: Probing soc-camera-pdrv.0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at drivers/base/devres.c:887 devm_kfree+0x30/0x40() CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-shmobile-08386-g37feb0d093cb2d8e #128 Hardware name: Generic R8A7791 (Flattened Device Tree) Backtrace: [<c0011e7c>] (dump_backtrace) from [<c0012024>] (show_stack+0x18/0x1c) r6:c05a923b r5:00000009 r4:00000000 r3:00204140 [<c001200c>] (show_stack) from [<c048ed30>] (dump_stack+0x78/0x94) [<c048ecb8>] (dump_stack) from [<c002687c>] (warn_slowpath_common+0x8c/0xb8) r4:00000000 r3:00000000 [<c00267f0>] (warn_slowpath_common) from [<c0026980>] (warn_slowpath_null+0x24/0x2c) r8:ee7d8214 r7:ed83b810 r6:ed83bc20 r5:fffffffa r4:ed83e510 [<c002695c>] (warn_slowpath_null) from [<c025e0cc>] (devm_kfree+0x30/0x40) [<c025e09c>] (devm_kfree) from [<c032bbf4>] (soc_of_bind.isra.14+0x194/0x1d4) [<c032ba60>] (soc_of_bind.isra.14) from [<c032c6b8>] (soc_camera_host_register+0x208/0x31c) r9:00000070 r8:ee7e05d0 r7:ee153210 r6:00000000 r5:ee7e0218 r4:ed83bc20 [<c032c4b0>] (soc_camera_host_register) from [<c032e80c>] (rcar_vin_probe+0x1f4/0x238) r8:ee153200 r7:00000008 r6:ee153210 r5:ed83bc10 r4:c066319c r3:000000c0 [<c032e618>] (rcar_vin_probe) from [<c025c334>] (platform_drv_probe+0x50/0xa0) r10:00000000 r9:c0662fa8 r8:00000000 r7:c06a3700 r6:c0662fa8 r5:ee153210 r4:00000000 [<c025c2e4>] (platform_drv_probe) from [<c025af08>] (driver_probe_device+0xc4/0x208) r6:c06a36f4 r5:00000000 r4:ee153210 r3:c025c2e4 [<c025ae44>] (driver_probe_device) from [<c025b108>] (__driver_attach+0x70/0x94) r9:c066f9c0 r8:c0624a98 r7:c065b790 r6:c0662fa8 r5:ee153244 r4:ee153210 [<c025b098>] (__driver_attach) from [<c025984c>] (bus_for_each_dev+0x74/0x98) r6:c025b098 r5:c0662fa8 r4:00000000 r3:00000001 [<c02597d8>] (bus_for_each_dev) from [<c025b1dc>] (driver_attach+0x20/0x28) r6:ed83c200 r5:00000000 r4:c0662fa8 [<c025b1bc>] (driver_attach) from [<c025a00c>] (bus_add_driver+0xdc/0x1c4) [<c0259f30>] (bus_add_driver) from [<c025b8f4>] (driver_register+0xa4/0xe8) r7:c0624a98 r6:00000000 r5:c060b010 r4:c0662fa8 [<c025b850>] (driver_register) from [<c025ccd0>] (__platform_driver_register+0x50/0x64) r5:c060b010 r4:ed8394c0 [<c025cc80>] (__platform_driver_register) from [<c060b028>] (rcar_vin_driver_init+0x18/0x20) [<c060b010>] (rcar_vin_driver_init) from [<c05edde8>] (do_one_initcall+0x108/0x1b8) [<c05edce0>] (do_one_initcall) from [<c05edfb4>] (kernel_init_freeable+0x11c/0x1e4) r9:c066f9c0 r8:c066f9c0 r7:c062eab0 r6:c06252c4 r5:000000ad r4:00000006 [<c05ede98>] (kernel_init_freeable) from [<c048c3d0>] (kernel_init+0x10/0xec) r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c048c3c0 r4:00000000 [<c048c3c0>] (kernel_init) from [<c000eba0>] (ret_from_fork+0x14/0x34) r4:00000000 r3:ee04e000 ---[ end trace e3a984cc0335c8a0 ]--- rcar_vin e6ef1000.video: group probe failed: -6 Fixes: 1ddc6a6 ("[media] soc_camera: add support for dt binding soc_camera drivers") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Unlike scan_async_group(), soc_of_bind() doesn't allocate its soc_camera_async_client structure using devm_kzalloc(), but has it embedded inside the soc_of_info structure. Hence on failure, it must free the whole soc_of_info structure, and not just the embedded soc_camera_async_client structure, as the latter causes a warning, and may cause slab corruption: soc-camera-pdrv soc-camera-pdrv.0: Probing soc-camera-pdrv.0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at drivers/base/devres.c:887 devm_kfree+0x30/0x40() CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-shmobile-08386-g37feb0d093cb2d8e torvalds#128 Hardware name: Generic R8A7791 (Flattened Device Tree) Backtrace: [<c0011e7c>] (dump_backtrace) from [<c0012024>] (show_stack+0x18/0x1c) r6:c05a923b r5:00000009 r4:00000000 r3:00204140 [<c001200c>] (show_stack) from [<c048ed30>] (dump_stack+0x78/0x94) [<c048ecb8>] (dump_stack) from [<c002687c>] (warn_slowpath_common+0x8c/0xb8) r4:00000000 r3:00000000 [<c00267f0>] (warn_slowpath_common) from [<c0026980>] (warn_slowpath_null+0x24/0x2c) r8:ee7d8214 r7:ed83b810 r6:ed83bc20 r5:fffffffa r4:ed83e510 [<c002695c>] (warn_slowpath_null) from [<c025e0cc>] (devm_kfree+0x30/0x40) [<c025e09c>] (devm_kfree) from [<c032bbf4>] (soc_of_bind.isra.14+0x194/0x1d4) [<c032ba60>] (soc_of_bind.isra.14) from [<c032c6b8>] (soc_camera_host_register+0x208/0x31c) r9:00000070 r8:ee7e05d0 r7:ee153210 r6:00000000 r5:ee7e0218 r4:ed83bc20 [<c032c4b0>] (soc_camera_host_register) from [<c032e80c>] (rcar_vin_probe+0x1f4/0x238) r8:ee153200 r7:00000008 r6:ee153210 r5:ed83bc10 r4:c066319c r3:000000c0 [<c032e618>] (rcar_vin_probe) from [<c025c334>] (platform_drv_probe+0x50/0xa0) r10:00000000 r9:c0662fa8 r8:00000000 r7:c06a3700 r6:c0662fa8 r5:ee153210 r4:00000000 [<c025c2e4>] (platform_drv_probe) from [<c025af08>] (driver_probe_device+0xc4/0x208) r6:c06a36f4 r5:00000000 r4:ee153210 r3:c025c2e4 [<c025ae44>] (driver_probe_device) from [<c025b108>] (__driver_attach+0x70/0x94) r9:c066f9c0 r8:c0624a98 r7:c065b790 r6:c0662fa8 r5:ee153244 r4:ee153210 [<c025b098>] (__driver_attach) from [<c025984c>] (bus_for_each_dev+0x74/0x98) r6:c025b098 r5:c0662fa8 r4:00000000 r3:00000001 [<c02597d8>] (bus_for_each_dev) from [<c025b1dc>] (driver_attach+0x20/0x28) r6:ed83c200 r5:00000000 r4:c0662fa8 [<c025b1bc>] (driver_attach) from [<c025a00c>] (bus_add_driver+0xdc/0x1c4) [<c0259f30>] (bus_add_driver) from [<c025b8f4>] (driver_register+0xa4/0xe8) r7:c0624a98 r6:00000000 r5:c060b010 r4:c0662fa8 [<c025b850>] (driver_register) from [<c025ccd0>] (__platform_driver_register+0x50/0x64) r5:c060b010 r4:ed8394c0 [<c025cc80>] (__platform_driver_register) from [<c060b028>] (rcar_vin_driver_init+0x18/0x20) [<c060b010>] (rcar_vin_driver_init) from [<c05edde8>] (do_one_initcall+0x108/0x1b8) [<c05edce0>] (do_one_initcall) from [<c05edfb4>] (kernel_init_freeable+0x11c/0x1e4) r9:c066f9c0 r8:c066f9c0 r7:c062eab0 r6:c06252c4 r5:000000ad r4:00000006 [<c05ede98>] (kernel_init_freeable) from [<c048c3d0>] (kernel_init+0x10/0xec) r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c048c3c0 r4:00000000 [<c048c3c0>] (kernel_init) from [<c000eba0>] (ret_from_fork+0x14/0x34) r4:00000000 r3:ee04e000 ---[ end trace e3a984cc0335c8a0 ]--- rcar_vin e6ef1000.video: group probe failed: -6 Fixes: 1ddc6a6 ("[media] soc_camera: add support for dt binding soc_camera drivers") Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

WARNING: line over 80 characters torvalds#128: FILE: mm/cma.c:186: + alignment = PAGE_SIZE << max_t(unsigned long, MAX_ORDER - 1, pageblock_order); WARNING: line over 80 characters torvalds#137: FILE: mm/cma.c:270: + (phys_addr_t)PAGE_SIZE << max_t(unsigned long, MAX_ORDER - 1, pageblock_order)); total: 0 errors, 2 warnings, 16 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/mm-cma-silence-warnings-due-to-max-usage.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

st_kim_ref() does not take care of the fact that platform_get_drvdata() might return NULL. On AM437x EVM, this causes the platform to stop booting as soon as the module is inserted. This patch fixes the issue by checking for NULL return value. Oops log follows. I have not tested BT functionality after this patch. But at least the platform boots now. [ 12.675697] Unable to handle kernel NULL pointer dereference at virtual address 0000005 [ 12.684310] pgd = c0004000 [ 12.687157] [0000005] *pgd=00000000 [ 12.690927] Internal error: Oops: 17 [#1] SMP ARM [ 12.695873] Modules linked in: btwilink bluetooth ti_vpfe dwc3(+) ov2659 videobuf2_core v4l2_common videodev ti_am335x_adc 6lowpan_iphc matrix_keypad panel_dpi kfifo_buf pixcir_i2c_ts media industrialio videobuf2_dma_contig c_can_platform videobuf2_memops dwc3_omap c_can can_dev [ 12.721969] CPU: 0 PID: 1235 Comm: kworker/u3:0 Not tainted 3.14.25-02445-g9036ac6daed6 Freescale#128 [ 12.730937] Workqueue: hci0 hci_power_on [bluetooth] [ 12.736165] task: ebd93b40 ti: ecd7c000 task.ti: ecd7c000 [ 12.741856] PC is at st_kim_ref+0x30/0x40 [ 12.746071] LR is at st_kim_ref+0x30/0x40 [ 12.750289] pc : [<c03caf58>] lr : [<c03caf58>] psr: a0000013 [ 12.750289] sp : ecd7de08 ip : ecd7de08 fp : ecd7de1c [ 12.762365] r10: bf1e710c r9 : bf1e70ec r8 : bf1e7964 [ 12.767858] r7 : ebd2fd50 r6 : bf1e7964 r5 : 00000000 r4 : ecd7de24 [ 12.774723] r3 : c0957208 r2 : 00000000 r1 : c0957208 r0 : 00000000 [ 12.781589] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel [ 12.789274] Control: 10c5387d Table: abde4059 DAC: 00000015 [ 12.795315] Process kworker/u3:0 (pid: 1235, stack limit = 0xecd7c248) Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Gigi Joseph <gigi.joseph@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lkl: Fix Kconfig error

(1) use cpu id from bl31 delivers; (2) sp_el0 should point to kernel address in EL1 mode. On ARM64, kernel uses sp_el0 to store current_thread_info(), we see a problem: when fiq occurs, cpu is EL1 mode but sp_el0 point to userspace address. At this moment, if we read 'current_thread_info()->cpu' or other, it leads an error. We find above situation happens when save/restore cpu context between system mode and user mode under heavy load. Like 'ret_fast_syscall()', kernel restore context of user mode, but fiq occurs before the instruction 'eret', so this causes the above situation. Assembly code: ffffff80080826c8 <ret_fast_syscall>: ...skipping... ffffff80080826fc: d503201f nop ffffff8008082700: d5384100 mrs x0, sp_el0 ffffff8008082704: f9400c00 ldr x0, [x0,torvalds#24] ffffff8008082708: d5182000 msr ttbr0_el1, x0 ffffff800808270c: d5033fdf isb ffffff8008082710: f9407ff7 ldr x23, [sp,torvalds#248] ffffff8008082714: d5184117 msr sp_el0, x23 ffffff8008082718: d503201f nop ffffff800808271c: d503201f nop ffffff8008082720: d5184035 msr elr_el1, x21 ffffff8008082724: d5184016 msr spsr_el1, x22 ffffff8008082728: a94007e0 ldp x0, x1, [sp] ffffff800808272c: a9410fe2 ldp x2, x3, [sp,torvalds#16] ffffff8008082730: a94217e4 ldp x4, x5, [sp,torvalds#32] ffffff8008082734: a9431fe6 ldp x6, x7, [sp,torvalds#48] ffffff8008082738: a94427e8 ldp x8, x9, [sp,torvalds#64] ffffff800808273c: a9452fea ldp x10, x11, [sp,torvalds#80] ffffff8008082740: a94637ec ldp x12, x13, [sp,torvalds#96] ffffff8008082744: a9473fee ldp x14, x15, [sp,torvalds#112] ffffff8008082748: a94847f0 ldp x16, x17, [sp,torvalds#128] ffffff800808274c: a9494ff2 ldp x18, x19, [sp,torvalds#144] ffffff8008082750: a94a57f4 ldp x20, x21, [sp,torvalds#160] ffffff8008082754: a94b5ff6 ldp x22, x23, [sp,torvalds#176] ffffff8008082758: a94c67f8 ldp x24, x25, [sp,torvalds#192] ffffff800808275c: a94d6ffa ldp x26, x27, [sp,torvalds#208] ffffff8008082760: a94e77fc ldp x28, x29, [sp,torvalds#224] ffffff8008082764: f9407bfe ldr x30, [sp,torvalds#240] ffffff8008082768: 9104c3ff add sp, sp, #0x130 ffffff800808276c: d69f03e0 eret Change-Id: I071e899f8a407764e166ca0403199c9d87d6ce78 Signed-off-by: chenjh <chenjh@rock-chips.com>

Rebase patches onto 4.14.11

[ Upstream commit 4adfa79 ] When we dump the ip6mr mfc entries via proc, we initialize an iterator with the table to dump but we don't clear the cache pointer which might be initialized from a prior read on the same descriptor that ended. This can result in lock imbalance (an unnecessary unlock) leading to other crashes and hangs. Clear the cache pointer like ipmr does to fix the issue. Thanks for the reliable reproducer. Here's syzbot's trace: WARNING: bad unlock balance detected! 4.15.0-rc3+ torvalds#128 Not tainted syzkaller971460/3195 is trying to release lock (mrt_lock) at: [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller971460/3195: #0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0 fs/seq_file.c:165 stack backtrace: CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561 __lock_release kernel/locking/lockdep.c:3775 [inline] lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023 __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255 ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 traverse+0x3bc/0xa00 fs/seq_file.c:135 seq_read+0x96a/0x13d0 fs/seq_file.c:189 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 BUG: sleeping function called from invalid context at lib/usercopy.c:25 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460 INFO: lockdep is turned off. CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060 __might_sleep+0x95/0x190 kernel/sched/core.c:6013 __might_fault+0xab/0x1d0 mm/memory.c:4525 _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 copy_to_user include/linux/uaccess.h:155 [inline] seq_read+0xcb4/0x13d0 fs/seq_file.c:279 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0 lib/usercopy.c:26 Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 4adfa79 ] When we dump the ip6mr mfc entries via proc, we initialize an iterator with the table to dump but we don't clear the cache pointer which might be initialized from a prior read on the same descriptor that ended. This can result in lock imbalance (an unnecessary unlock) leading to other crashes and hangs. Clear the cache pointer like ipmr does to fix the issue. Thanks for the reliable reproducer. Here's syzbot's trace: WARNING: bad unlock balance detected! 4.15.0-rc3+ torvalds#128 Not tainted syzkaller971460/3195 is trying to release lock (mrt_lock) at: [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller971460/3195: #0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0 fs/seq_file.c:165 stack backtrace: CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561 __lock_release kernel/locking/lockdep.c:3775 [inline] lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023 __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255 ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 traverse+0x3bc/0xa00 fs/seq_file.c:135 seq_read+0x96a/0x13d0 fs/seq_file.c:189 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 BUG: sleeping function called from invalid context at lib/usercopy.c:25 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460 INFO: lockdep is turned off. CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060 __might_sleep+0x95/0x190 kernel/sched/core.c:6013 __might_fault+0xab/0x1d0 mm/memory.c:4525 _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 copy_to_user include/linux/uaccess.h:155 [inline] seq_read+0xcb4/0x13d0 fs/seq_file.c:279 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0 lib/usercopy.c:26 Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>

[ Upstream commit 4adfa79 ] When we dump the ip6mr mfc entries via proc, we initialize an iterator with the table to dump but we don't clear the cache pointer which might be initialized from a prior read on the same descriptor that ended. This can result in lock imbalance (an unnecessary unlock) leading to other crashes and hangs. Clear the cache pointer like ipmr does to fix the issue. Thanks for the reliable reproducer. Here's syzbot's trace: WARNING: bad unlock balance detected! 4.15.0-rc3+ torvalds#128 Not tainted syzkaller971460/3195 is trying to release lock (mrt_lock) at: [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 but there are no more locks to release! other info that might help us debug this: 1 lock held by syzkaller971460/3195: #0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0 fs/seq_file.c:165 stack backtrace: CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561 __lock_release kernel/locking/lockdep.c:3775 [inline] lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023 __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline] _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255 ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553 traverse+0x3bc/0xa00 fs/seq_file.c:135 seq_read+0x96a/0x13d0 fs/seq_file.c:189 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 BUG: sleeping function called from invalid context at lib/usercopy.c:25 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460 INFO: lockdep is turned off. CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060 __might_sleep+0x95/0x190 kernel/sched/core.c:6013 __might_fault+0xab/0x1d0 mm/memory.c:4525 _copy_to_user+0x2c/0xc0 lib/usercopy.c:25 copy_to_user include/linux/uaccess.h:155 [inline] seq_read+0xcb4/0x13d0 fs/seq_file.c:279 proc_reg_read+0xef/0x170 fs/proc/inode.c:217 do_loop_readv_writev fs/read_write.c:673 [inline] do_iter_read+0x3db/0x5b0 fs/read_write.c:897 compat_readv+0x1bf/0x270 fs/read_write.c:1140 do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189 C_SYSC_preadv fs/read_write.c:1209 [inline] compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline] do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389 entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125 RIP: 0023:0xf7f73c79 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0 lib/usercopy.c:26 Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

These NEON and non-NEON implementations come from Andy Polyakov's implementation. They are exactly the same as Andy Polyakov's original, with the following exceptions: - Entries and exits use the proper kernel convention macro. - CPU feature checking is done in C by the glue code, so that has been removed from the assembly. - The function names have been renamed to fit kernel conventions. - Labels have been renamed (prefixed with .L) to fit kernel conventions. - Constants have been rearranged so that they are closer to the code that is using them. [ARM only] - The neon code can jump to the scalar code when it makes sense to do so. - The neon_512 function as a separate function has been removed, leaving the decision up to the main neon entry point. [ARM64 only] After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual instructions from the original. ARM: -ChaCha20_ctr32: -.LChaCha20_ctr32: +ENTRY(chacha20_arm) ldr r12,[sp,#0] @ pull pointer to counter and nonce stmdb sp!,{r0-r2,r4-r11,lr} - sub r14,pc,torvalds#16 @ ChaCha20_ctr32 - adr r14,.LChaCha20_ctr32 cmp r2,#0 @ len==0? itt eq addeq sp,sp,#4*3 - beq .Lno_data - cmp r2,torvalds#192 @ test len - bls .Lshort - ldr r4,[r14,#-32] - ldr r4,[r14,r4] - ldr r4,[r4] - tst r4,#ARMV7_NEON - bne .LChaCha20_neon + beq .Lno_data_arm .Lshort: ldmia r12,{r4-r7} @ load counter and nonce sub sp,sp,#4*(16) @ off-load area - sub r14,r14,torvalds#64 @ .Lsigma + sub r14,pc,torvalds#100 @ .Lsigma + adr r14,.Lsigma @ .Lsigma stmdb sp!,{r4-r7} @ copy counter and nonce ldmia r3,{r4-r11} @ load key ldmia r14,{r0-r3} @ load sigma @@ -617,14 +615,25 @@ .Ldone: add sp,sp,#4*(32+3) -.Lno_data: +.Lno_data_arm: ldmia sp!,{r4-r11,pc} +ENDPROC(chacha20_arm) -ChaCha20_neon: +ENTRY(chacha20_neon) ldr r12,[sp,#0] @ pull pointer to counter and nonce stmdb sp!,{r0-r2,r4-r11,lr} -.LChaCha20_neon: - adr r14,.Lsigma + cmp r2,#0 @ len==0? + itt eq + addeq sp,sp,#4*3 + beq .Lno_data_neon + cmp r2,torvalds#192 @ test len + bls .Lshort +.Lchacha20_neon_begin: + adr r14,.Lsigma2 vstmdb sp!,{d8-d15} @ ABI spec says so stmdb sp!,{r0-r3} @@ -1265,4 +1274,6 @@ add sp,sp,#4*(32+4) vldmia sp,{d8-d15} add sp,sp,#4*(16+3) +.Lno_data_neon: ldmia sp!,{r4-r11,pc} +ENDPROC(chacha20_neon) ARM64: -ChaCha20_ctr32: +ENTRY(chacha20_arm) cbz x2,.Labort - adr x5,.LOPENSSL_armcap_P - cmp x2,torvalds#192 - b.lo .Lshort - ldrsw x6,[x5] - ldr x6,[x5] - ldr w17,[x6,x5] - tst w17,#ARMV7_NEON - b.ne ChaCha20_neon - .Lshort: stp x29,x30,[sp,#-96]! add x29,sp,#0 @@ -279,8 +274,13 @@ ldp x27,x28,[x29,torvalds#80] ldp x29,x30,[sp],torvalds#96 ret +ENDPROC(chacha20_arm) + +ENTRY(chacha20_neon) + cbz x2,.Labort_neon + cmp x2,torvalds#192 + b.lo .Lshort -ChaCha20_neon: stp x29,x30,[sp,#-96]! add x29,sp,#0 @@ -763,16 +763,6 @@ ldp x27,x28,[x29,torvalds#80] ldp x29,x30,[sp],torvalds#96 ret -ChaCha20_512_neon: - stp x29,x30,[sp,#-96]! - add x29,sp,#0 - - adr x5,.Lsigma - stp x19,x20,[sp,torvalds#16] - stp x21,x22,[sp,torvalds#32] - stp x23,x24,[sp,torvalds#48] - stp x25,x26,[sp,torvalds#64] - stp x27,x28,[sp,torvalds#80] .L512_or_more_neon: sub sp,sp,torvalds#128+64 @@ -1920,4 +1910,6 @@ ldp x25,x26,[x29,torvalds#64] ldp x27,x28,[x29,torvalds#80] ldp x29,x30,[sp],torvalds#96 +.Labort_neon: ret +ENDPROC(chacha20_neon) Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Samuel Neves <sneves@dei.uc.pt> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Cc: Andy Polyakov <appro@openssl.org> Cc: Russell King <linux@armlinux.org.uk> Cc: linux-arm-kernel@lists.infradead.org

These NEON and non-NEON implementations come from Andy Polyakov's implementation. They are exactly the same as Andy Polyakov's original, with the following exceptions: - Entries and exits use the proper kernel convention macro. - CPU feature checking is done in C by the glue code, so that has been removed from the assembly. - The function names have been renamed to fit kernel conventions. - Labels have been renamed to fit kernel conventions. - The neon code can jump to the scalar code when it makes sense to do so. After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual instructions from the original. ARM: -poly1305_init: -.Lpoly1305_init: +ENTRY(poly1305_init_arm) stmdb sp!,{r4-r11} eor r3,r3,r3 @@ -18,8 +25,6 @@ moveq r0,#0 beq .Lno_key - adr r11,.Lpoly1305_init - ldr r12,.LOPENSSL_armcap ldrb r4,[r1,#0] mov r10,#0x0fffffff ldrb r5,[r1,#1] @@ -34,8 +39,6 @@ ldrb r7,[r1,torvalds#6] and r4,r4,r10 - ldr r12,[r11,r12] @ OPENSSL_armcap_P - ldr r12,[r12] ldrb r8,[r1,torvalds#7] orr r5,r5,r6,lsl#8 ldrb r6,[r1,torvalds#8] @@ -45,22 +48,6 @@ ldrb r8,[r1,torvalds#10] and r5,r5,r3 - tst r12,#ARMV7_NEON @ check for NEON - adr r9,poly1305_blocks_neon - adr r11,poly1305_blocks - it ne - movne r11,r9 - adr r12,poly1305_emit - adr r10,poly1305_emit_neon - it ne - movne r12,r10 - itete eq - addeq r12,r11,#(poly1305_emit-.Lpoly1305_init) - addne r12,r11,#(poly1305_emit_neon-.Lpoly1305_init) - addeq r11,r11,#(poly1305_blocks-.Lpoly1305_init) - addne r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init) - orr r12,r12,#1 @ thumb-ify address - orr r11,r11,#1 ldrb r9,[r1,torvalds#11] orr r6,r6,r7,lsl#8 ldrb r7,[r1,torvalds#12] @@ -79,17 +66,16 @@ str r6,[r0,torvalds#8] and r7,r7,r3 str r7,[r0,torvalds#12] - stmia r2,{r11,r12} @ fill functions table - mov r0,#1 - mov r0,#0 .Lno_key: ldmia sp!,{r4-r11} bx lr @ bx lr tst lr,#1 moveq pc,lr @ be binary compatible with V4, yet .word 0xe12fff1e @ interoperable with Thumb ISA:-) -poly1305_blocks: -.Lpoly1305_blocks: +ENDPROC(poly1305_init_arm) + +ENTRY(poly1305_blocks_arm) +.Lpoly1305_blocks_arm: stmdb sp!,{r3-r11,lr} ands r2,r2,#-16 @@ -231,10 +217,11 @@ tst lr,#1 moveq pc,lr @ be binary compatible with V4, yet .word 0xe12fff1e @ interoperable with Thumb ISA:-) -poly1305_emit: +ENDPROC(poly1305_blocks_arm) + +ENTRY(poly1305_emit_arm) stmdb sp!,{r4-r11} .Lpoly1305_emit_enter: - ldmia r0,{r3-r7} adds r8,r3,#5 @ compare to modulus adcs r9,r4,#0 @@ -305,8 +292,12 @@ tst lr,#1 moveq pc,lr @ be binary compatible with V4, yet .word 0xe12fff1e @ interoperable with Thumb ISA:-) +ENDPROC(poly1305_emit_arm) + + -poly1305_init_neon: +ENTRY(poly1305_init_neon) +.Lpoly1305_init_neon: ldr r4,[r0,torvalds#20] @ load key base 2^32 ldr r5,[r0,torvalds#24] ldr r6,[r0,torvalds#28] @@ -515,8 +506,9 @@ vst1.32 {d8[1]},[r7] bx lr @ bx lr +ENDPROC(poly1305_init_neon) -poly1305_blocks_neon: +ENTRY(poly1305_blocks_neon) ldr ip,[r0,torvalds#36] @ is_base2_26 ands r2,r2,#-16 beq .Lno_data_neon @@ -524,7 +516,7 @@ cmp r2,torvalds#64 bhs .Lenter_neon tst ip,ip @ is_base2_26? - beq .Lpoly1305_blocks + beq .Lpoly1305_blocks_arm .Lenter_neon: stmdb sp!,{r4-r7} @@ -534,7 +526,7 @@ bne .Lbase2_26_neon stmdb sp!,{r1-r3,lr} - bl poly1305_init_neon + bl .Lpoly1305_init_neon ldr r4,[r0,#0] @ load hash value base 2^32 ldr r5,[r0,#4] @@ -989,8 +981,9 @@ ldmia sp!,{r4-r7} .Lno_data_neon: bx lr @ bx lr +ENDPROC(poly1305_blocks_neon) -poly1305_emit_neon: +ENTRY(poly1305_emit_neon) ldr ip,[r0,torvalds#36] @ is_base2_26 stmdb sp!,{r4-r11} @@ -1055,6 +1048,6 @@ ldmia sp!,{r4-r11} bx lr @ bx lr +ENDPROC(poly1305_emit_neon) ARM64: -poly1305_init: +ENTRY(poly1305_init_arm) cmp x1,xzr stp xzr,xzr,[x0] // zero hash value stp xzr,xzr,[x0,torvalds#16] // [along with is_base2_26] @@ -11,14 +15,9 @@ csel x0,xzr,x0,eq b.eq .Lno_key - ldrsw x11,.LOPENSSL_armcap_P - ldr x11,.LOPENSSL_armcap_P - adr x10,.LOPENSSL_armcap_P - ldp x7,x8,[x1] // load key mov x9,#0xfffffffc0fffffff movk x9,#0x0fff,lsl#48 - ldr w17,[x10,x11] rev x7,x7 // flip bytes rev x8,x8 and x7,x7,x9 // &=0ffffffc0fffffff @@ -26,24 +25,11 @@ and x8,x8,x9 // &=0ffffffc0ffffffc stp x7,x8,[x0,torvalds#32] // save key value - tst w17,#ARMV7_NEON - - adr x12,poly1305_blocks - adr x7,poly1305_blocks_neon - adr x13,poly1305_emit - adr x8,poly1305_emit_neon - - csel x12,x12,x7,eq - csel x13,x13,x8,eq - - stp w12,w13,[x2] - stp x12,x13,[x2] - - mov x0,#1 .Lno_key: ret +ENDPROC(poly1305_init_arm) -poly1305_blocks: +ENTRY(poly1305_blocks_arm) ands x2,x2,#-16 b.eq .Lno_data @@ -100,8 +86,9 @@ .Lno_data: ret +ENDPROC(poly1305_blocks_arm) -poly1305_emit: +ENTRY(poly1305_emit_arm) ldp x4,x5,[x0] // load hash base 2^64 ldr x6,[x0,torvalds#16] ldp x10,x11,[x2] // load nonce @@ -124,7 +111,9 @@ stp x4,x5,[x1] // write result ret -poly1305_mult: +ENDPROC(poly1305_emit_arm) + +__poly1305_mult: mul x12,x4,x7 // h0*r0 umulh x13,x4,x7 @@ -158,7 +147,7 @@ ret -poly1305_splat: +__poly1305_splat: and x12,x4,#0x03ffffff // base 2^64 -> base 2^26 ubfx x13,x4,torvalds#26,torvalds#26 extr x14,x5,x4,#52 @@ -182,11 +171,11 @@ ret -poly1305_blocks_neon: +ENTRY(poly1305_blocks_neon) ldr x17,[x0,torvalds#24] cmp x2,torvalds#128 b.hs .Lblocks_neon - cbz x17,poly1305_blocks + cbz x17,poly1305_blocks_arm .Lblocks_neon: stp x29,x30,[sp,#-80]! @@ -232,7 +221,7 @@ adcs x5,x5,x13 adc x6,x6,x3 - bl poly1305_mult + bl __poly1305_mult ldr x30,[sp,torvalds#8] cbz x3,.Lstore_base2_64_neon @@ -274,7 +263,7 @@ adcs x5,x5,x13 adc x6,x6,x3 - bl poly1305_mult + bl __poly1305_mult .Linit_neon: and x10,x4,#0x03ffffff // base 2^64 -> base 2^26 @@ -301,19 +290,19 @@ mov x5,x8 mov x6,xzr add x0,x0,torvalds#48+12 - bl poly1305_splat + bl __poly1305_splat - bl poly1305_mult // r^2 + bl __poly1305_mult // r^2 sub x0,x0,#4 - bl poly1305_splat + bl __poly1305_splat - bl poly1305_mult // r^3 + bl __poly1305_mult // r^3 sub x0,x0,#4 - bl poly1305_splat + bl __poly1305_splat - bl poly1305_mult // r^4 + bl __poly1305_mult // r^4 sub x0,x0,#4 - bl poly1305_splat + bl __poly1305_splat ldr x30,[sp,torvalds#8] add x16,x1,torvalds#32 @@ -743,10 +732,11 @@ .Lno_data_neon: ldr x29,[sp],torvalds#80 ret +ENDPROC(poly1305_blocks_neon) -poly1305_emit_neon: +ENTRY(poly1305_emit_neon) ldr x17,[x0,torvalds#24] - cbz x17,poly1305_emit + cbz x17,poly1305_emit_arm ldp w10,w11,[x0] // load hash value base 2^26 ldp w12,w13,[x0,torvalds#8] @@ -788,6 +778,6 @@ stp x4,x5,[x1] // write result ret +ENDPROC(poly1305_emit_neon) Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Samuel Neves <sneves@dei.uc.pt> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com> Cc: Andy Polyakov <appro@openssl.org> Cc: Russell King <linux@armlinux.org.uk> Cc: linux-arm-kernel@lists.infradead.org

Fix checkpatch warnings by using pr_err(): WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... torvalds#109: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:109: + printk(KERN_ERR WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... torvalds#128: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:128: + printk(KERN_ERR WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... + printk(KERN_ERR WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... + printk(KERN_ERR Signed-off-by: Enrico Weigelt <info@metux.net>

Fix checkpatch warnings by using pr_err(): WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... torvalds#109: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:109: + printk(KERN_ERR WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... torvalds#128: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:128: + printk(KERN_ERR WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... + printk(KERN_ERR WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... + printk(KERN_ERR Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Enrico Weigelt <info@metux.net>

[BUG] Test case btrfs/124 failed if larger metadata folio is enabled, the dying message looks like this: BTRFS error (device dm-2): bad tree block start, mirror 2 want 31686656 have 0 BTRFS info (device dm-2): read error corrected: ino 0 off 31686656 (dev /dev/mapper/test-scratch2 sector 20928) BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page CPU: 6 PID: 350881 Comm: btrfs Tainted: G OE 6.7.0-rc3-custom+ torvalds#128 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022 RIP: 0010:btrfs_read_extent_buffer+0x106/0x180 [btrfs] PKRU: 55555554 Call Trace: <TASK> read_tree_block+0x33/0xb0 [btrfs] read_block_for_search+0x23e/0x340 [btrfs] btrfs_search_slot+0x2f9/0xe60 [btrfs] btrfs_lookup_csum+0x75/0x160 [btrfs] btrfs_lookup_bio_sums+0x21a/0x560 [btrfs] btrfs_submit_chunk+0x152/0x680 [btrfs] btrfs_submit_bio+0x1c/0x50 [btrfs] submit_one_bio+0x40/0x80 [btrfs] submit_extent_page+0x158/0x390 [btrfs] btrfs_do_readpage+0x330/0x740 [btrfs] extent_readahead+0x38d/0x6c0 [btrfs] read_pages+0x94/0x2c0 page_cache_ra_unbounded+0x12d/0x190 relocate_file_extent_cluster+0x7c1/0x9d0 [btrfs] relocate_block_group+0x2d3/0x560 [btrfs] btrfs_relocate_block_group+0x2c7/0x4b0 [btrfs] btrfs_relocate_chunk+0x4c/0x1a0 [btrfs] btrfs_balance+0x925/0x13c0 [btrfs] btrfs_ioctl+0x19f1/0x25d0 [btrfs] __x64_sys_ioctl+0x90/0xd0 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 [CAUSE] The dying line is at btrfs_repair_io_failure() call inside btrfs_repair_eb_io_failure(). The function is still relying on the extent buffer using page sized folios. When the extent buffer is using larger folio, we go into the 2nd slot of folios[], and triggered the NULL pointer dereference. [FIX] Migrate btrfs_repair_io_failure() to folio interfaces. So that when we hit a larger folio, we just submit the whole folio in one go. This also affects data repair path through btrfs_end_repair_bio(), thankfully data is still fully page based, we can just add an ASSERT(), and use page_folio() to convert the page to folio. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

[BUG] Test case btrfs/002 would fail if larger folios are enabled for metadata: assertion failed: folio, in fs/btrfs/extent_io.c:4358 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:4358! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 30916 Comm: fsstress Tainted: G OE 6.7.0-rc3-custom+ torvalds#128 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022 RIP: 0010:assert_eb_folio_uptodate+0x98/0xe0 [btrfs] Call Trace: <TASK> extent_buffer_test_bit+0x3c/0x70 [btrfs] free_space_test_bit+0xcd/0x140 [btrfs] modify_free_space_bitmap+0x27a/0x430 [btrfs] add_to_free_space_tree+0x8d/0x160 [btrfs] __btrfs_free_extent.isra.0+0xef1/0x13c0 [btrfs] __btrfs_run_delayed_refs+0x786/0x13c0 [btrfs] btrfs_run_delayed_refs+0x33/0x120 [btrfs] btrfs_commit_transaction+0xa2/0x1350 [btrfs] iterate_supers+0x77/0xe0 ksys_sync+0x60/0xa0 __do_sys_sync+0xa/0x20 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 </TASK> [CAUSE] The function extent_buffer_test_bit() is not folio compatible. It still assumes the old fixed page size, when an extent buffer with large folio passed in, only eb->folios[0] is populated. Then if the target bit range falls in the 2nd page of the folio, then we would check eb->folios[1], and trigger the ASSERT(). [FIX] Just migrate eb_bitmap_offset() to folio interfaces, using the folio_size() to replace PAGE_SIZE. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

[BUG] Test case btrfs/124 failed if larger metadata folio is enabled, the dying message looks like this: BTRFS error (device dm-2): bad tree block start, mirror 2 want 31686656 have 0 BTRFS info (device dm-2): read error corrected: ino 0 off 31686656 (dev /dev/mapper/test-scratch2 sector 20928) BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page CPU: 6 PID: 350881 Comm: btrfs Tainted: G OE 6.7.0-rc3-custom+ torvalds#128 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022 RIP: 0010:btrfs_read_extent_buffer+0x106/0x180 [btrfs] PKRU: 55555554 Call Trace: <TASK> read_tree_block+0x33/0xb0 [btrfs] read_block_for_search+0x23e/0x340 [btrfs] btrfs_search_slot+0x2f9/0xe60 [btrfs] btrfs_lookup_csum+0x75/0x160 [btrfs] btrfs_lookup_bio_sums+0x21a/0x560 [btrfs] btrfs_submit_chunk+0x152/0x680 [btrfs] btrfs_submit_bio+0x1c/0x50 [btrfs] submit_one_bio+0x40/0x80 [btrfs] submit_extent_page+0x158/0x390 [btrfs] btrfs_do_readpage+0x330/0x740 [btrfs] extent_readahead+0x38d/0x6c0 [btrfs] read_pages+0x94/0x2c0 page_cache_ra_unbounded+0x12d/0x190 relocate_file_extent_cluster+0x7c1/0x9d0 [btrfs] relocate_block_group+0x2d3/0x560 [btrfs] btrfs_relocate_block_group+0x2c7/0x4b0 [btrfs] btrfs_relocate_chunk+0x4c/0x1a0 [btrfs] btrfs_balance+0x925/0x13c0 [btrfs] btrfs_ioctl+0x19f1/0x25d0 [btrfs] __x64_sys_ioctl+0x90/0xd0 do_syscall_64+0x3f/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 [CAUSE] The dying line is at btrfs_repair_io_failure() call inside btrfs_repair_eb_io_failure(). The function is still relying on the extent buffer using page sized folios. When the extent buffer is using larger folio, we go into the 2nd slot of folios[], and triggered the NULL pointer dereference. [FIX] Migrate btrfs_repair_io_failure() to folio interfaces. So that when we hit a larger folio, we just submit the whole folio in one go. This also affects data repair path through btrfs_end_repair_bio(), thankfully data is still fully page based, we can just add an ASSERT(), and use page_folio() to convert the page to folio. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs") for s390x, add support for 64-bit pointers to kfuncs for LoongArch. Since the infrastructure is already implemented in BPF core, the only thing need to be done is to override bpf_jit_supports_far_kfunc_call(). Before this change, several test_verifier tests failed: # ./test_verifier | grep # | grep FAIL torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL This is because the kfuncs in the loaded module are far away from __bpf_call_base: ffff800002009440 t bpf_kfunc_call_test_fail1 [bpf_testmod] 9000000002e128d8 T __bpf_call_base The offset relative to __bpf_call_base does NOT fit in s32, which breaks the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts this limit. Note that to reproduce the above result, tools/testing/selftests/bpf/config should be applied, and run the test with JIT enabled, unpriv BPF enabled. With this change, the test_verifier tests now all passed: # ./test_verifier ... Summary: 777 PASSED, 0 SKIPPED, 0 FAILED Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

[in vfs.git#fixes] copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor torvalds#128, despite torvalds#64 being observably not open. The best way to fix that is in dup_fd() - there we both can easily check whether we might need to fix anything up and see which word needs the upper bits cleared. Reproducer follows: #define __GNU_SOURCE #include <linux/close_range.h> #include <unistd.h> #include <fcntl.h> #include <signal.h> #include <sched.h> #include <stdio.h> #include <stdbool.h> #include <sys/mman.h> void is_open(int fd) { printf("#%d is %s\n", fd, fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened"); } int child(void *unused) { while(1) { } return 0; } int main(void) { char *stack; pid_t pid; stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); if (stack == MAP_FAILED) { perror("mmap"); return -1; } pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL); if (pid == -1) { perror("clone"); return -1; } for (int i = 2; i < 128; i++) dup2(0, i); close_range(64, ~0U, CLOSE_RANGE_UNSHARE); is_open(64); printf("dup(0) => %d, expected 64\n", dup(0)); kill(pid, 9); return 0; } Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor torvalds#128, despite torvalds#64 being observably not open. The best way to fix that is in dup_fd() - there we both can easily check whether we might need to fix anything up and see which word needs the upper bits cleared. Reproducer follows: #define __GNU_SOURCE #include <linux/close_range.h> #include <unistd.h> #include <fcntl.h> #include <signal.h> #include <sched.h> #include <stdio.h> #include <stdbool.h> #include <sys/mman.h> void is_open(int fd) { printf("#%d is %s\n", fd, fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened"); } int child(void *unused) { while(1) { } return 0; } int main(void) { char *stack; pid_t pid; stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); if (stack == MAP_FAILED) { perror("mmap"); return -1; } pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL); if (pid == -1) { perror("clone"); return -1; } for (int i = 2; i < 128; i++) dup2(0, i); close_range(64, ~0U, CLOSE_RANGE_UNSHARE); is_open(64); printf("dup(0) => %d, expected 64\n", dup(0)); kill(pid, 9); return 0; } Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor torvalds#128, despite torvalds#64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor torvalds#128, despite torvalds#64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor torvalds#128, despite torvalds#64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

gfreewind added 2 commits October 24, 2014 00:10

Eliminate two duplicated lines codes to enhance the xt_hashlimit.c of…

d239f95

… netfilter

Merge branch 'master' of github.com:torvalds/linux

a392824

laijs pushed a commit to laijs/linux that referenced this pull request Feb 13, 2017

Merge pull request torvalds#128 from pscollins/stop-kconfig-error

37d0a2c

lkl: Fix Kconfig error

iaguis pushed a commit to kinvolk/linux that referenced this pull request Feb 6, 2018

Merge pull request torvalds#128 from coreosbot/v4.14.11-coreos

e2b917f

Rebase patches onto 4.14.11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eliminate two duplicated lines codes to enhance the xt_hashlimit.c of ne... #128

Eliminate two duplicated lines codes to enhance the xt_hashlimit.c of ne... #128

gfreewind commented Oct 24, 2014

Elizafox commented Jan 8, 2015

Eliminate two duplicated lines codes to enhance the xt_hashlimit.c of ne... #128

Are you sure you want to change the base?

Eliminate two duplicated lines codes to enhance the xt_hashlimit.c of ne... #128

Conversation

gfreewind commented Oct 24, 2014

Elizafox commented Jan 8, 2015