Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate two duplicated lines codes to enhance the xt_hashlimit.c of ne... #128

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

gfreewind
Copy link
Contributor

There are two duplicated lines of hashlimit_mt. Now I enhance it to eliminate the duplicated codes.

rogerq pushed a commit to rogerq/linux that referenced this pull request Dec 19, 2014
st_kim_ref() does not take care of the fact that platform_get_drvdata() might return NULL. On AM437x EVM, this causes the platform to stop booting as soon as the module is inserted.

This patch fixes the issue by checking for NULL return value. Oops log follows.

I have not tested BT functionality after this patch. But at least the platform boots now.

[   12.675697] Unable to handle kernel NULL pointer dereference at virtual address 0000005
[   12.684310] pgd = c0004000
[   12.687157] [0000005] *pgd=00000000
[   12.690927] Internal error: Oops: 17 [#1] SMP ARM
[   12.695873] Modules linked in: btwilink bluetooth ti_vpfe dwc3(+) ov2659 videobuf2_core v4l2_common videodev ti_am335x_adc 6lowpan_iphc matrix_keypad panel_dpi kfifo_buf pixcir_i2c_ts media industrialio videobuf2_dma_contig c_can_platform videobuf2_memops dwc3_omap c_can can_dev
[   12.721969] CPU: 0 PID: 1235 Comm: kworker/u3:0 Not tainted 3.14.25-02445-g9036ac6daed6 torvalds#128
[   12.730937] Workqueue: hci0 hci_power_on [bluetooth]
[   12.736165] task: ebd93b40 ti: ecd7c000 task.ti: ecd7c000
[   12.741856] PC is at st_kim_ref+0x30/0x40
[   12.746071] LR is at st_kim_ref+0x30/0x40
[   12.750289] pc : [<c03caf58>]    lr : [<c03caf58>]    psr: a0000013
[   12.750289] sp : ecd7de08  ip : ecd7de08  fp : ecd7de1c
[   12.762365] r10: bf1e710c  r9 : bf1e70ec  r8 : bf1e7964
[   12.767858] r7 : ebd2fd50  r6 : bf1e7964  r5 : 00000000  r4 : ecd7de24
[   12.774723] r3 : c0957208  r2 : 00000000  r1 : c0957208  r0 : 00000000
[   12.781589] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   12.789274] Control: 10c5387d  Table: abde4059  DAC: 00000015
[   12.795315] Process kworker/u3:0 (pid: 1235, stack limit = 0xecd7c248)

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
@Elizafox
Copy link

Elizafox commented Jan 8, 2015

Linus doesn't accept pull requests from GitHub. Consult https://github.com/torvalds/linux/blob/master/Documentation/HOWTO instead.

ldts referenced this pull request in 96boards/linux Mar 13, 2015
Tested with the Debian snapshot #128 and the binary user-side drivers available at the public download page:
http://malideveloper.arm.com/develop-for-mali/features/mali-t6xx-gpu-user-space-drivers/
torvalds pushed a commit that referenced this pull request Apr 8, 2015
Unlike scan_async_group(), soc_of_bind() doesn't allocate its
soc_camera_async_client structure using devm_kzalloc(), but has it
embedded inside the soc_of_info structure.  Hence on failure, it must
free the whole soc_of_info structure, and not just the embedded
soc_camera_async_client structure, as the latter causes a warning, and
may cause slab corruption:

    soc-camera-pdrv soc-camera-pdrv.0: Probing soc-camera-pdrv.0
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 1 at drivers/base/devres.c:887 devm_kfree+0x30/0x40()
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-shmobile-08386-g37feb0d093cb2d8e #128
    Hardware name: Generic R8A7791 (Flattened Device Tree)
    Backtrace:
    [<c0011e7c>] (dump_backtrace) from [<c0012024>] (show_stack+0x18/0x1c)
     r6:c05a923b r5:00000009 r4:00000000 r3:00204140
    [<c001200c>] (show_stack) from [<c048ed30>] (dump_stack+0x78/0x94)
    [<c048ecb8>] (dump_stack) from [<c002687c>] (warn_slowpath_common+0x8c/0xb8)
     r4:00000000 r3:00000000
    [<c00267f0>] (warn_slowpath_common) from [<c0026980>] (warn_slowpath_null+0x24/0x2c)
     r8:ee7d8214 r7:ed83b810 r6:ed83bc20 r5:fffffffa r4:ed83e510
    [<c002695c>] (warn_slowpath_null) from [<c025e0cc>] (devm_kfree+0x30/0x40)
    [<c025e09c>] (devm_kfree) from [<c032bbf4>] (soc_of_bind.isra.14+0x194/0x1d4)
    [<c032ba60>] (soc_of_bind.isra.14) from [<c032c6b8>] (soc_camera_host_register+0x208/0x31c)
     r9:00000070 r8:ee7e05d0 r7:ee153210 r6:00000000 r5:ee7e0218 r4:ed83bc20
    [<c032c4b0>] (soc_camera_host_register) from [<c032e80c>] (rcar_vin_probe+0x1f4/0x238)
     r8:ee153200 r7:00000008 r6:ee153210 r5:ed83bc10 r4:c066319c r3:000000c0
    [<c032e618>] (rcar_vin_probe) from [<c025c334>] (platform_drv_probe+0x50/0xa0)
     r10:00000000 r9:c0662fa8 r8:00000000 r7:c06a3700 r6:c0662fa8 r5:ee153210
     r4:00000000
    [<c025c2e4>] (platform_drv_probe) from [<c025af08>] (driver_probe_device+0xc4/0x208)
     r6:c06a36f4 r5:00000000 r4:ee153210 r3:c025c2e4
    [<c025ae44>] (driver_probe_device) from [<c025b108>] (__driver_attach+0x70/0x94)
     r9:c066f9c0 r8:c0624a98 r7:c065b790 r6:c0662fa8 r5:ee153244 r4:ee153210
    [<c025b098>] (__driver_attach) from [<c025984c>] (bus_for_each_dev+0x74/0x98)
     r6:c025b098 r5:c0662fa8 r4:00000000 r3:00000001
    [<c02597d8>] (bus_for_each_dev) from [<c025b1dc>] (driver_attach+0x20/0x28)
     r6:ed83c200 r5:00000000 r4:c0662fa8
    [<c025b1bc>] (driver_attach) from [<c025a00c>] (bus_add_driver+0xdc/0x1c4)
    [<c0259f30>] (bus_add_driver) from [<c025b8f4>] (driver_register+0xa4/0xe8)
     r7:c0624a98 r6:00000000 r5:c060b010 r4:c0662fa8
    [<c025b850>] (driver_register) from [<c025ccd0>] (__platform_driver_register+0x50/0x64)
     r5:c060b010 r4:ed8394c0
    [<c025cc80>] (__platform_driver_register) from [<c060b028>] (rcar_vin_driver_init+0x18/0x20)
    [<c060b010>] (rcar_vin_driver_init) from [<c05edde8>] (do_one_initcall+0x108/0x1b8)
    [<c05edce0>] (do_one_initcall) from [<c05edfb4>] (kernel_init_freeable+0x11c/0x1e4)
     r9:c066f9c0 r8:c066f9c0 r7:c062eab0 r6:c06252c4 r5:000000ad r4:00000006
    [<c05ede98>] (kernel_init_freeable) from [<c048c3d0>] (kernel_init+0x10/0xec)
     r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c048c3c0 r4:00000000
    [<c048c3c0>] (kernel_init) from [<c000eba0>] (ret_from_fork+0x14/0x34)
     r4:00000000 r3:ee04e000
    ---[ end trace e3a984cc0335c8a0 ]---
    rcar_vin e6ef1000.video: group probe failed: -6

Fixes: 1ddc6a6 ("[media] soc_camera: add support for dt binding soc_camera drivers")

Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
damentz referenced this pull request in zen-kernel/zen-kernel Apr 19, 2015
commit 8e48a2d upstream.

Unlike scan_async_group(), soc_of_bind() doesn't allocate its
soc_camera_async_client structure using devm_kzalloc(), but has it
embedded inside the soc_of_info structure.  Hence on failure, it must
free the whole soc_of_info structure, and not just the embedded
soc_camera_async_client structure, as the latter causes a warning, and
may cause slab corruption:

    soc-camera-pdrv soc-camera-pdrv.0: Probing soc-camera-pdrv.0
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 1 at drivers/base/devres.c:887 devm_kfree+0x30/0x40()
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-shmobile-08386-g37feb0d093cb2d8e #128
    Hardware name: Generic R8A7791 (Flattened Device Tree)
    Backtrace:
    [<c0011e7c>] (dump_backtrace) from [<c0012024>] (show_stack+0x18/0x1c)
     r6:c05a923b r5:00000009 r4:00000000 r3:00204140
    [<c001200c>] (show_stack) from [<c048ed30>] (dump_stack+0x78/0x94)
    [<c048ecb8>] (dump_stack) from [<c002687c>] (warn_slowpath_common+0x8c/0xb8)
     r4:00000000 r3:00000000
    [<c00267f0>] (warn_slowpath_common) from [<c0026980>] (warn_slowpath_null+0x24/0x2c)
     r8:ee7d8214 r7:ed83b810 r6:ed83bc20 r5:fffffffa r4:ed83e510
    [<c002695c>] (warn_slowpath_null) from [<c025e0cc>] (devm_kfree+0x30/0x40)
    [<c025e09c>] (devm_kfree) from [<c032bbf4>] (soc_of_bind.isra.14+0x194/0x1d4)
    [<c032ba60>] (soc_of_bind.isra.14) from [<c032c6b8>] (soc_camera_host_register+0x208/0x31c)
     r9:00000070 r8:ee7e05d0 r7:ee153210 r6:00000000 r5:ee7e0218 r4:ed83bc20
    [<c032c4b0>] (soc_camera_host_register) from [<c032e80c>] (rcar_vin_probe+0x1f4/0x238)
     r8:ee153200 r7:00000008 r6:ee153210 r5:ed83bc10 r4:c066319c r3:000000c0
    [<c032e618>] (rcar_vin_probe) from [<c025c334>] (platform_drv_probe+0x50/0xa0)
     r10:00000000 r9:c0662fa8 r8:00000000 r7:c06a3700 r6:c0662fa8 r5:ee153210
     r4:00000000
    [<c025c2e4>] (platform_drv_probe) from [<c025af08>] (driver_probe_device+0xc4/0x208)
     r6:c06a36f4 r5:00000000 r4:ee153210 r3:c025c2e4
    [<c025ae44>] (driver_probe_device) from [<c025b108>] (__driver_attach+0x70/0x94)
     r9:c066f9c0 r8:c0624a98 r7:c065b790 r6:c0662fa8 r5:ee153244 r4:ee153210
    [<c025b098>] (__driver_attach) from [<c025984c>] (bus_for_each_dev+0x74/0x98)
     r6:c025b098 r5:c0662fa8 r4:00000000 r3:00000001
    [<c02597d8>] (bus_for_each_dev) from [<c025b1dc>] (driver_attach+0x20/0x28)
     r6:ed83c200 r5:00000000 r4:c0662fa8
    [<c025b1bc>] (driver_attach) from [<c025a00c>] (bus_add_driver+0xdc/0x1c4)
    [<c0259f30>] (bus_add_driver) from [<c025b8f4>] (driver_register+0xa4/0xe8)
     r7:c0624a98 r6:00000000 r5:c060b010 r4:c0662fa8
    [<c025b850>] (driver_register) from [<c025ccd0>] (__platform_driver_register+0x50/0x64)
     r5:c060b010 r4:ed8394c0
    [<c025cc80>] (__platform_driver_register) from [<c060b028>] (rcar_vin_driver_init+0x18/0x20)
    [<c060b010>] (rcar_vin_driver_init) from [<c05edde8>] (do_one_initcall+0x108/0x1b8)
    [<c05edce0>] (do_one_initcall) from [<c05edfb4>] (kernel_init_freeable+0x11c/0x1e4)
     r9:c066f9c0 r8:c066f9c0 r7:c062eab0 r6:c06252c4 r5:000000ad r4:00000006
    [<c05ede98>] (kernel_init_freeable) from [<c048c3d0>] (kernel_init+0x10/0xec)
     r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c048c3c0 r4:00000000
    [<c048c3c0>] (kernel_init) from [<c000eba0>] (ret_from_fork+0x14/0x34)
     r4:00000000 r3:ee04e000
    ---[ end trace e3a984cc0335c8a0 ]---
    rcar_vin e6ef1000.video: group probe failed: -6

Fixes: 1ddc6a6 ("[media] soc_camera: add support for dt binding soc_camera drivers")

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj pushed a commit to kraj/linux that referenced this pull request Oct 17, 2015
Unlike scan_async_group(), soc_of_bind() doesn't allocate its
soc_camera_async_client structure using devm_kzalloc(), but has it
embedded inside the soc_of_info structure.  Hence on failure, it must
free the whole soc_of_info structure, and not just the embedded
soc_camera_async_client structure, as the latter causes a warning, and
may cause slab corruption:

    soc-camera-pdrv soc-camera-pdrv.0: Probing soc-camera-pdrv.0
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 1 at drivers/base/devres.c:887 devm_kfree+0x30/0x40()
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-shmobile-08386-g37feb0d093cb2d8e torvalds#128
    Hardware name: Generic R8A7791 (Flattened Device Tree)
    Backtrace:
    [<c0011e7c>] (dump_backtrace) from [<c0012024>] (show_stack+0x18/0x1c)
     r6:c05a923b r5:00000009 r4:00000000 r3:00204140
    [<c001200c>] (show_stack) from [<c048ed30>] (dump_stack+0x78/0x94)
    [<c048ecb8>] (dump_stack) from [<c002687c>] (warn_slowpath_common+0x8c/0xb8)
     r4:00000000 r3:00000000
    [<c00267f0>] (warn_slowpath_common) from [<c0026980>] (warn_slowpath_null+0x24/0x2c)
     r8:ee7d8214 r7:ed83b810 r6:ed83bc20 r5:fffffffa r4:ed83e510
    [<c002695c>] (warn_slowpath_null) from [<c025e0cc>] (devm_kfree+0x30/0x40)
    [<c025e09c>] (devm_kfree) from [<c032bbf4>] (soc_of_bind.isra.14+0x194/0x1d4)
    [<c032ba60>] (soc_of_bind.isra.14) from [<c032c6b8>] (soc_camera_host_register+0x208/0x31c)
     r9:00000070 r8:ee7e05d0 r7:ee153210 r6:00000000 r5:ee7e0218 r4:ed83bc20
    [<c032c4b0>] (soc_camera_host_register) from [<c032e80c>] (rcar_vin_probe+0x1f4/0x238)
     r8:ee153200 r7:00000008 r6:ee153210 r5:ed83bc10 r4:c066319c r3:000000c0
    [<c032e618>] (rcar_vin_probe) from [<c025c334>] (platform_drv_probe+0x50/0xa0)
     r10:00000000 r9:c0662fa8 r8:00000000 r7:c06a3700 r6:c0662fa8 r5:ee153210
     r4:00000000
    [<c025c2e4>] (platform_drv_probe) from [<c025af08>] (driver_probe_device+0xc4/0x208)
     r6:c06a36f4 r5:00000000 r4:ee153210 r3:c025c2e4
    [<c025ae44>] (driver_probe_device) from [<c025b108>] (__driver_attach+0x70/0x94)
     r9:c066f9c0 r8:c0624a98 r7:c065b790 r6:c0662fa8 r5:ee153244 r4:ee153210
    [<c025b098>] (__driver_attach) from [<c025984c>] (bus_for_each_dev+0x74/0x98)
     r6:c025b098 r5:c0662fa8 r4:00000000 r3:00000001
    [<c02597d8>] (bus_for_each_dev) from [<c025b1dc>] (driver_attach+0x20/0x28)
     r6:ed83c200 r5:00000000 r4:c0662fa8
    [<c025b1bc>] (driver_attach) from [<c025a00c>] (bus_add_driver+0xdc/0x1c4)
    [<c0259f30>] (bus_add_driver) from [<c025b8f4>] (driver_register+0xa4/0xe8)
     r7:c0624a98 r6:00000000 r5:c060b010 r4:c0662fa8
    [<c025b850>] (driver_register) from [<c025ccd0>] (__platform_driver_register+0x50/0x64)
     r5:c060b010 r4:ed8394c0
    [<c025cc80>] (__platform_driver_register) from [<c060b028>] (rcar_vin_driver_init+0x18/0x20)
    [<c060b010>] (rcar_vin_driver_init) from [<c05edde8>] (do_one_initcall+0x108/0x1b8)
    [<c05edce0>] (do_one_initcall) from [<c05edfb4>] (kernel_init_freeable+0x11c/0x1e4)
     r9:c066f9c0 r8:c066f9c0 r7:c062eab0 r6:c06252c4 r5:000000ad r4:00000006
    [<c05ede98>] (kernel_init_freeable) from [<c048c3d0>] (kernel_init+0x10/0xec)
     r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c048c3c0 r4:00000000
    [<c048c3c0>] (kernel_init) from [<c000eba0>] (ret_from_fork+0x14/0x34)
     r4:00000000 r3:ee04e000
    ---[ end trace e3a984cc0335c8a0 ]---
    rcar_vin e6ef1000.video: group probe failed: -6

Fixes: 1ddc6a6 ("[media] soc_camera: add support for dt binding soc_camera drivers")

Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
0day-ci pushed a commit to 0day-ci/linux that referenced this pull request May 27, 2016
WARNING: line over 80 characters
torvalds#128: FILE: mm/cma.c:186:
+	alignment = PAGE_SIZE << max_t(unsigned long, MAX_ORDER - 1, pageblock_order);

WARNING: line over 80 characters
torvalds#137: FILE: mm/cma.c:270:
+		(phys_addr_t)PAGE_SIZE << max_t(unsigned long, MAX_ORDER - 1, pageblock_order));

total: 0 errors, 2 warnings, 16 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

./patches/mm-cma-silence-warnings-due-to-max-usage.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
linux4kix referenced this pull request in SolidRun/linux-fslc Dec 11, 2016
st_kim_ref() does not take care of the fact that platform_get_drvdata() might return NULL. On AM437x EVM, this causes the platform to stop booting as soon as the module is inserted.

This patch fixes the issue by checking for NULL return value. Oops log follows.

I have not tested BT functionality after this patch. But at least the platform boots now.

[   12.675697] Unable to handle kernel NULL pointer dereference at virtual address 0000005
[   12.684310] pgd = c0004000
[   12.687157] [0000005] *pgd=00000000
[   12.690927] Internal error: Oops: 17 [#1] SMP ARM
[   12.695873] Modules linked in: btwilink bluetooth ti_vpfe dwc3(+) ov2659 videobuf2_core v4l2_common videodev ti_am335x_adc 6lowpan_iphc matrix_keypad panel_dpi kfifo_buf pixcir_i2c_ts media industrialio videobuf2_dma_contig c_can_platform videobuf2_memops dwc3_omap c_can can_dev
[   12.721969] CPU: 0 PID: 1235 Comm: kworker/u3:0 Not tainted 3.14.25-02445-g9036ac6daed6 Freescale#128
[   12.730937] Workqueue: hci0 hci_power_on [bluetooth]
[   12.736165] task: ebd93b40 ti: ecd7c000 task.ti: ecd7c000
[   12.741856] PC is at st_kim_ref+0x30/0x40
[   12.746071] LR is at st_kim_ref+0x30/0x40
[   12.750289] pc : [<c03caf58>]    lr : [<c03caf58>]    psr: a0000013
[   12.750289] sp : ecd7de08  ip : ecd7de08  fp : ecd7de1c
[   12.762365] r10: bf1e710c  r9 : bf1e70ec  r8 : bf1e7964
[   12.767858] r7 : ebd2fd50  r6 : bf1e7964  r5 : 00000000  r4 : ecd7de24
[   12.774723] r3 : c0957208  r2 : 00000000  r1 : c0957208  r0 : 00000000
[   12.781589] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   12.789274] Control: 10c5387d  Table: abde4059  DAC: 00000015
[   12.795315] Process kworker/u3:0 (pid: 1235, stack limit = 0xecd7c248)

Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Gigi Joseph <gigi.joseph@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
laijs pushed a commit to laijs/linux that referenced this pull request Feb 13, 2017
wzyy2 pushed a commit to wzyy2/linux that referenced this pull request Jun 19, 2017
(1) use cpu id from bl31 delivers;
(2) sp_el0 should point to kernel address in EL1 mode.

On ARM64, kernel uses sp_el0 to store current_thread_info(),
we see a problem: when fiq occurs, cpu is EL1 mode but sp_el0
point to userspace address. At this moment, if we read
'current_thread_info()->cpu' or other, it leads an error.

We find above situation happens when save/restore cpu context
between system mode and user mode under heavy load.
Like 'ret_fast_syscall()', kernel restore context of user mode,
but fiq occurs before the instruction 'eret', so this causes the
above situation.

Assembly code:

ffffff80080826c8 <ret_fast_syscall>:

...skipping...

ffffff80080826fc:       d503201f        nop
ffffff8008082700:       d5384100        mrs     x0, sp_el0
ffffff8008082704:       f9400c00        ldr     x0, [x0,torvalds#24]
ffffff8008082708:       d5182000        msr     ttbr0_el1, x0
ffffff800808270c:       d5033fdf        isb
ffffff8008082710:       f9407ff7        ldr     x23, [sp,torvalds#248]
ffffff8008082714:       d5184117        msr     sp_el0, x23
ffffff8008082718:       d503201f        nop
ffffff800808271c:       d503201f        nop
ffffff8008082720:       d5184035        msr     elr_el1, x21
ffffff8008082724:       d5184016        msr     spsr_el1, x22
ffffff8008082728:       a94007e0        ldp     x0, x1, [sp]
ffffff800808272c:       a9410fe2        ldp     x2, x3, [sp,torvalds#16]
ffffff8008082730:       a94217e4        ldp     x4, x5, [sp,torvalds#32]
ffffff8008082734:       a9431fe6        ldp     x6, x7, [sp,torvalds#48]
ffffff8008082738:       a94427e8        ldp     x8, x9, [sp,torvalds#64]
ffffff800808273c:       a9452fea        ldp     x10, x11, [sp,torvalds#80]
ffffff8008082740:       a94637ec        ldp     x12, x13, [sp,torvalds#96]
ffffff8008082744:       a9473fee        ldp     x14, x15, [sp,torvalds#112]
ffffff8008082748:       a94847f0        ldp     x16, x17, [sp,torvalds#128]
ffffff800808274c:       a9494ff2        ldp     x18, x19, [sp,torvalds#144]
ffffff8008082750:       a94a57f4        ldp     x20, x21, [sp,torvalds#160]
ffffff8008082754:       a94b5ff6        ldp     x22, x23, [sp,torvalds#176]
ffffff8008082758:       a94c67f8        ldp     x24, x25, [sp,torvalds#192]
ffffff800808275c:       a94d6ffa        ldp     x26, x27, [sp,torvalds#208]
ffffff8008082760:       a94e77fc        ldp     x28, x29, [sp,torvalds#224]
ffffff8008082764:       f9407bfe        ldr     x30, [sp,torvalds#240]
ffffff8008082768:       9104c3ff        add     sp, sp, #0x130
ffffff800808276c:       d69f03e0        eret

Change-Id: I071e899f8a407764e166ca0403199c9d87d6ce78
Signed-off-by: chenjh <chenjh@rock-chips.com>
iaguis pushed a commit to kinvolk/linux that referenced this pull request Feb 6, 2018
tombriden pushed a commit to tombriden/linux that referenced this pull request Feb 12, 2018
[ Upstream commit 4adfa79 ]

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
 WARNING: bad unlock balance detected!
 4.15.0-rc3+ torvalds#128 Not tainted
 syzkaller971460/3195 is trying to release lock (mrt_lock) at:
 [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by syzkaller971460/3195:
  #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
 fs/seq_file.c:165

 stack backtrace:
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
  __lock_release kernel/locking/lockdep.c:3775 [inline]
  lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
  __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
  _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
  ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
  traverse+0x3bc/0xa00 fs/seq_file.c:135
  seq_read+0x96a/0x13d0 fs/seq_file.c:189
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 BUG: sleeping function called from invalid context at lib/usercopy.c:25
 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
 INFO: lockdep is turned off.
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
  __might_sleep+0x95/0x190 kernel/sched/core.c:6013
  __might_fault+0xab/0x1d0 mm/memory.c:4525
  _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
  copy_to_user include/linux/uaccess.h:155 [inline]
  seq_read+0xcb4/0x13d0 fs/seq_file.c:279
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
 lib/usercopy.c:26

Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Noltari pushed a commit to Noltari/linux that referenced this pull request Feb 13, 2018
[ Upstream commit 4adfa79 ]

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
 WARNING: bad unlock balance detected!
 4.15.0-rc3+ torvalds#128 Not tainted
 syzkaller971460/3195 is trying to release lock (mrt_lock) at:
 [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by syzkaller971460/3195:
  #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
 fs/seq_file.c:165

 stack backtrace:
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
  __lock_release kernel/locking/lockdep.c:3775 [inline]
  lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
  __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
  _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
  ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
  traverse+0x3bc/0xa00 fs/seq_file.c:135
  seq_read+0x96a/0x13d0 fs/seq_file.c:189
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 BUG: sleeping function called from invalid context at lib/usercopy.c:25
 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
 INFO: lockdep is turned off.
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
  __might_sleep+0x95/0x190 kernel/sched/core.c:6013
  __might_fault+0xab/0x1d0 mm/memory.c:4525
  _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
  copy_to_user include/linux/uaccess.h:155 [inline]
  seq_read+0xcb4/0x13d0 fs/seq_file.c:279
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
 lib/usercopy.c:26

Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Noltari pushed a commit to Noltari/linux that referenced this pull request Feb 16, 2018
[ Upstream commit 4adfa79 ]

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
 WARNING: bad unlock balance detected!
 4.15.0-rc3+ torvalds#128 Not tainted
 syzkaller971460/3195 is trying to release lock (mrt_lock) at:
 [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by syzkaller971460/3195:
  #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
 fs/seq_file.c:165

 stack backtrace:
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
  __lock_release kernel/locking/lockdep.c:3775 [inline]
  lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
  __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
  _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
  ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
  traverse+0x3bc/0xa00 fs/seq_file.c:135
  seq_read+0x96a/0x13d0 fs/seq_file.c:189
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 BUG: sleeping function called from invalid context at lib/usercopy.c:25
 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
 INFO: lockdep is turned off.
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
  __might_sleep+0x95/0x190 kernel/sched/core.c:6013
  __might_fault+0xab/0x1d0 mm/memory.c:4525
  _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
  copy_to_user include/linux/uaccess.h:155 [inline]
  seq_read+0xcb4/0x13d0 fs/seq_file.c:279
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
 lib/usercopy.c:26

Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Noltari pushed a commit to Noltari/linux that referenced this pull request Mar 8, 2018
[ Upstream commit 4adfa79 ]

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
 WARNING: bad unlock balance detected!
 4.15.0-rc3+ torvalds#128 Not tainted
 syzkaller971460/3195 is trying to release lock (mrt_lock) at:
 [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by syzkaller971460/3195:
  #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
 fs/seq_file.c:165

 stack backtrace:
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
  __lock_release kernel/locking/lockdep.c:3775 [inline]
  lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
  __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
  _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
  ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
  traverse+0x3bc/0xa00 fs/seq_file.c:135
  seq_read+0x96a/0x13d0 fs/seq_file.c:189
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 BUG: sleeping function called from invalid context at lib/usercopy.c:25
 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
 INFO: lockdep is turned off.
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
  __might_sleep+0x95/0x190 kernel/sched/core.c:6013
  __might_fault+0xab/0x1d0 mm/memory.c:4525
  _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
  copy_to_user include/linux/uaccess.h:155 [inline]
  seq_read+0xcb4/0x13d0 fs/seq_file.c:279
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
 lib/usercopy.c:26

Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this pull request Jun 6, 2018
[ Upstream commit 4adfa79 ]

When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.

Here's syzbot's trace:
 WARNING: bad unlock balance detected!
 4.15.0-rc3+ torvalds#128 Not tainted
 syzkaller971460/3195 is trying to release lock (mrt_lock) at:
 [<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by syzkaller971460/3195:
  #0:  (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
 fs/seq_file.c:165

 stack backtrace:
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
  __lock_release kernel/locking/lockdep.c:3775 [inline]
  lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
  __raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
  _raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
  ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
  traverse+0x3bc/0xa00 fs/seq_file.c:135
  seq_read+0x96a/0x13d0 fs/seq_file.c:189
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 BUG: sleeping function called from invalid context at lib/usercopy.c:25
 in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
 INFO: lockdep is turned off.
 CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ torvalds#128
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:17 [inline]
  dump_stack+0x194/0x257 lib/dump_stack.c:53
  ___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
  __might_sleep+0x95/0x190 kernel/sched/core.c:6013
  __might_fault+0xab/0x1d0 mm/memory.c:4525
  _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
  copy_to_user include/linux/uaccess.h:155 [inline]
  seq_read+0xcb4/0x13d0 fs/seq_file.c:279
  proc_reg_read+0xef/0x170 fs/proc/inode.c:217
  do_loop_readv_writev fs/read_write.c:673 [inline]
  do_iter_read+0x3db/0x5b0 fs/read_write.c:897
  compat_readv+0x1bf/0x270 fs/read_write.c:1140
  do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
  C_SYSC_preadv fs/read_write.c:1209 [inline]
  compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
  do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
  do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
  entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
 RIP: 0023:0xf7f73c79
 RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
 RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
 RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
 lib/usercopy.c:26

Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669@syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Sep 19, 2018
These NEON and non-NEON implementations come from Andy Polyakov's
implementation. They are exactly the same as Andy Polyakov's original,
with the following exceptions:

- Entries and exits use the proper kernel convention macro.
- CPU feature checking is done in C by the glue code, so that has been
  removed from the assembly.
- The function names have been renamed to fit kernel conventions.
- Labels have been renamed (prefixed with .L) to fit kernel conventions.
- Constants have been rearranged so that they are closer to the code
  that is using them. [ARM only]
- The neon code can jump to the scalar code when it makes sense to do
  so.
- The neon_512 function as a separate function has been removed, leaving
  the decision up to the main neon entry point. [ARM64 only]

After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual
instructions from the original.

ARM:

-ChaCha20_ctr32:
-.LChaCha20_ctr32:
+ENTRY(chacha20_arm)
 	ldr	r12,[sp,#0]		@ pull pointer to counter and nonce
 	stmdb	sp!,{r0-r2,r4-r11,lr}
-	sub	r14,pc,torvalds#16		@ ChaCha20_ctr32
-	adr	r14,.LChaCha20_ctr32
 	cmp	r2,#0			@ len==0?
 	itt	eq
 	addeq	sp,sp,#4*3
-	beq	.Lno_data
-	cmp	r2,torvalds#192			@ test len
-	bls	.Lshort
-	ldr	r4,[r14,#-32]
-	ldr	r4,[r14,r4]
-	ldr	r4,[r4]
-	tst	r4,#ARMV7_NEON
-	bne	.LChaCha20_neon
+	beq	.Lno_data_arm
 .Lshort:
 	ldmia	r12,{r4-r7}		@ load counter and nonce
 	sub	sp,sp,#4*(16)		@ off-load area
-	sub	r14,r14,torvalds#64		@ .Lsigma
+	sub	r14,pc,torvalds#100		@ .Lsigma
+	adr	r14,.Lsigma		@ .Lsigma
 	stmdb	sp!,{r4-r7}		@ copy counter and nonce
 	ldmia	r3,{r4-r11}		@ load key
 	ldmia	r14,{r0-r3}		@ load sigma
@@ -617,14 +615,25 @@

 .Ldone:
 	add	sp,sp,#4*(32+3)
-.Lno_data:
+.Lno_data_arm:
 	ldmia	sp!,{r4-r11,pc}
+ENDPROC(chacha20_arm)

-ChaCha20_neon:
+ENTRY(chacha20_neon)
 	ldr		r12,[sp,#0]		@ pull pointer to counter and nonce
 	stmdb		sp!,{r0-r2,r4-r11,lr}
-.LChaCha20_neon:
-	adr		r14,.Lsigma
+	cmp		r2,#0			@ len==0?
+	itt		eq
+	addeq		sp,sp,#4*3
+	beq		.Lno_data_neon
+	cmp		r2,torvalds#192			@ test len
+	bls		.Lshort
+.Lchacha20_neon_begin:
+	adr		r14,.Lsigma2
 	vstmdb		sp!,{d8-d15}		@ ABI spec says so
 	stmdb		sp!,{r0-r3}

@@ -1265,4 +1274,6 @@
 	add		sp,sp,#4*(32+4)
 	vldmia		sp,{d8-d15}
 	add		sp,sp,#4*(16+3)
+.Lno_data_neon:
 	ldmia		sp!,{r4-r11,pc}
+ENDPROC(chacha20_neon)

ARM64:

-ChaCha20_ctr32:
+ENTRY(chacha20_arm)
 	cbz	x2,.Labort
-	adr	x5,.LOPENSSL_armcap_P
-	cmp	x2,torvalds#192
-	b.lo	.Lshort
-	ldrsw	x6,[x5]
-	ldr	x6,[x5]
-	ldr	w17,[x6,x5]
-	tst	w17,#ARMV7_NEON
-	b.ne	ChaCha20_neon
-
 .Lshort:
 	stp	x29,x30,[sp,#-96]!
 	add	x29,sp,#0
@@ -279,8 +274,13 @@
 	ldp	x27,x28,[x29,torvalds#80]
 	ldp	x29,x30,[sp],torvalds#96
 	ret
+ENDPROC(chacha20_arm)
+
+ENTRY(chacha20_neon)
+	cbz	x2,.Labort_neon
+	cmp	x2,torvalds#192
+	b.lo	.Lshort

-ChaCha20_neon:
 	stp	x29,x30,[sp,#-96]!
 	add	x29,sp,#0

@@ -763,16 +763,6 @@
 	ldp	x27,x28,[x29,torvalds#80]
 	ldp	x29,x30,[sp],torvalds#96
 	ret
-ChaCha20_512_neon:
-	stp	x29,x30,[sp,#-96]!
-	add	x29,sp,#0
-
-	adr	x5,.Lsigma
-	stp	x19,x20,[sp,torvalds#16]
-	stp	x21,x22,[sp,torvalds#32]
-	stp	x23,x24,[sp,torvalds#48]
-	stp	x25,x26,[sp,torvalds#64]
-	stp	x27,x28,[sp,torvalds#80]

 .L512_or_more_neon:
 	sub	sp,sp,torvalds#128+64
@@ -1920,4 +1910,6 @@
 	ldp	x25,x26,[x29,torvalds#64]
 	ldp	x27,x28,[x29,torvalds#80]
 	ldp	x29,x30,[sp],torvalds#96
+.Labort_neon:
 	ret
+ENDPROC(chacha20_neon)

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Sep 19, 2018
These NEON and non-NEON implementations come from Andy Polyakov's
implementation. They are exactly the same as Andy Polyakov's original,
with the following exceptions:

- Entries and exits use the proper kernel convention macro.
- CPU feature checking is done in C by the glue code, so that has been
  removed from the assembly.
- The function names have been renamed to fit kernel conventions.
- Labels have been renamed to fit kernel conventions.
- The neon code can jump to the scalar code when it makes sense to do
  so.

After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual
instructions from the original.

ARM:

-poly1305_init:
-.Lpoly1305_init:
+ENTRY(poly1305_init_arm)
 	stmdb	sp!,{r4-r11}

 	eor	r3,r3,r3
@@ -18,8 +25,6 @@
 	moveq	r0,#0
 	beq	.Lno_key

-	adr	r11,.Lpoly1305_init
-	ldr	r12,.LOPENSSL_armcap
 	ldrb	r4,[r1,#0]
 	mov	r10,#0x0fffffff
 	ldrb	r5,[r1,#1]
@@ -34,8 +39,6 @@
 	ldrb	r7,[r1,torvalds#6]
 	and	r4,r4,r10

-	ldr	r12,[r11,r12]		@ OPENSSL_armcap_P
-	ldr	r12,[r12]
 	ldrb	r8,[r1,torvalds#7]
 	orr	r5,r5,r6,lsl#8
 	ldrb	r6,[r1,torvalds#8]
@@ -45,22 +48,6 @@
 	ldrb	r8,[r1,torvalds#10]
 	and	r5,r5,r3

-	tst	r12,#ARMV7_NEON		@ check for NEON
-	adr	r9,poly1305_blocks_neon
-	adr	r11,poly1305_blocks
-	it	ne
-	movne	r11,r9
-	adr	r12,poly1305_emit
-	adr	r10,poly1305_emit_neon
-	it	ne
-	movne	r12,r10
-	itete	eq
-	addeq	r12,r11,#(poly1305_emit-.Lpoly1305_init)
-	addne	r12,r11,#(poly1305_emit_neon-.Lpoly1305_init)
-	addeq	r11,r11,#(poly1305_blocks-.Lpoly1305_init)
-	addne	r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init)
-	orr	r12,r12,#1	@ thumb-ify address
-	orr	r11,r11,#1
 	ldrb	r9,[r1,torvalds#11]
 	orr	r6,r6,r7,lsl#8
 	ldrb	r7,[r1,torvalds#12]
@@ -79,17 +66,16 @@
 	str	r6,[r0,torvalds#8]
 	and	r7,r7,r3
 	str	r7,[r0,torvalds#12]
-	stmia	r2,{r11,r12}		@ fill functions table
-	mov	r0,#1
-	mov	r0,#0
 .Lno_key:
 	ldmia	sp!,{r4-r11}
 	bx	lr				@ bx	lr
 	tst	lr,#1
 	moveq	pc,lr			@ be binary compatible with V4, yet
 	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-poly1305_blocks:
-.Lpoly1305_blocks:
+ENDPROC(poly1305_init_arm)
+
+ENTRY(poly1305_blocks_arm)
+.Lpoly1305_blocks_arm:
 	stmdb	sp!,{r3-r11,lr}

 	ands	r2,r2,#-16
@@ -231,10 +217,11 @@
 	tst	lr,#1
 	moveq	pc,lr			@ be binary compatible with V4, yet
 	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-poly1305_emit:
+ENDPROC(poly1305_blocks_arm)
+
+ENTRY(poly1305_emit_arm)
 	stmdb	sp!,{r4-r11}
 .Lpoly1305_emit_enter:
-
 	ldmia	r0,{r3-r7}
 	adds	r8,r3,#5		@ compare to modulus
 	adcs	r9,r4,#0
@@ -305,8 +292,12 @@
 	tst	lr,#1
 	moveq	pc,lr			@ be binary compatible with V4, yet
 	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
+ENDPROC(poly1305_emit_arm)
+
+

-poly1305_init_neon:
+ENTRY(poly1305_init_neon)
+.Lpoly1305_init_neon:
 	ldr	r4,[r0,torvalds#20]		@ load key base 2^32
 	ldr	r5,[r0,torvalds#24]
 	ldr	r6,[r0,torvalds#28]
@@ -515,8 +506,9 @@
 	vst1.32		{d8[1]},[r7]

 	bx	lr				@ bx	lr
+ENDPROC(poly1305_init_neon)

-poly1305_blocks_neon:
+ENTRY(poly1305_blocks_neon)
 	ldr	ip,[r0,torvalds#36]		@ is_base2_26
 	ands	r2,r2,#-16
 	beq	.Lno_data_neon
@@ -524,7 +516,7 @@
 	cmp	r2,torvalds#64
 	bhs	.Lenter_neon
 	tst	ip,ip			@ is_base2_26?
-	beq	.Lpoly1305_blocks
+	beq	.Lpoly1305_blocks_arm

 .Lenter_neon:
 	stmdb	sp!,{r4-r7}
@@ -534,7 +526,7 @@
 	bne	.Lbase2_26_neon

 	stmdb	sp!,{r1-r3,lr}
-	bl	poly1305_init_neon
+	bl	.Lpoly1305_init_neon

 	ldr	r4,[r0,#0]		@ load hash value base 2^32
 	ldr	r5,[r0,#4]
@@ -989,8 +981,9 @@
 	ldmia	sp!,{r4-r7}
 .Lno_data_neon:
 	bx	lr					@ bx	lr
+ENDPROC(poly1305_blocks_neon)

-poly1305_emit_neon:
+ENTRY(poly1305_emit_neon)
 	ldr	ip,[r0,torvalds#36]		@ is_base2_26

 	stmdb	sp!,{r4-r11}
@@ -1055,6 +1048,6 @@

 	ldmia	sp!,{r4-r11}
 	bx	lr				@ bx	lr
+ENDPROC(poly1305_emit_neon)

ARM64:

-poly1305_init:
+ENTRY(poly1305_init_arm)
 	cmp	x1,xzr
 	stp	xzr,xzr,[x0]		// zero hash value
 	stp	xzr,xzr,[x0,torvalds#16]	// [along with is_base2_26]
@@ -11,14 +15,9 @@
 	csel	x0,xzr,x0,eq
 	b.eq	.Lno_key

-	ldrsw	x11,.LOPENSSL_armcap_P
-	ldr	x11,.LOPENSSL_armcap_P
-	adr	x10,.LOPENSSL_armcap_P
-
 	ldp	x7,x8,[x1]		// load key
 	mov	x9,#0xfffffffc0fffffff
 	movk	x9,#0x0fff,lsl#48
-	ldr	w17,[x10,x11]
 	rev	x7,x7			// flip bytes
 	rev	x8,x8
 	and	x7,x7,x9		// &=0ffffffc0fffffff
@@ -26,24 +25,11 @@
 	and	x8,x8,x9		// &=0ffffffc0ffffffc
 	stp	x7,x8,[x0,torvalds#32]	// save key value

-	tst	w17,#ARMV7_NEON
-
-	adr	x12,poly1305_blocks
-	adr	x7,poly1305_blocks_neon
-	adr	x13,poly1305_emit
-	adr	x8,poly1305_emit_neon
-
-	csel	x12,x12,x7,eq
-	csel	x13,x13,x8,eq
-
-	stp	w12,w13,[x2]
-	stp	x12,x13,[x2]
-
-	mov	x0,#1
 .Lno_key:
 	ret
+ENDPROC(poly1305_init_arm)

-poly1305_blocks:
+ENTRY(poly1305_blocks_arm)
 	ands	x2,x2,#-16
 	b.eq	.Lno_data

@@ -100,8 +86,9 @@

 .Lno_data:
 	ret
+ENDPROC(poly1305_blocks_arm)

-poly1305_emit:
+ENTRY(poly1305_emit_arm)
 	ldp	x4,x5,[x0]		// load hash base 2^64
 	ldr	x6,[x0,torvalds#16]
 	ldp	x10,x11,[x2]	// load nonce
@@ -124,7 +111,9 @@
 	stp	x4,x5,[x1]		// write result

 	ret
-poly1305_mult:
+ENDPROC(poly1305_emit_arm)
+
+__poly1305_mult:
 	mul	x12,x4,x7		// h0*r0
 	umulh	x13,x4,x7

@@ -158,7 +147,7 @@

 	ret

-poly1305_splat:
+__poly1305_splat:
 	and	x12,x4,#0x03ffffff	// base 2^64 -> base 2^26
 	ubfx	x13,x4,torvalds#26,torvalds#26
 	extr	x14,x5,x4,#52
@@ -182,11 +171,11 @@

 	ret

-poly1305_blocks_neon:
+ENTRY(poly1305_blocks_neon)
 	ldr	x17,[x0,torvalds#24]
 	cmp	x2,torvalds#128
 	b.hs	.Lblocks_neon
-	cbz	x17,poly1305_blocks
+	cbz	x17,poly1305_blocks_arm

 .Lblocks_neon:
 	stp	x29,x30,[sp,#-80]!
@@ -232,7 +221,7 @@
 	adcs	x5,x5,x13
 	adc	x6,x6,x3

-	bl	poly1305_mult
+	bl	__poly1305_mult
 	ldr	x30,[sp,torvalds#8]

 	cbz	x3,.Lstore_base2_64_neon
@@ -274,7 +263,7 @@
 	adcs	x5,x5,x13
 	adc	x6,x6,x3

-	bl	poly1305_mult
+	bl	__poly1305_mult

 .Linit_neon:
 	and	x10,x4,#0x03ffffff	// base 2^64 -> base 2^26
@@ -301,19 +290,19 @@
 	mov	x5,x8
 	mov	x6,xzr
 	add	x0,x0,torvalds#48+12
-	bl	poly1305_splat
+	bl	__poly1305_splat

-	bl	poly1305_mult		// r^2
+	bl	__poly1305_mult		// r^2
 	sub	x0,x0,#4
-	bl	poly1305_splat
+	bl	__poly1305_splat

-	bl	poly1305_mult		// r^3
+	bl	__poly1305_mult		// r^3
 	sub	x0,x0,#4
-	bl	poly1305_splat
+	bl	__poly1305_splat

-	bl	poly1305_mult		// r^4
+	bl	__poly1305_mult		// r^4
 	sub	x0,x0,#4
-	bl	poly1305_splat
+	bl	__poly1305_splat
 	ldr	x30,[sp,torvalds#8]

 	add	x16,x1,torvalds#32
@@ -743,10 +732,11 @@
 .Lno_data_neon:
 	ldr	x29,[sp],torvalds#80
 	ret
+ENDPROC(poly1305_blocks_neon)

-poly1305_emit_neon:
+ENTRY(poly1305_emit_neon)
 	ldr	x17,[x0,torvalds#24]
-	cbz	x17,poly1305_emit
+	cbz	x17,poly1305_emit_arm

 	ldp	w10,w11,[x0]		// load hash value base 2^26
 	ldp	w12,w13,[x0,torvalds#8]
@@ -788,6 +778,6 @@
 	stp	x4,x5,[x1]		// write result

 	ret
+ENDPROC(poly1305_emit_neon)

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
metux added a commit to metux/linux that referenced this pull request Apr 27, 2019
Fix checkpatch warnings by using pr_err():

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#109: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:109:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#128: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:128:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

Signed-off-by: Enrico Weigelt <info@metux.net>
metux added a commit to metux/linux that referenced this pull request Apr 29, 2019
Fix checkpatch warnings by using pr_err():

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#109: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:109:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#128: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:128:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Enrico Weigelt <info@metux.net>
metux added a commit to metux/linux that referenced this pull request Apr 30, 2019
Fix checkpatch warnings by using pr_err():

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#109: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:109:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#128: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:128:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Enrico Weigelt <info@metux.net>
metux added a commit to metux/linux that referenced this pull request Apr 30, 2019
Fix checkpatch warnings by using pr_err():

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#109: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:109:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    torvalds#128: FILE: drivers/tty/serial/cpm_uart/cpm_uart_cpm2.c:128:
    +		printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

    WARNING: Prefer [subsystem eg: netdev]_err([subsystem]dev, ... then dev_err(dev, ... then pr_err(...  to printk(KERN_ERR ...
    +           printk(KERN_ERR

Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Enrico Weigelt <info@metux.net>
josefbacik pushed a commit to josefbacik/linux that referenced this pull request Jan 10, 2024
[BUG]
Test case btrfs/124 failed if larger metadata folio is enabled, the
dying message looks like this:

 BTRFS error (device dm-2): bad tree block start, mirror 2 want 31686656 have 0
 BTRFS info (device dm-2): read error corrected: ino 0 off 31686656 (dev /dev/mapper/test-scratch2 sector 20928)
 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 CPU: 6 PID: 350881 Comm: btrfs Tainted: G           OE      6.7.0-rc3-custom+ torvalds#128
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
 RIP: 0010:btrfs_read_extent_buffer+0x106/0x180 [btrfs]
 PKRU: 55555554
 Call Trace:
  <TASK>
  read_tree_block+0x33/0xb0 [btrfs]
  read_block_for_search+0x23e/0x340 [btrfs]
  btrfs_search_slot+0x2f9/0xe60 [btrfs]
  btrfs_lookup_csum+0x75/0x160 [btrfs]
  btrfs_lookup_bio_sums+0x21a/0x560 [btrfs]
  btrfs_submit_chunk+0x152/0x680 [btrfs]
  btrfs_submit_bio+0x1c/0x50 [btrfs]
  submit_one_bio+0x40/0x80 [btrfs]
  submit_extent_page+0x158/0x390 [btrfs]
  btrfs_do_readpage+0x330/0x740 [btrfs]
  extent_readahead+0x38d/0x6c0 [btrfs]
  read_pages+0x94/0x2c0
  page_cache_ra_unbounded+0x12d/0x190
  relocate_file_extent_cluster+0x7c1/0x9d0 [btrfs]
  relocate_block_group+0x2d3/0x560 [btrfs]
  btrfs_relocate_block_group+0x2c7/0x4b0 [btrfs]
  btrfs_relocate_chunk+0x4c/0x1a0 [btrfs]
  btrfs_balance+0x925/0x13c0 [btrfs]
  btrfs_ioctl+0x19f1/0x25d0 [btrfs]
  __x64_sys_ioctl+0x90/0xd0
  do_syscall_64+0x3f/0xf0
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

[CAUSE]
The dying line is at btrfs_repair_io_failure() call inside
btrfs_repair_eb_io_failure().

The function is still relying on the extent buffer using page sized
folios.
When the extent buffer is using larger folio, we go into the 2nd slot of
folios[], and triggered the NULL pointer dereference.

[FIX]
Migrate btrfs_repair_io_failure() to folio interfaces.

So that when we hit a larger folio, we just submit the whole folio in
one go.

This also affects data repair path through btrfs_end_repair_bio(),
thankfully data is still fully page based, we can just add an
ASSERT(), and use page_folio() to convert the page to folio.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
josefbacik pushed a commit to josefbacik/linux that referenced this pull request Jan 10, 2024
[BUG]
Test case btrfs/002 would fail if larger folios are enabled for
metadata:

 assertion failed: folio, in fs/btrfs/extent_io.c:4358
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/extent_io.c:4358!
 invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 1 PID: 30916 Comm: fsstress Tainted: G           OE      6.7.0-rc3-custom+ torvalds#128
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
 RIP: 0010:assert_eb_folio_uptodate+0x98/0xe0 [btrfs]
 Call Trace:
  <TASK>
  extent_buffer_test_bit+0x3c/0x70 [btrfs]
  free_space_test_bit+0xcd/0x140 [btrfs]
  modify_free_space_bitmap+0x27a/0x430 [btrfs]
  add_to_free_space_tree+0x8d/0x160 [btrfs]
  __btrfs_free_extent.isra.0+0xef1/0x13c0 [btrfs]
  __btrfs_run_delayed_refs+0x786/0x13c0 [btrfs]
  btrfs_run_delayed_refs+0x33/0x120 [btrfs]
  btrfs_commit_transaction+0xa2/0x1350 [btrfs]
  iterate_supers+0x77/0xe0
  ksys_sync+0x60/0xa0
  __do_sys_sync+0xa/0x20
  do_syscall_64+0x3f/0xf0
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
  </TASK>

[CAUSE]
The function extent_buffer_test_bit() is not folio compatible.

It still assumes the old fixed page size, when an extent buffer with
large folio passed in, only eb->folios[0] is populated.

Then if the target bit range falls in the 2nd page of the folio, then we
would check eb->folios[1], and trigger the ASSERT().

[FIX]
Just migrate eb_bitmap_offset() to folio interfaces, using the
folio_size() to replace PAGE_SIZE.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
josefbacik pushed a commit to josefbacik/linux that referenced this pull request Jan 10, 2024
[BUG]
Test case btrfs/124 failed if larger metadata folio is enabled, the
dying message looks like this:

 BTRFS error (device dm-2): bad tree block start, mirror 2 want 31686656 have 0
 BTRFS info (device dm-2): read error corrected: ino 0 off 31686656 (dev /dev/mapper/test-scratch2 sector 20928)
 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 CPU: 6 PID: 350881 Comm: btrfs Tainted: G           OE      6.7.0-rc3-custom+ torvalds#128
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
 RIP: 0010:btrfs_read_extent_buffer+0x106/0x180 [btrfs]
 PKRU: 55555554
 Call Trace:
  <TASK>
  read_tree_block+0x33/0xb0 [btrfs]
  read_block_for_search+0x23e/0x340 [btrfs]
  btrfs_search_slot+0x2f9/0xe60 [btrfs]
  btrfs_lookup_csum+0x75/0x160 [btrfs]
  btrfs_lookup_bio_sums+0x21a/0x560 [btrfs]
  btrfs_submit_chunk+0x152/0x680 [btrfs]
  btrfs_submit_bio+0x1c/0x50 [btrfs]
  submit_one_bio+0x40/0x80 [btrfs]
  submit_extent_page+0x158/0x390 [btrfs]
  btrfs_do_readpage+0x330/0x740 [btrfs]
  extent_readahead+0x38d/0x6c0 [btrfs]
  read_pages+0x94/0x2c0
  page_cache_ra_unbounded+0x12d/0x190
  relocate_file_extent_cluster+0x7c1/0x9d0 [btrfs]
  relocate_block_group+0x2d3/0x560 [btrfs]
  btrfs_relocate_block_group+0x2c7/0x4b0 [btrfs]
  btrfs_relocate_chunk+0x4c/0x1a0 [btrfs]
  btrfs_balance+0x925/0x13c0 [btrfs]
  btrfs_ioctl+0x19f1/0x25d0 [btrfs]
  __x64_sys_ioctl+0x90/0xd0
  do_syscall_64+0x3f/0xf0
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

[CAUSE]
The dying line is at btrfs_repair_io_failure() call inside
btrfs_repair_eb_io_failure().

The function is still relying on the extent buffer using page sized
folios.
When the extent buffer is using larger folio, we go into the 2nd slot of
folios[], and triggered the NULL pointer dereference.

[FIX]
Migrate btrfs_repair_io_failure() to folio interfaces.

So that when we hit a larger folio, we just submit the whole folio in
one go.

This also affects data repair path through btrfs_end_repair_bio(),
thankfully data is still fully page based, we can just add an
ASSERT(), and use page_folio() to convert the page to folio.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Gelbpunkt pushed a commit to sm8450-mainline/linux that referenced this pull request Jan 11, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 12, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
roxell pushed a commit to roxell/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jan 18, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Jan 28, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Feb 7, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
cthbleachbit pushed a commit to AOSC-Tracking/linux that referenced this pull request Feb 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
yetist pushed a commit to loongarchlinux/linux that referenced this pull request Feb 29, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
shikongzhineng pushed a commit to shikongzhineng/linux that referenced this pull request Mar 17, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
shipujin pushed a commit to shipujin/linux that referenced this pull request Jul 24, 2024
Like commit 1cf3bfc ("bpf: Support 64-bit pointers to kfuncs")
for s390x, add support for 64-bit pointers to kfuncs for LoongArch.
Since the infrastructure is already implemented in BPF core, the only
thing need to be done is to override bpf_jit_supports_far_kfunc_call().

Before this change, several test_verifier tests failed:

  # ./test_verifier | grep # | grep FAIL
  torvalds#119/p calls: invalid kfunc call: ptr_to_mem to struct with non-scalar FAIL
  torvalds#120/p calls: invalid kfunc call: ptr_to_mem to struct with nesting depth > 4 FAIL
  torvalds#121/p calls: invalid kfunc call: ptr_to_mem to struct with FAM FAIL
  torvalds#122/p calls: invalid kfunc call: reg->type != PTR_TO_CTX FAIL
  torvalds#123/p calls: invalid kfunc call: void * not allowed in func proto without mem size arg FAIL
  torvalds#124/p calls: trigger reg2btf_ids[reg->type] for reg->type > __BPF_REG_TYPE_MAX FAIL
  torvalds#125/p calls: invalid kfunc call: reg->off must be zero when passed to release kfunc FAIL
  torvalds#126/p calls: invalid kfunc call: don't match first member type when passed to release kfunc FAIL
  torvalds#127/p calls: invalid kfunc call: PTR_TO_BTF_ID with negative offset FAIL
  torvalds#128/p calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset FAIL
  torvalds#129/p calls: invalid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#130/p calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID FAIL
  torvalds#486/p map_kptr: ref: reference state created and released on xchg FAIL

This is because the kfuncs in the loaded module are far away from
__bpf_call_base:

  ffff800002009440 t bpf_kfunc_call_test_fail1    [bpf_testmod]
  9000000002e128d8 T __bpf_call_base

The offset relative to __bpf_call_base does NOT fit in s32, which breaks
the assumption in BPF core. Enable bpf_jit_supports_far_kfunc_call() lifts
this limit.

Note that to reproduce the above result, tools/testing/selftests/bpf/config
should be applied, and run the test with JIT enabled, unpriv BPF enabled.

With this change, the test_verifier tests now all passed:

  # ./test_verifier
  ...
  Summary: 777 PASSED, 0 SKIPPED, 0 FAILED

Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Aug 3, 2024
[in vfs.git#fixes]

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.

Reproducer follows:

#define __GNU_SOURCE
#include <linux/close_range.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <sched.h>
#include <stdio.h>
#include <stdbool.h>
#include <sys/mman.h>

void is_open(int fd)
{
	printf("#%d is %s\n", fd,
		fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}

int child(void *unused)
{
	while(1) {
	}
	return 0;
}

int main(void)
{
	char *stack;
	pid_t pid;

	stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
	if (pid == -1) {
		perror("clone");
		return -1;
	}
	for (int i = 2; i < 128; i++)
	    dup2(0, i);
	close_range(64, ~0U, CLOSE_RANGE_UNSHARE);

	is_open(64);
	printf("dup(0) => %d, expected 64\n", dup(0));

	kill(pid, 9);
	return 0;
}

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
roxell pushed a commit to roxell/linux that referenced this pull request Aug 5, 2024
copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The best way to fix that is in dup_fd() - there we both can easily check
whether we might need to fix anything up and see which word needs the
upper bits cleared.

Reproducer follows:

 #define __GNU_SOURCE
 #include <linux/close_range.h>
 #include <unistd.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <sched.h>
 #include <stdio.h>
 #include <stdbool.h>
 #include <sys/mman.h>

void is_open(int fd)
{
	printf("#%d is %s\n", fd,
		fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened");
}

int child(void *unused)
{
	while(1) {
	}
	return 0;
}

int main(void)
{
        char *stack;
	pid_t pid;

        stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED) {
		perror("mmap");
		return -1;
	}

	pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL);
	if (pid == -1) {
		perror("clone");
		return -1;
	}
	for (int i = 2; i < 128; i++)
		dup2(0, i);
	close_range(64, ~0U, CLOSE_RANGE_UNSHARE);

	is_open(64);
	printf("dup(0) => %d, expected 64\n", dup(0));

	kill(pid, 9);
	return 0;
}

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
roxell pushed a commit to roxell/linux that referenced this pull request Aug 7, 2024
copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Aug 27, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mj22226 pushed a commit to mj22226/linux that referenced this pull request Aug 27, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kaz205 pushed a commit to Kaz205/linux that referenced this pull request Aug 29, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
klarasm pushed a commit to klarasm/linux that referenced this pull request Aug 29, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
olafhering pushed a commit to olafhering/linux that referenced this pull request Aug 29, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
staging-kernelci-org pushed a commit to kernelci/linux that referenced this pull request Aug 30, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Sep 4, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Sep 9, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Sep 9, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1054009064 pushed a commit to 1054009064/linux that referenced this pull request Sep 9, 2024
commit 9a2fa14 upstream.

copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mr-Bossman pushed a commit to Mr-Bossman/linux that referenced this pull request Sep 16, 2024
copy_fd_bitmaps(new, old, count) is expected to copy the first
count/BITS_PER_LONG bits from old->full_fds_bits[] and fill
the rest with zeroes.  What it does is copying enough words
(BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest.
That works fine, *if* all bits past the cutoff point are
clear.  Otherwise we are risking garbage from the last word
we'd copied.

For most of the callers that is true - expand_fdtable() has
count equal to old->max_fds, so there's no open descriptors
past count, let alone fully occupied words in ->open_fds[],
which is what bits in ->full_fds_bits[] correspond to.

The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds),
which is the smallest multiple of BITS_PER_LONG that covers all
opened descriptors below max_fds.  In the common case (copying on
fork()) max_fds is ~0U, so all opened descriptors will be below
it and we are fine, by the same reasons why the call in expand_fdtable()
is safe.

Unfortunately, there is a case where max_fds is less than that
and where we might, indeed, end up with junk in ->full_fds_bits[] -
close_range(from, to, CLOSE_RANGE_UNSHARE) with
	* descriptor table being currently shared
	* 'to' being above the current capacity of descriptor table
	* 'from' being just under some chunk of opened descriptors.
In that case we end up with observably wrong behaviour - e.g. spawn
a child with CLONE_FILES, get all descriptors in range 0..127 open,
then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending
up with descriptor torvalds#128, despite torvalds#64 being observably not open.

The minimally invasive fix would be to deal with that in dup_fd().
If this proves to add measurable overhead, we can go that way, but
let's try to fix copy_fd_bitmaps() first.

* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size).
* make copy_fd_bitmaps() take the bitmap size in words, rather than
bits; it's 'count' argument is always a multiple of BITS_PER_LONG,
so we are not losing any information, and that way we can use the
same helper for all three bitmaps - compiler will see that count
is a multiple of BITS_PER_LONG for the large ones, so it'll generate
plain memcpy()+memset().

Reproducer added to tools/testing/selftests/core/close_range_test.c

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants