This repository has been archived by the owner on Nov 25, 2023. It is now read-only.
forked from rockchip-linux/kernel
-
Notifications
You must be signed in to change notification settings - Fork 60
When i plug in usb camera i get 2 devices #5
Comments
T-Firefly
pushed a commit
that referenced
this issue
Aug 26, 2019
rc_unregister_device() will first call ir_free_table(), and later device_del(); however, the latter causes a call to rc_dev_uevent(), which prints rc_map.name, which at this point has already bee freed. This fixes a use-after-free bug found with KASAN. As reported by Shuah: "I am seeing the following when I do rmmod on au0828 BUG: KASAN: use-after-free in string+0x170/0x1f0 at addr ffff8801bd513000 Read of size 1 by task rmmod/1831 CPU: 1 PID: 1831 Comm: rmmod Tainted: G W 4.9.0-rc5 #5 Hardware name: Hewlett-Packard HP ProBook 6475b/180F, BIOS 68TTU Ver. F.04 08/03/2012 ffff8801aea2f680 ffffffff81b37ad3 ffff8801fa403b80 ffff8801bd513000 ffff8801aea2f6a8 ffffffff8156c301 ffff8801aea2f738 ffff8801bd513000 ffff8801fa403b80 ffff8801aea2f728 ffffffff8156c59a ffff8801aea2f770 Call Trace: dump_stack+0x67/0x94 [<ffffffff8156c301>] kasan_object_err+0x21/0x70 [<ffffffff8156c59a>] kasan_report_error+0x1fa/0x4d0 [<ffffffffa116f05f>] ? au0828_exit+0x10/0x21 [au0828] [<ffffffff8156c8b3>] __asan_report_load1_noabort+0x43/0x50 [<ffffffff81b58b20>] ? string+0x170/0x1f0 [<ffffffff81b58b20>] string+0x170/0x1f0 [<ffffffff81b621c4>] vsnprintf+0x374/0x1c50 [<ffffffff81b61e50>] ? pointer+0xa80/0xa80 [<ffffffff8156b676>] ? save_stack+0x46/0xd0 [<ffffffff81566faa>] ? __kmalloc+0x14a/0x2a0 [<ffffffff81b3d70a>] ? kobject_get_path+0x9a/0x200 [<ffffffff81b408c2>] ? kobject_uevent_env+0x282/0xca0 [<ffffffff81b412eb>] ? kobject_uevent+0xb/0x10 [<ffffffff81f10104>] ? device_del+0x434/0x6d0 [<ffffffffa0fea717>] ? rc_unregister_device+0x177/0x240 [rc_core] [<ffffffffa116eeb0>] ? au0828_rc_unregister+0x60/0xb0 [au0828] The problem is fixed with this patch on Linux 4.9-rc4" Signed-off-by: Max Kellermann <max.kellermann@gmail.com> Tested-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> (cherry picked from commit c183d35) Signed-off-by: Ziyuan Xu <xzy.xu@rock-chips.com>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 28, 2019
stub_probe() calls put_busid_priv() in an error path when device isn't found in the busid_table. Fix it by making put_busid_priv() safe to be called with null struct bus_id_priv pointer. This problem happens when "usbip bind" is run without loading usbip_host driver and then running modprobe. The first failed bind attempt unbinds the device from the original driver and when usbip_host is modprobed, stub_probe() runs and doesn't find the device in its busid table and calls put_busid_priv(0 with null bus_id_priv pointer. usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip! [ 367.359679] ===================================== [ 367.359681] WARNING: bad unlock balance detected! [ 367.359683] 4.17.0-rc4+ FireflyTeam#5 Not tainted [ 367.359685] ------------------------------------- [ 367.359688] modprobe/2768 is trying to release lock ( [ 367.359689] ================================================================== [ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110 [ 367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768 [ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ FireflyTeam#5 Fixes: 2207655 ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
[ Upstream commit f80c5da ] This commit makes the kernel not send the next queued HCI command until a command complete arrives for the last HCI command sent to the controller. This change avoids a problem with some buggy controllers (seen on two SKUs of QCA9377) that send an extra command complete event for the previous command after the kernel had already sent a new HCI command to the controller. The problem was reproduced when starting an active scanning procedure, where an extra command complete event arrives for the LE_SET_RANDOM_ADDR command. When this happends the kernel ends up not processing the command complete for the following commmand, LE_SET_SCAN_PARAM, and ultimately behaving as if a passive scanning procedure was being performed, when in fact controller is performing an active scanning procedure. This makes it impossible to discover BLE devices as no device found events are sent to userspace. This problem is reproducible on 100% of the attempts on the affected controllers. The extra command complete event can be seen at timestamp 27.420131 on the btmon logs bellow. Bluetooth monitor ver 5.50 = Note: Linux version 5.0.0+ (x86_64) 0.352340 = Note: Bluetooth subsystem version 2.22 0.352343 = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0) [hci0] 0.352344 = Open Index: 80:C5:F2:8F:87:84 [hci0] 0.352345 = Index Info: 80:C5:F2:8F:87:84 (Qualcomm) [hci0] 0.352346 @ MGMT Open: bluetoothd (privileged) version 1.14 {0x0001} 0.352347 @ MGMT Open: btmon (privileged) version 1.14 {0x0002} 0.352366 @ MGMT Open: btmgmt (privileged) version 1.14 {0x0003} 27.302164 @ MGMT Command: Start Discovery (0x0023) plen 1 {0x0003} [hci0] 27.302310 Address type: 0x06 LE Public LE Random < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 FireflyTeam#1 [hci0] 27.302496 Address: 15:60:F2:91:B2:24 (Non-Resolvable) > HCI Event: Command Complete (0x0e) plen 4 FireflyTeam#2 [hci0] 27.419117 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 FireflyTeam#3 [hci0] 27.419244 Type: Active (0x01) Interval: 11.250 msec (0x0012) Window: 11.250 msec (0x0012) Own address type: Random (0x01) Filter policy: Accept all advertisement (0x00) > HCI Event: Command Complete (0x0e) plen 4 FireflyTeam#4 [hci0] 27.420131 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 FireflyTeam#5 [hci0] 27.420259 Scanning: Enabled (0x01) Filter duplicates: Enabled (0x01) > HCI Event: Command Complete (0x0e) plen 4 FireflyTeam#6 [hci0] 27.420969 LE Set Scan Parameters (0x08|0x000b) ncmd 1 Status: Success (0x00) > HCI Event: Command Complete (0x0e) plen 4 FireflyTeam#7 [hci0] 27.421983 LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) @ MGMT Event: Command Complete (0x0001) plen 4 {0x0003} [hci0] 27.422059 Start Discovery (0x0023) plen 1 Status: Success (0x00) Address type: 0x06 LE Public LE Random @ MGMT Event: Discovering (0x0013) plen 2 {0x0003} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0002} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0001} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
[ Upstream commit 41a91c6 ] dwc3_gadget_suspend() is called under dwc->lock spinlock. In such context calling synchronize_irq() is not allowed. Move the problematic call out of the protected block to fix the following kernel BUG during system suspend: BUG: sleeping function called from invalid context at kernel/irq/manage.c:112 in_atomic(): 1, irqs_disabled(): 128, pid: 1601, name: rtcwake 6 locks held by rtcwake/1601: #0: f70ac2a2 (sb_writers#7){.+.+}, at: vfs_write+0x130/0x16c FireflyTeam#1: b5fe1270 (&of->mutex){+.+.}, at: kernfs_fop_write+0xc0/0x1e4 FireflyTeam#2: 7e597705 (kn->count#60){.+.+}, at: kernfs_fop_write+0xc8/0x1e4 FireflyTeam#3: 8b3527d0 (system_transition_mutex){+.+.}, at: pm_suspend+0xc4/0xc04 FireflyTeam#4: fc7f1c42 (&dev->mutex){....}, at: __device_suspend+0xd8/0x74c FireflyTeam#5: 4b36507e (&(&dwc->lock)->rlock){....}, at: dwc3_gadget_suspend+0x24/0x3c irq event stamp: 11252 hardirqs last enabled at (11251): [<c09c54a4>] _raw_spin_unlock_irqrestore+0x6c/0x74 hardirqs last disabled at (11252): [<c09c4d44>] _raw_spin_lock_irqsave+0x1c/0x5c softirqs last enabled at (9744): [<c0102564>] __do_softirq+0x3a4/0x66c softirqs last disabled at (9737): [<c0128528>] irq_exit+0x140/0x168 Preemption disabled at: [<00000000>] (null) CPU: 7 PID: 1601 Comm: rtcwake Not tainted 5.0.0-rc3-next-20190122-00039-ga3f4ee4f8a52 #5252 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [<c01110f0>] (unwind_backtrace) from [<c010d120>] (show_stack+0x10/0x14) [<c010d120>] (show_stack) from [<c09a4d04>] (dump_stack+0x90/0xc8) [<c09a4d04>] (dump_stack) from [<c014c700>] (___might_sleep+0x22c/0x2c8) [<c014c700>] (___might_sleep) from [<c0189d68>] (synchronize_irq+0x28/0x84) [<c0189d68>] (synchronize_irq) from [<c05cbbf8>] (dwc3_gadget_suspend+0x34/0x3c) [<c05cbbf8>] (dwc3_gadget_suspend) from [<c05bd020>] (dwc3_suspend_common+0x154/0x410) [<c05bd020>] (dwc3_suspend_common) from [<c05bd34c>] (dwc3_suspend+0x14/0x2c) [<c05bd34c>] (dwc3_suspend) from [<c051c730>] (platform_pm_suspend+0x2c/0x54) [<c051c730>] (platform_pm_suspend) from [<c05285d4>] (dpm_run_callback+0xa4/0x3dc) [<c05285d4>] (dpm_run_callback) from [<c0528a40>] (__device_suspend+0x134/0x74c) [<c0528a40>] (__device_suspend) from [<c052c508>] (dpm_suspend+0x174/0x588) [<c052c508>] (dpm_suspend) from [<c0182134>] (suspend_devices_and_enter+0xc0/0xe74) [<c0182134>] (suspend_devices_and_enter) from [<c0183658>] (pm_suspend+0x770/0xc04) [<c0183658>] (pm_suspend) from [<c0180ddc>] (state_store+0x6c/0xcc) [<c0180ddc>] (state_store) from [<c09a9a70>] (kobj_attr_store+0x14/0x20) [<c09a9a70>] (kobj_attr_store) from [<c02d6800>] (sysfs_kf_write+0x4c/0x50) [<c02d6800>] (sysfs_kf_write) from [<c02d594c>] (kernfs_fop_write+0xfc/0x1e4) [<c02d594c>] (kernfs_fop_write) from [<c02593d8>] (__vfs_write+0x2c/0x160) [<c02593d8>] (__vfs_write) from [<c0259694>] (vfs_write+0xa4/0x16c) [<c0259694>] (vfs_write) from [<c0259870>] (ksys_write+0x40/0x8c) [<c0259870>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28) Exception stack(0xed55ffa8 to 0xed55fff0) ... Fixes: 01c1088 ("usb: dwc3: gadget: synchronize_irq dwc irq in suspend") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
[ Upstream commit ff612ba ] We've been seeing the following sporadically throughout our fleet panic: kernel BUG at fs/btrfs/relocation.c:4584! netversion: 5.0-0 Backtrace: #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8 FireflyTeam#1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c FireflyTeam#2 [ffffc90003adb988] crash_kexec at ffffffff811048ad FireflyTeam#3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a FireflyTeam#4 [ffffc90003adb9c0] do_trap at ffffffff81019114 FireflyTeam#5 [ffffc90003adba00] do_error_trap at ffffffff810195d0 FireflyTeam#6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b [exception RIP: btrfs_reloc_cow_block+692] RIP: ffffffff8143b614 RSP: ffffc90003adbb68 RFLAGS: 00010246 RAX: fffffffffffffff7 RBX: ffff8806b9c32000 RCX: ffff8806aad00690 RDX: ffff880850b295e0 RSI: ffff8806b9c32000 RDI: ffff88084f205bd0 RBP: ffff880849415000 R8: ffffc90003adbbe0 R9: ffff88085ac90000 R10: ffff8805f7369140 R11: 0000000000000000 R12: ffff880850b295e0 R13: ffff88084f205bd0 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 FireflyTeam#7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd FireflyTeam#8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3 FireflyTeam#9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c The way relocation moves data extents is by creating a reloc inode and preallocating extents in this inode and then copying the data into these preallocated extents. Once we've done this for all of our extents, we'll write out these dirty pages, which marks the extent written, and goes into btrfs_reloc_cow_block(). From here we get our current reloc_control, which _should_ match the reloc_control for the current block group we're relocating. However if we get an ENOSPC in this path at some point we'll bail out, never initiating writeback on this inode. Not a huge deal, unless we happen to be doing relocation on a different block group, and this block group is now rc->stage == UPDATE_DATA_PTRS. This trips the BUG_ON() in btrfs_reloc_cow_block(), because we expect to be done modifying the data inode. We are in fact done modifying the metadata for the data inode we're currently using, but not the one from the failed block group, and thus we BUG_ON(). (This happens when writeback finishes for extents from the previous group, when we are at btrfs_finish_ordered_io() which updates the data reloc tree (inode item, drops/adds extent items, etc).) Fix this by writing out the reloc data inode always, and then breaking out of the loop after that point to keep from tripping this BUG_ON() later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [ add note from Filipe ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
[ Upstream commit 35399f8 ] In configfs_register_group(), if create_default_group() failed, we forget to unlink the group. It will left a invalid item in the parent list, which may trigger the use-after-free issue seen below: BUG: KASAN: use-after-free in __list_add_valid+0xd4/0xe0 lib/list_debug.c:26 Read of size 8 at addr ffff8881ef61ae20 by task syz-executor.0/5996 CPU: 1 PID: 5996 Comm: syz-executor.0 Tainted: G C 5.0.0+ FireflyTeam#5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xa9/0x10e lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 __list_add_valid+0xd4/0xe0 lib/list_debug.c:26 __list_add include/linux/list.h:60 [inline] list_add_tail include/linux/list.h:93 [inline] link_obj+0xb0/0x190 fs/configfs/dir.c:759 link_group+0x1c/0x130 fs/configfs/dir.c:784 configfs_register_group+0x56/0x1e0 fs/configfs/dir.c:1751 configfs_register_default_group+0x72/0xc0 fs/configfs/dir.c:1834 ? 0xffffffffc1be0000 iio_sw_trigger_init+0x23/0x1000 [industrialio_sw_trigger] do_one_initcall+0xbc/0x47d init/main.c:887 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f494ecbcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003 RBP: 00007f494ecbcc70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f494ecbd6bc R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 Allocated by task 5987: set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:497 kmalloc include/linux/slab.h:545 [inline] kzalloc include/linux/slab.h:740 [inline] configfs_register_default_group+0x4c/0xc0 fs/configfs/dir.c:1829 0xffffffffc1bd0023 do_one_initcall+0xbc/0x47d init/main.c:887 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5987: set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:459 slab_free_hook mm/slub.c:1429 [inline] slab_free_freelist_hook mm/slub.c:1456 [inline] slab_free mm/slub.c:3003 [inline] kfree+0xe1/0x270 mm/slub.c:3955 configfs_register_default_group+0x9a/0xc0 fs/configfs/dir.c:1836 0xffffffffc1bd0023 do_one_initcall+0xbc/0x47d init/main.c:887 do_init_module+0x1b5/0x547 kernel/module.c:3456 load_module+0x6405/0x8c10 kernel/module.c:3804 __do_sys_finit_module+0x162/0x190 kernel/module.c:3898 do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881ef61ae00 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 32 bytes inside of 192-byte region [ffff8881ef61ae00, ffff8881ef61aec0) The buggy address belongs to the page: page:ffffea0007bd8680 count:1 mapcount:0 mapping:ffff8881f6c03000 index:0xffff8881ef61a700 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 ffffea0007ca4740 0000000500000005 ffff8881f6c03000 raw: ffff8881ef61a700 000000008010000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881ef61ad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881ef61ad80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc >ffff8881ef61ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881ef61ae80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881ef61af00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 5cf6a51 ("configfs: allow dynamic group creation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
commit 3f93a4f upstream. It is possible for an irq triggered by channel0 to be received later after clks are disabled once firmware loaded during sdma probe. If that happens then clearing them by writing to SDMA_H_INTR won't work and the kernel will hang processing infinite interrupts. Actually, don't need interrupt triggered on channel0 since it's pollling SDMA_H_STATSTOP to know channel0 done rather than interrupt in current code, just clear BD_INTR to disable channel0 interrupt to avoid the above case. This issue was brought by commit 1d069bf ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") which didn't take care the above case. Fixes: 1d069bf ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") Cc: stable@vger.kernel.org FireflyTeam#5.0+ Signed-off-by: Robin Gong <yibin.gong@nxp.com> Reported-by: Sven Van Asbroeck <thesven73@gmail.com> Tested-by: Sven Van Asbroeck <thesven73@gmail.com> Reviewed-by: Michael Olbrich <m.olbrich@pengutronix.de> Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
commit c952b35 upstream. bpf/btf write_* functions need ff->ph->env. With this missing, pipe-mode (perf record -o -) would crash like: Program terminated with signal SIGSEGV, Segmentation fault. This patch assign proper ph value to ff. Committer testing: (gdb) run record -o - Starting program: /root/bin/perf record -o - PERFILE2 <SNIP start of perf.data headers> Thread 1 "perf" received signal SIGSEGV, Segmentation fault. __do_write_buf (size=4, buf=0x160, ff=0x7fffffff8f80) at util/header.c:126 126 memcpy(ff->buf + ff->offset, buf, size); (gdb) bt #0 __do_write_buf (size=4, buf=0x160, ff=0x7fffffff8f80) at util/header.c:126 FireflyTeam#1 do_write (ff=ff@entry=0x7fffffff8f80, buf=buf@entry=0x160, size=4) at util/header.c:137 FireflyTeam#2 0x00000000004eddba in write_bpf_prog_info (ff=0x7fffffff8f80, evlist=<optimized out>) at util/header.c:912 FireflyTeam#3 0x00000000004f69d7 in perf_event__synthesize_features (tool=tool@entry=0x97cc00 <record>, session=session@entry=0x7fffe9c6d010, evlist=0x7fffe9cae010, process=process@entry=0x4435d0 <process_synthesized_event>) at util/header.c:3695 FireflyTeam#4 0x0000000000443c79 in record__synthesize (tail=tail@entry=false, rec=0x97cc00 <record>) at builtin-record.c:1214 FireflyTeam#5 0x0000000000444ec9 in __cmd_record (rec=0x97cc00 <record>, argv=<optimized out>, argc=0) at builtin-record.c:1435 FireflyTeam#6 cmd_record (argc=0, argv=<optimized out>) at builtin-record.c:2450 FireflyTeam#7 0x00000000004ae3e9 in run_builtin (p=p@entry=0x98e058 <commands+216>, argc=argc@entry=3, argv=0x7fffffffd670) at perf.c:304 FireflyTeam#8 0x000000000042eded in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:356 FireflyTeam#9 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:400 FireflyTeam#10 main (argc=3, argv=<optimized out>) at perf.c:522 (gdb) After the patch the SEGSEGV is gone. Reported-by: David Carrillo Cisneros <davidca@fb.com> Signed-off-by: Song Liu <songliubraving@fb.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: kernel-team@fb.com Cc: stable@vger.kernel.org # v5.1+ Fixes: 606f972 ("perf bpf: Save bpf_prog_info information as headers to perf.data") Link: http://lkml.kernel.org/r/20190620010453.4118689-1-songliubraving@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
…qno errors commit b76becd upstream. With a recent change to our send path for FSF commands we introduced a possible use-after-free of request-objects, that might further lead to zfcp crafting bad requests, which the FCP channel correctly complains about with an error (FSF_PROT_SEQ_NUMB_ERROR). This error is then handled by an adapter-wide recovery. The following sequence illustrates the possible use-after-free: Send Path: int zfcp_fsf_open_port(struct zfcp_erp_action *erp_action) { struct zfcp_fsf_req *req; ... spin_lock_irq(&qdio->req_q_lock); // ^^^^^^^^^^^^^^^^ // protects QDIO queue during sending ... req = zfcp_fsf_req_create(qdio, FSF_QTCB_OPEN_PORT_WITH_DID, SBAL_SFLAGS0_TYPE_READ, qdio->adapter->pool.erp_req); // ^^^^^^^^^^^^^^^^^^^ // allocation of the request-object ... retval = zfcp_fsf_req_send(req); ... spin_unlock_irq(&qdio->req_q_lock); return retval; } static int zfcp_fsf_req_send(struct zfcp_fsf_req *req) { struct zfcp_adapter *adapter = req->adapter; struct zfcp_qdio *qdio = adapter->qdio; ... zfcp_reqlist_add(adapter->req_list, req); // ^^^^^^^^^^^^^^^^ // add request to our driver-internal hash-table for tracking // (protected by separate lock req_list->lock) ... if (zfcp_qdio_send(qdio, &req->qdio_req)) { // ^^^^^^^^^^^^^^ // hand-off the request to FCP channel; // the request can complete at any point now ... } /* Don't increase for unsolicited status */ if (!zfcp_fsf_req_is_status_read_buffer(req)) // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ // possible use-after-free adapter->fsf_req_seq_no++; // ^^^^^^^^^^^^^^^^ // because of the use-after-free we might // miss this accounting, and as follow-up // this results in the FCP channel error // FSF_PROT_SEQ_NUMB_ERROR adapter->req_no++; return 0; } static inline bool zfcp_fsf_req_is_status_read_buffer(struct zfcp_fsf_req *req) { return req->qtcb == NULL; // ^^^^^^^^^ // possible use-after-free } Response Path: void zfcp_fsf_reqid_check(struct zfcp_qdio *qdio, int sbal_idx) { ... struct zfcp_fsf_req *fsf_req; ... for (idx = 0; idx < QDIO_MAX_ELEMENTS_PER_BUFFER; idx++) { ... fsf_req = zfcp_reqlist_find_rm(adapter->req_list, req_id); // ^^^^^^^^^^^^^^^^^^^^ // remove request from our driver-internal // hash-table (lock req_list->lock) ... zfcp_fsf_req_complete(fsf_req); } } static void zfcp_fsf_req_complete(struct zfcp_fsf_req *req) { ... if (likely(req->status & ZFCP_STATUS_FSFREQ_CLEANUP)) zfcp_fsf_req_free(req); // ^^^^^^^^^^^^^^^^^ // free memory for request-object else complete(&req->completion); // ^^^^^^^^ // completion notification for code-paths that wait // synchronous for the completion of the request; in // those the memory is freed separately } The result of the use-after-free only affects the send path, and can not lead to any data corruption. In case we miss the sequence-number accounting, because the memory was already re-purposed, the next FSF command will fail with said FCP channel error, and we will recover the whole adapter. This causes no additional errors, but it slows down traffic. There is a slight chance of the same thing happen again recursively after the adapter recovery, but so far this has not been seen. This was seen under z/VM, where the send path might run on a virtual CPU that gets scheduled away by z/VM, while the return path might still run, and so create the necessary timing. Running with KASAN can also slow down the kernel sufficiently to run into this user-after-free, and then see the report by KASAN. To fix this, simply pull the test for the sequence-number accounting in front of the hand-off to the FCP channel (this information doesn't change during hand-off), but leave the sequence-number accounting itself where it is. To make future regressions of the same kind less likely, add comments to all closely related code-paths. Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Fixes: f9eca02 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header") Cc: <stable@vger.kernel.org> FireflyTeam#5.0+ Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Sep 29, 2019
commit 7545b6c upstream. Clear the CRYPTO_TFM_REQ_MAY_SLEEP flag when the chacha20poly1305 operation is being continued from an async completion callback, since sleeping may not be allowed in that context. This is basically the same bug that was recently fixed in the xts and lrw templates. But, it's always been broken in chacha20poly1305 too. This was found using syzkaller in combination with the updated crypto self-tests which actually test the MAY_SLEEP flag now. Reproducer: python -c 'import socket; socket.socket(socket.AF_ALG, 5, 0).bind( ("aead", "rfc7539(cryptd(chacha20-generic),poly1305-generic)"))' Kernel output: BUG: sleeping function called from invalid context at include/crypto/algapi.h:426 in_atomic(): 1, irqs_disabled(): 0, pid: 1001, name: kworker/2:2 [...] CPU: 2 PID: 1001 Comm: kworker/2:2 Not tainted 5.2.0-rc2 FireflyTeam#5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 Workqueue: crypto cryptd_queue_worker Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x4d/0x6a lib/dump_stack.c:113 ___might_sleep kernel/sched/core.c:6138 [inline] ___might_sleep.cold.19+0x8e/0x9f kernel/sched/core.c:6095 crypto_yield include/crypto/algapi.h:426 [inline] crypto_hash_walk_done+0xd6/0x100 crypto/ahash.c:113 shash_ahash_update+0x41/0x60 crypto/shash.c:251 shash_async_update+0xd/0x10 crypto/shash.c:260 crypto_ahash_update include/crypto/hash.h:539 [inline] poly_setkey+0xf6/0x130 crypto/chacha20poly1305.c:337 poly_init+0x51/0x60 crypto/chacha20poly1305.c:364 async_done_continue crypto/chacha20poly1305.c:78 [inline] poly_genkey_done+0x15/0x30 crypto/chacha20poly1305.c:369 cryptd_skcipher_complete+0x29/0x70 crypto/cryptd.c:279 cryptd_skcipher_decrypt+0xcd/0x110 crypto/cryptd.c:339 cryptd_queue_worker+0x70/0xa0 crypto/cryptd.c:184 process_one_work+0x1ed/0x420 kernel/workqueue.c:2269 worker_thread+0x3e/0x3a0 kernel/workqueue.c:2415 kthread+0x11f/0x140 kernel/kthread.c:255 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Fixes: 71ebc4d ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539") Cc: <stable@vger.kernel.org> # v4.2+ Cc: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 4, 2019
The balloon.page field is used for two different purposes if batching is on or off. If batching is on, the field point to the page which is used to communicate with with the hypervisor. If it is off, balloon.page points to the page that is about to be (un)locked. Unfortunately, this dual-purpose of the field introduced a bug: when the balloon is popped (e.g., when the machine is reset or the balloon driver is explicitly removed), the balloon driver frees, unconditionally, the page that is held in balloon.page. As a result, if batching is disabled, this leads to double freeing the last page that is sent to the hypervisor. The following error occurs during rmmod when kernel checkers are on, and the balloon is not empty: [ 42.307653] ------------[ cut here ]------------ [ 42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable] [ 42.307720] invalid opcode: 0000 [FireflyTeam#1] SMP DEBUG_PAGEALLOC [ 42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi [ 42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ FireflyTeam#5 [ 42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016 [ 42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000 [ 42.313290] RIP: 0010:__free_pages+0x38/0x40 [ 42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246 [ 42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006 [ 42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0 [ 42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000 [ 42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4 [ 42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200 [ 42.314550] FS: 00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000 [ 42.314599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0 [ 42.314864] Call Trace: [ 42.315774] vmballoon_pop+0x102/0x130 [vmw_balloon] [ 42.315816] vmballoon_exit+0x42/0xd64 [vmw_balloon] [ 42.315853] SyS_delete_module+0x1e2/0x250 [ 42.315891] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 42.315924] RIP: 0033:0x7f3af5b0e8e7 [ 42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7 [ 42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248 [ 42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999 [ 42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130 [ 42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0 [ 42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48 [ 42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98 [ 42.325735] ---[ end trace 872e008e33f81508 ]--- To solve the bug, we eliminate the dual purpose of balloon.page. Fixes: f220a80 ("VMware balloon: add batching to the vmw_balloon.") Cc: stable@vger.kernel.org Reported-by: Oleksandr Natalenko <onatalen@redhat.com> Signed-off-by: Gil Kupfer <gilkup@gmail.com> Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 5, 2019
Yonghong Song says: ==================== Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks for this approaches. First, bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, this command will return bpf related information to user space. Right now it only supports tracepoint/kprobe/uprobe perf event fd's. For such a fd, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Patch FireflyTeam#1 adds function perf_get_event() in kernel/events/core.c. Patch FireflyTeam#2 implements the bpf subcommand BPF_TASK_FD_QUERY. Patch FireflyTeam#3 syncs tools bpf.h header and also add bpf_task_fd_query() in the libbpf library for samples/selftests/bpftool to use. Patch FireflyTeam#4 adds ksym_get_addr() utility function. Patch FireflyTeam#5 add a test in samples/bpf for querying k[ret]probes and u[ret]probes. Patch FireflyTeam#6 add a test in tools/testing/selftests/bpf for querying raw_tracepoint and tracepoint. Patch FireflyTeam#7 add a new subcommand "perf" to bpftool. Changelogs: v4 -> v5: . return strlen(buf) instead of strlen(buf) + 1 in the attr.buf_len. As long as user provides non-empty buffer, it will be filed with empty string, truncated string, or full string based on the buffer size and the length of to-be-copied string. v3 -> v4: . made attr buf_len input/output. The length of actual buffter is written to buf_len so user space knows what is actually needed. If user provides a buffer with length >= 1 but less than required, do partial copy and return -ENOSPC. . code simplification with put_user. . changed query result attach_info to fd_type. . add tests at selftests/bpf to test zero len, null buf and insufficient buf. v2 -> v3: . made perf_get_event() return perf_event pointer const. this was to ensure that event fields are not meddled. . detect whether newly BPF_TASK_FD_QUERY is supported or not in "bpftool perf" and warn users if it is not. v1 -> v2: . changed bpf subcommand name from BPF_PERF_EVENT_QUERY to BPF_TASK_FD_QUERY. . fixed various "bpftool perf" issues and added documentation and auto-completion. ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 5, 2019
Petr Machata says: ==================== Test mirror-to-gretap with bridge in UL This patchset adds more tests to the mirror-to-gretap suite where bridge is present in the underlay. Specifically it adds tests for bridge VLAN handling, FDB, and bridge port STP status. In patches FireflyTeam#1-FireflyTeam#3, the codebase is refactored to support the new tests. In patch FireflyTeam#4, an STP test is added to the mirroring library, that will later be called from bridge tests. In patches FireflyTeam#5-FireflyTeam#8, the test for mirror-to-gretap with an 802.1q bridge in underlay is adapted and more tests are added. In patch FireflyTeam#9, an STP test is added to the test suite for mirror-to-gretap with an 802.1d bridge in underlay. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 5, 2019
Add null check for nat_hook in nf_nat_decode_session() [ 195.648098] UBSAN: Undefined behaviour in ./include/linux/netfilter.h:348:14 [ 195.651366] BUG: KASAN: null-ptr-deref in __xfrm_policy_check+0x208/0x1d70 [ 195.653888] member access within null pointer of type 'struct nf_nat_hook' [ 195.653896] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.17.0-rc6+ FireflyTeam#5 [ 195.656320] Read of size 8 at addr 0000000000000008 by task ping/2469 [ 195.658715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 195.658721] Call Trace: [ 195.661087] [ 195.669341] <IRQ> [ 195.670574] dump_stack+0xc6/0x150 [ 195.672156] ? dump_stack_print_info.cold.0+0x1b/0x1b [ 195.674121] ? ubsan_prologue+0x31/0x92 [ 195.676546] ubsan_epilogue+0x9/0x49 [ 195.678159] handle_null_ptr_deref+0x11a/0x130 [ 195.679800] ? sprint_OID+0x1a0/0x1a0 [ 195.681322] __ubsan_handle_type_mismatch_v1+0xd5/0x11d [ 195.683146] ? ubsan_prologue+0x92/0x92 [ 195.684642] __xfrm_policy_check+0x18ef/0x1d70 [ 195.686294] ? rt_cache_valid+0x118/0x180 [ 195.687804] ? __xfrm_route_forward+0x410/0x410 [ 195.689463] ? fib_multipath_hash+0x700/0x700 [ 195.691109] ? kvm_sched_clock_read+0x23/0x40 [ 195.692805] ? pvclock_clocksource_read+0xf6/0x280 [ 195.694409] ? graph_lock+0xa0/0xa0 [ 195.695824] ? pvclock_clocksource_read+0xf6/0x280 [ 195.697508] ? pvclock_read_flags+0x80/0x80 [ 195.698981] ? kvm_sched_clock_read+0x23/0x40 [ 195.700347] ? sched_clock+0x5/0x10 [ 195.701525] ? sched_clock_cpu+0x18/0x1a0 [ 195.702846] tcp_v4_rcv+0x1d32/0x1de0 [ 195.704115] ? lock_repin_lock+0x70/0x270 [ 195.707072] ? pvclock_read_flags+0x80/0x80 [ 195.709302] ? tcp_v4_early_demux+0x4b0/0x4b0 [ 195.711833] ? lock_acquire+0x195/0x380 [ 195.714222] ? ip_local_deliver_finish+0xfc/0x770 [ 195.716967] ? raw_rcv+0x2b0/0x2b0 [ 195.718856] ? lock_release+0xa00/0xa00 [ 195.720938] ip_local_deliver_finish+0x1b9/0x770 [...] Fixes: 2c205dd ("netfilter: add struct nf_nat_hook and use it") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 5, 2019
Shakeel reported a crash in mem_cgroup_protected(), which can be triggered by memcg reclaim if the legacy cgroup v1 use_hierarchy=0 mode is used: BUG: unable to handle kernel NULL pointer dereference at 0000000000000120 PGD 8000001ff55da067 P4D 8000001ff55da067 PUD 1fdc7df067 PMD 0 Oops: 0000 [FireflyTeam#4] SMP PTI CPU: 0 PID: 15581 Comm: bash Tainted: G D 4.17.0-smp-clean FireflyTeam#5 Hardware name: ... RIP: 0010:mem_cgroup_protected+0x54/0x130 Code: 4c 8b 8e 00 01 00 00 4c 8b 86 08 01 00 00 48 8d 8a 08 ff ff ff 48 85 d2 ba 00 00 00 00 48 0f 44 ca 48 39 c8 0f 84 cf 00 00 00 <48> 8b 81 20 01 00 00 4d 89 ca 4c 39 c8 4c 0f 46 d0 4d 85 d2 74 05 RSP: 0000:ffffabe64dfafa58 EFLAGS: 00010286 RAX: ffff9fb6ff03d000 RBX: ffff9fb6f5b1b000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9fb6f5b1b000 RDI: ffff9fb6f5b1b000 RBP: ffffabe64dfafb08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000c800 R12: ffffabe64dfafb88 R13: ffff9fb6f5b1b000 R14: ffffabe64dfafb88 R15: ffff9fb77fffe000 FS: 00007fed1f8ac700(0000) GS:ffff9fb6ff400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000120 CR3: 0000001fdcf86003 CR4: 00000000001606f0 Call Trace: ? shrink_node+0x194/0x510 do_try_to_free_pages+0xfd/0x390 try_to_free_mem_cgroup_pages+0x123/0x210 try_charge+0x19e/0x700 mem_cgroup_try_charge+0x10b/0x1a0 wp_page_copy+0x134/0x5b0 do_wp_page+0x90/0x460 __handle_mm_fault+0x8e3/0xf30 handle_mm_fault+0xfe/0x220 __do_page_fault+0x262/0x500 do_page_fault+0x28/0xd0 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x485b72 The problem happens because parent_mem_cgroup() returns a NULL pointer, which is dereferenced later without a check. As cgroup v1 has no memory guarantee support, let's make mem_cgroup_protected() immediately return MEMCG_PROT_NONE, if the given cgroup has no parent (non-hierarchical mode is used). Link: http://lkml.kernel.org/r/20180611175418.7007-2-guro@fb.com Fixes: bf8d5d5 ("memcg: introduce memory.min") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Shakeel Butt <shakeelb@google.com> Tested-by: Shakeel Butt <shakeelb@google.com> Tested-by: John Stultz <john.stultz@linaro.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 5, 2019
Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 FireflyTeam#1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 FireflyTeam#2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c FireflyTeam#3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a FireflyTeam#4 [ffff891ee00039e0] no_context at ffffffffbd074643 FireflyTeam#5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e FireflyTeam#6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 FireflyTeam#7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a FireflyTeam#8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 FireflyTeam#9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 FireflyTeam#10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] FireflyTeam#11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] FireflyTeam#12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] FireflyTeam#13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] FireflyTeam#14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 FireflyTeam#15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 FireflyTeam#16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 rockchip-linux#17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 rockchip-linux#18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 rockchip-linux#19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- rockchip-linux#20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame rockchip-linux#21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd rockchip-linux#22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 rockchip-linux#23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 rockchip-linux#24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 rockchip-linux#25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 rockchip-linux#26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa rockchip-linux#27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca rockchip-linux#28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 rockchip-linux#29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb rockchip-linux#30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 6, 2019
struct xfrm_userpolicy_type has two holes, so we should not use C99 style initializer. KMSAN report: BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:140 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571 CPU: 1 PID: 4520 Comm: syz-executor841 Not tainted 4.17.0+ FireflyTeam#5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117 kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1211 kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1253 copyout lib/iov_iter.c:140 [inline] _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571 copy_to_iter include/linux/uio.h:106 [inline] skb_copy_datagram_iter+0x422/0xfa0 net/core/datagram.c:431 skb_copy_datagram_msg include/linux/skbuff.h:3268 [inline] netlink_recvmsg+0x6f1/0x1900 net/netlink/af_netlink.c:1959 sock_recvmsg_nosec net/socket.c:802 [inline] sock_recvmsg+0x1d6/0x230 net/socket.c:809 ___sys_recvmsg+0x3fe/0x810 net/socket.c:2279 __sys_recvmmsg+0x58e/0xe30 net/socket.c:2391 do_sys_recvmmsg+0x2a6/0x3e0 net/socket.c:2472 __do_sys_recvmmsg net/socket.c:2485 [inline] __se_sys_recvmmsg net/socket.c:2481 [inline] __x64_sys_recvmmsg+0x15d/0x1c0 net/socket.c:2481 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x446ce9 RSP: 002b:00007fc307918db8 EFLAGS: 00000293 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00000000006dbc24 RCX: 0000000000446ce9 RDX: 000000000000000a RSI: 0000000020005040 RDI: 0000000000000003 RBP: 00000000006dbc20 R08: 0000000020004e40 R09: 0000000000000000 R10: 0000000040000000 R11: 0000000000000293 R12: 0000000000000000 R13: 00007ffc8d2df32f R14: 00007fc3079199c0 R15: 0000000000000001 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527 __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:413 __nla_put lib/nlattr.c:569 [inline] nla_put+0x276/0x340 lib/nlattr.c:627 copy_to_user_policy_type net/xfrm/xfrm_user.c:1678 [inline] dump_one_policy+0xbe1/0x1090 net/xfrm/xfrm_user.c:1708 xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013 xfrm_dump_policy+0x1c0/0x2a0 net/xfrm/xfrm_user.c:1749 netlink_dump+0x9b5/0x1550 net/netlink/af_netlink.c:2226 __netlink_dump_start+0x1131/0x1270 net/netlink/af_netlink.c:2323 netlink_dump_start include/linux/netlink.h:214 [inline] xfrm_user_rcv_msg+0x8a3/0x9b0 net/xfrm/xfrm_user.c:2577 netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448 xfrm_netlink_rcv+0xb2/0xf0 net/xfrm/xfrm_user.c:2598 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Local variable description: ----upt.i@dump_one_policy Variable was created at: dump_one_policy+0x78/0x1090 net/xfrm/xfrm_user.c:1689 xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013 Byte 130 of 137 is uninitialized Memory access starts at ffff88019550407f Fixes: c0144be ("[XFRM] netlink: Use nla_put()/NLA_PUT() variantes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 6, 2019
Kernel panic when with high memory pressure, calltrace looks like, PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java" #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb FireflyTeam#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942 FireflyTeam#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30 FireflyTeam#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8 FireflyTeam#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46 FireflyTeam#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc FireflyTeam#6 [ffff881ec7ed7838] __node_set at ffffffff81680300 FireflyTeam#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f FireflyTeam#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5 FireflyTeam#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8 [exception RIP: _raw_spin_lock_irqsave+47] RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8 RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008 RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098 R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 It happens in the pagefault and results in double pagefault during compacting pages when memory allocation fails. Analysed the vmcore, the page leads to second pagefault is corrupted with _mapcount=-256, but private=0. It's caused by the race between migration and ballooning, and lock missing in virtballoon_migratepage() of virtio_balloon driver. This patch fix the bug. Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages") Cc: stable@vger.kernel.org Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Huang Chong <huang.chong@zte.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 11, 2019
There's an issue with using threads::last_match in multithread mode which is enabled during the perf top synthesize. It might crash with following assertion: perf: ...include/linux/refcount.h:109: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed. The gdb backtrace looks like this: 0x00007ffff50839fb in raise () from /lib64/libc.so.6 (gdb) #0 0x00007ffff50839fb in raise () from /lib64/libc.so.6 FireflyTeam#1 0x00007ffff5085800 in abort () from /lib64/libc.so.6 FireflyTeam#2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6 FireflyTeam#3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6 FireflyTeam#4 0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70) at ...include/linux/refcount.h:109 FireflyTeam#5 0x0000000000536771 in thread__get (thread=0x7fffe8009a40) at util/thread.c:115 FireflyTeam#6 0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38, threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432 FireflyTeam#7 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38, pid=2, tid=2) at util/machine.c:489 FireflyTeam#8 0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38, pid=2, tid=2) at util/machine.c:499 FireflyTeam#9 0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38, ... The failing assertion is this one: REFCOUNT_WARN(!refcount_inc_not_zero(r), ... the problem is that we don't serialize access to threads::last_match. We serialize the access to the threads tree, but we don't care how's threads::last_match being accessed. Both locked/unlocked paths use that data and can set it. In multithreaded mode we can end up with invalid object in thread__get call, like in following paths race: thread 1 ... machine__findnew_thread down_write(&threads->lock); __machine__findnew_thread ____machine__findnew_thread th = threads->last_match; if (th->tid == tid) { thread__get thread 2 ... machine__find_thread down_read(&threads->lock); __machine__findnew_thread ____machine__findnew_thread th = threads->last_match; if (th->tid == tid) { thread__get thread 3 ... machine__process_fork_event machine__remove_thread __machine__remove_thread threads->last_match = NULL thread__put thread__put Thread 1 and 2 might got stale last_match, before thread 3 clears it. Thread 1 and 2 then race with thread 3's thread__put and they might trigger the refcnt == 0 assertion above. The patch is disabling the last_match cache for multiple thread mode. It was originally meant for single thread scenarios, where it's common to have multiple sequential searches of the same thread. In multithread mode this does not make sense, because top's threads processes different /proc entries and so the 'struct threads' object is queried for various threads. Moreover we'd need to add more locks to make it work. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 11, 2019
We occasionaly hit following assert failure in 'perf top', when processing the /proc info in multiple threads. perf: ...include/linux/refcount.h:109: refcount_inc: Assertion `!(!refcount_inc_not_zero(r))' failed. The gdb backtrace looks like this: [Switching to Thread 0x7ffff11ba700 (LWP 13749)] 0x00007ffff50839fb in raise () from /lib64/libc.so.6 (gdb) #0 0x00007ffff50839fb in raise () from /lib64/libc.so.6 FireflyTeam#1 0x00007ffff5085800 in abort () from /lib64/libc.so.6 FireflyTeam#2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6 FireflyTeam#3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6 FireflyTeam#4 0x0000000000535373 in refcount_inc (r=0x7fffdc009be0) at ...include/linux/refcount.h:109 FireflyTeam#5 0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0) at util/comm.c:24 FireflyTeam#6 0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:72 FireflyTeam#7 0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2", root=0xbed5c0 <comm_str_root>) at util/comm.c:95 FireflyTeam#8 0x000000000053582e in comm__new (str=0x7fffd000b260 ":2", timestamp=0, exec=false) at util/comm.c:111 FireflyTeam#9 0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57 FireflyTeam#10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38, threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457 FireflyTeam#11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38, ... The failing assertion is this one: REFCOUNT_WARN(!refcount_inc_not_zero(r), ... The problem is that we keep global comm_str_root list, which is accessed by multiple threads during the 'perf top' startup and following 2 paths can race: thread 1: ... thread__new comm__new comm_str__findnew down_write(&comm_str_lock); __comm_str__findnew comm_str__get thread 2: ... comm__override or comm__free comm_str__put refcount_dec_and_test down_write(&comm_str_lock); rb_erase(&cs->rb_node, &comm_str_root); Because thread 2 first decrements the refcnt and only after then it removes the struct comm_str from the list, the thread 1 can find this object on the list with refcnt equls to 0 and hit the assert. This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects that already dropped the refcnt to 0. For the rest of the objects we take the refcnt before comparing its name and release it afterwards with comm_str__put, which can also release the object completely. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Lukasz Odzioba <lukasz.odzioba@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 11, 2019
Petr Machata says: ==================== A test for mirror-to-gretap with team in UL packet path This patchset adds a test for "tc action mirred mirror" where the mirrored-to device is a gretap, and underlay path contains a team device. In patch FireflyTeam#1 require_command() is added, which should henceforth be used to declare dependence on a certain tool. In patch FireflyTeam#2, two new functions, team_create() and team_destroy(), are added to lib.sh. The newly-added test uses arping, which isn't necessarily available. Therefore patch FireflyTeam#3 introduces $ARPING, and a preexisting test is fixed to require_command $ARPING. In patches FireflyTeam#4 and FireflyTeam#5, two new tests are added. In both cases, a team device is on egress path of a mirrored packet in a mirror-to-gretap scenario. In the first one, the team device is in loadbalance mode, in the second one it's in lacp mode. (The difference in modes necessitates a different testing strategy, hence two test cases instead of just parameterizing one.) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 11, 2019
Petr Machata says: ==================== ipv4: Control SKB reprioritization after forwarding After IPv4 packets are forwarded, the priority of the corresponding SKB is updated according to the TOS field of IPv4 header. This overrides any prioritization done earlier by e.g. an skbedit action or ingress-qos-map defined at a vlan device. Such overriding may not always be desirable. Even if the packet ends up being routed, which implies this is an L3 network node, an administrator may wish to preserve whatever prioritization was done earlier on in the pipeline. Therefore this patch set introduces a sysctl that controls this behavior, net.ipv4.ip_forward_update_priority. It's value is 1 by default to preserve the current behavior. All of the above is implemented in patch FireflyTeam#1. Value changes prompt a new NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE notification, so that the drivers can hook up whatever logic may depend on this value. That is implemented in patch FireflyTeam#2. In patches FireflyTeam#3 and FireflyTeam#4, mlxsw is adapted to recognize the sysctl. On initialization, the RGCR register that handles router configuration is set in accordance with the sysctl. The new notification is listened to and RGCR is reconfigured as necessary. In patches FireflyTeam#5 to FireflyTeam#7, a selftest is added to verify that mlxsw reflects the sysctl value as necessary. The test is expressed in terms of the recently-introduced ieee_setapp support, and works by observing how DSCP value gets rewritten depending on packet priority. For this reason, the test is added to the subdirectory drivers/net/mlxsw. Even though it's not particularly specific to mlxsw, it's not suitable for running on soft devices (which don't support the ieee_setapp et.al.). Changes from v1 to v2: - In patch FireflyTeam#1, init sysctl_ip_fwd_update_priority to 1 instead of true. Changes from RFC to v1: - Fix wrong sysctl name in ip-sysctl.txt - Add notifications - Add mlxsw support - Add self test ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 11, 2019
This prepares the btrtl code so it can be used to initialize Bluetooth modules connected via UART (these are found for example on the RTL8723BS and RTL8723DS SDIO chips, which come with an embedded UART Bluetooth module). The Realtek "rtl8723bs_bt" and "rtl8723ds_bt" userspace Bluetooth UART initialization tools (rtk_hciattach) use the following sequence: 1) send H5 sync pattern (already supported by hci_h5) 2) get LMP version (already supported by btrtl) 3) get ROM version (already supported by btrtl) 4) load the firmware and config for the current chipset (already supported by btrtl) 5) read UART settings from the config blob (currently not supported) 6) send UART settings via a vendor command to the device (which changes the baudrate of the device and enables or disables flow control depending on the config) 7) change the baudrate and flow control settings on the host 8) send the firmware and config blob to the device (already supported by btrtl) The main reason why the initialization has to be split is step FireflyTeam#7. This requires changes to the underlying "bus", which should be kept outside of the "generic" btrtl driver. The idea for this split is borrowed from the btbcm driver but adjusted where needed (the btrtl driver for example needs two blobs: firmware and config, while the btbcm only needs one). This also prepares the code for step FireflyTeam#5 (parsing the config blob) by centralizing the code which loads the firmware and config blobs and storing the result in the new struct btrtl_device_info. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Jeremy Cline <jeremy@jcline.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 11, 2019
Guillaume Nault says: ==================== l2tp: rework pppol2tp ioctl handling The current ioctl() handling code can be simplified. It tests for non-relevant conditions and uselessly holds sockets. Once useless code is removed, it becomes even simpler to let pppol2tp_ioctl() handle commands directly, rather than dispatch them to pppol2tp_tunnel_ioctl() or pppol2tp_session_ioctl(). That is the approach taken by this series. Patch FireflyTeam#1 and FireflyTeam#2 define helper functions aimed at simplifying the rest of the patch set. Patch FireflyTeam#3 drops useless tests in pppol2p_ioctl() and avoid holding a refcount on the socket. Patches FireflyTeam#4, FireflyTeam#5 and FireflyTeam#6 are the core of the series. They let pppol2tp_ioctl() handle all ioctls and drop the tunnel and session specific functions. Then patch FireflyTeam#6 brings a little bit of consolidation. Finally, patch FireflyTeam#7 takes advantage of the simplified code to make pppol2tp sockets compatible with dev_ioctl(). Certainly not a killer feature, but it is trivial and it is always nice to see l2tp getting better integration with the rest of the stack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 12, 2019
Currently, whenever a new node is created/re-used from the memhotplug path, we call free_area_init_node()->free_area_init_core(). But there is some code that we do not really need to run when we are coming from such path. free_area_init_core() performs the following actions: 1) Initializes pgdat internals, such as spinlock, waitqueues and more. 2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on when creating hash tables. 3) Account number of managed_pages per zone, substracting dma_reserved and memmap pages. 4) Initializes some fields of the zone structure data 5) Calls init_currently_empty_zone to initialize all the freelists 6) Calls memmap_init to initialize all pages belonging to certain zone When called from memhotplug path, free_area_init_core() only performs actions FireflyTeam#1 and FireflyTeam#4. Action FireflyTeam#2 is pointless as the zones do not have any pages since either the node was freed, or we are re-using it, eitherway all zones belonging to this node should have 0 pages. For the same reason, action FireflyTeam#3 results always in manages_pages being 0. Action FireflyTeam#5 and FireflyTeam#6 are performed later on when onlining the pages: online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone() online_pages()->move_pfn_range_to_zone()->memmap_init_zone() This patch does two things: First, moves the node/zone initializtion to their own function, so it allows us to create a small version of free_area_init_core, where we only perform: 1) Initialization of pgdat internals, such as spinlock, waitqueues and more 4) Initialization of some fields of the zone structure data These two functions are: pgdat_init_internals() and zone_init_internals(). The second thing this patch does, is to introduce free_area_init_core_hotplug(), the memhotplug version of free_area_init_core(): Currently, we call free_area_init_node() from the memhotplug path. In there, we set some pgdat's fields, and call calculate_node_totalpages(). calculate_node_totalpages() calculates the # of pages the node has. Since the node is either new, or we are re-using it, the zones belonging to this node should not have any pages, so there is no point to calculate this now. Actually, we re-set these values to 0 later on with the calls to: reset_node_managed_pages() reset_node_present_pages() The # of pages per node and the # of pages per zone will be calculated when onlining the pages: online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range() online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range() Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace __paginginit with __init, so their code gets freed up. [osalvador@techadventures.net: fix section usage] Link: http://lkml.kernel.org/r/20180731101752.GA473@techadventures.net [osalvador@suse.de: v6] Link: http://lkml.kernel.org/r/20180801122348.21588-6-osalvador@techadventures.net Link: http://lkml.kernel.org/r/20180730101757.28058-5-osalvador@techadventures.net Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 12, 2019
Fixes a crash when the report encounters an address that could not be associated with an mmaped region: #0 0x00005555557bdc4a in callchain_srcline (ip=<error reading variable: Cannot access memory at address 0x38>, sym=0x0, map=0x0) at util/machine.c:2329 FireflyTeam#1 unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329 FireflyTeam#2 0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 <unwind_entry>, thread=<optimized out>, ip=18446744073709551615) at util/unwind-libunwind-local.c:586 FireflyTeam#3 get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 <unwind_entry>, arg=0x7ffff5642498, max_stack=<optimized out>) at util/unwind-libunwind-local.c:703 FireflyTeam#4 0x0000555555837192 in _unwind__get_entries (cb=<optimized out>, arg=<optimized out>, thread=<optimized out>, data=<optimized out>, max_stack=<optimized out>) at util/unwind-libunwind-local.c:725 FireflyTeam#5 0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351 FireflyTeam#6 thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378 FireflyTeam#7 0x00005555557ba4ee in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fffffff97b8, evsel=<optimized out>, al=al@entry=0x7fffffff9750, max_stack=<optimized out>) at util/callchain.c:1085 Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 2a9d505 ("perf script: Show correct offsets for DWARF-based unwinding") Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 12, 2019
This reverts commit d76c743. While commit d76c743 ("serial: 8250_dw: Fix runtime PM handling") fixes runtime PM handling when using kgdb, it introduces a traceback for everyone else. BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/next/drivers/base/power/runtime.c:1034 in_atomic(): 1, irqs_disabled(): 1, pid: 1, name: swapper/0 7 locks held by swapper/0/1: #0: 000000005ec5bc72 (&dev->mutex){....}, at: __driver_attach+0xb5/0x12b FireflyTeam#1: 000000005d5fa9e5 (&dev->mutex){....}, at: __device_attach+0x3e/0x15b FireflyTeam#2: 0000000047e93286 (serial_mutex){+.+.}, at: serial8250_register_8250_port+0x51/0x8bb FireflyTeam#3: 000000003b328f07 (port_mutex){+.+.}, at: uart_add_one_port+0xab/0x8b0 FireflyTeam#4: 00000000fa313d4d (&port->mutex){+.+.}, at: uart_add_one_port+0xcc/0x8b0 FireflyTeam#5: 00000000090983ca (console_lock){+.+.}, at: vprintk_emit+0xdb/0x217 FireflyTeam#6: 00000000c743e583 (console_owner){-...}, at: console_unlock+0x211/0x60f irq event stamp: 735222 __down_trylock_console_sem+0x4a/0x84 console_unlock+0x338/0x60f __do_softirq+0x4a4/0x50d irq_exit+0x64/0xe2 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc5 FireflyTeam#6 Hardware name: Google Caroline/Caroline, BIOS Google_Caroline.7820.286.0 03/15/2017 Call Trace: dump_stack+0x7d/0xbd ___might_sleep+0x238/0x259 __pm_runtime_resume+0x4e/0xa4 ? serial8250_rpm_get+0x2e/0x44 serial8250_console_write+0x44/0x301 ? lock_acquire+0x1b8/0x1fa console_unlock+0x577/0x60f vprintk_emit+0x1f0/0x217 printk+0x52/0x6e register_console+0x43b/0x524 uart_add_one_port+0x672/0x8b0 ? set_io_from_upio+0x150/0x162 serial8250_register_8250_port+0x825/0x8bb dw8250_probe+0x80c/0x8b0 ? dw8250_serial_inq+0x8e/0x8e ? dw8250_check_lcr+0x108/0x108 ? dw8250_runtime_resume+0x5b/0x5b ? dw8250_serial_outq+0xa1/0xa1 ? dw8250_remove+0x115/0x115 platform_drv_probe+0x76/0xc5 really_probe+0x1f1/0x3ee ? driver_allows_async_probing+0x5d/0x5d driver_probe_device+0xd6/0x112 ? driver_allows_async_probing+0x5d/0x5d bus_for_each_drv+0xbe/0xe5 __device_attach+0xdd/0x15b bus_probe_device+0x5a/0x10b device_add+0x501/0x894 ? _raw_write_unlock+0x27/0x3a platform_device_add+0x224/0x2b7 mfd_add_device+0x718/0x75b ? __kmalloc+0x144/0x16a ? mfd_add_devices+0x38/0xdb mfd_add_devices+0x9b/0xdb intel_lpss_probe+0x7d4/0x8ee intel_lpss_pci_probe+0xac/0xd4 pci_device_probe+0x101/0x18e ... Revert the offending patch until a more comprehensive solution is available. Cc: Tony Lindgren <tony@atomide.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Phil Edworthy <phil.edworthy@renesas.com> Fixes: d76c743 ("serial: 8250_dw: Fix runtime PM handling") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
0lvin
pushed a commit
to free-z4u/roc-rk3328-cc-official
that referenced
this issue
Oct 12, 2019
When the function name for an inline frame is invalid, we must not try to demangle this symbol, otherwise we crash with: #0 0x0000555555895c01 in bfd_demangle () FireflyTeam#1 0x0000555555823262 in demangle_sym (dso=0x555555d92b90, elf_name=0x0, kmodule=0) at util/symbol-elf.c:215 FireflyTeam#2 dso__demangle_sym (dso=dso@entry=0x555555d92b90, kmodule=<optimized out>, kmodule@entry=0, elf_name=elf_name@entry=0x0) at util/symbol-elf.c:400 FireflyTeam#3 0x00005555557fef4b in new_inline_sym (funcname=0x0, base_sym=0x555555d92b90, dso=0x555555d92b90) at util/srcline.c:89 FireflyTeam#4 inline_list__append_dso_a2l (dso=dso@entry=0x555555c7bb00, node=node@entry=0x555555e31810, sym=sym@entry=0x555555d92b90) at util/srcline.c:264 FireflyTeam#5 0x00005555557ff27f in addr2line (dso_name=dso_name@entry=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf", addr=addr@entry=2888, file=file@entry=0x0, line=line@entry=0x0, dso=dso@entry=0x555555c7bb00, unwind_inlines=unwind_inlines@entry=true, node=0x555555e31810, sym=0x555555d92b90) at util/srcline.c:313 FireflyTeam#6 0x00005555557ffe7c in addr2inlines (sym=0x555555d92b90, dso=0x555555c7bb00, addr=2888, dso_name=0x555555d92430 "/home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf") at util/srcline.c:358 So instead handle the case where we get invalid function names for inlined frames and use a fallback '??' function name instead. While this crash was originally reported by Hadrien for rust code, I can now also reproduce it with trivial C++ code. Indeed, it seems like libbfd fails to interpret the debug information for the inline frame symbol name: $ addr2line -e /home/milian/.debug/.build-id/f7/186d14bb94f3c6161c010926da66033d24fce5/elf -if b48 main /usr/include/c++/8.2.1/complex:610 ?? /usr/include/c++/8.2.1/complex:618 ?? /usr/include/c++/8.2.1/complex:675 ?? /usr/include/c++/8.2.1/complex:685 main /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 I've reported this bug upstream and also attached a patch there which should fix this issue: https://sourceware.org/bugzilla/show_bug.cgi?id=23715 Reported-by: Hadrien Grasland <grasland@lal.in2p3.fr> Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: a64489c ("perf report: Find the inline stack for a given address") [ The above 'Fixes:' cset is where originally the problem was introduced, i.e. using a2l->funcname without checking if it is NULL, but this current patch fixes the current codebase, i.e. multiple csets were applied after a64489c before the problem was reported by Hadrien ] Link: http://lkml.kernel.org/r/20180926135207.30263-3-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
T-Firefly
pushed a commit
that referenced
this issue
Jan 16, 2020
commit 3f93a4f upstream. It is possible for an irq triggered by channel0 to be received later after clks are disabled once firmware loaded during sdma probe. If that happens then clearing them by writing to SDMA_H_INTR won't work and the kernel will hang processing infinite interrupts. Actually, don't need interrupt triggered on channel0 since it's pollling SDMA_H_STATSTOP to know channel0 done rather than interrupt in current code, just clear BD_INTR to disable channel0 interrupt to avoid the above case. This issue was brought by commit 1d069bf ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") which didn't take care the above case. Fixes: 1d069bf ("dmaengine: imx-sdma: ack channel 0 IRQ in the interrupt handler") Cc: stable@vger.kernel.org #5.0+ Signed-off-by: Robin Gong <yibin.gong@nxp.com> Reported-by: Sven Van Asbroeck <thesven73@gmail.com> Tested-by: Sven Van Asbroeck <thesven73@gmail.com> Reviewed-by: Michael Olbrich <m.olbrich@pengutronix.de> Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
T-Firefly
pushed a commit
that referenced
this issue
Jan 16, 2020
commit cf3591e upstream. Revert the commit bd293d0. The proper fix has been made available with commit d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread"). Note that the fix offered by commit bd293d0 doesn't really prevent the deadlock from occuring - if we look at the stacktrace reported by Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex - i.e. it has already successfully taken the mutex. Changing the mutex from mutex_lock to mutex_trylock won't help with deadlocks that happen afterwards. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Fixes: bd293d0 ("dm bufio: fix deadlock with loop device") Depends-on: d0a255e ("loop: set PF_MEMALLOC_NOIO for the worker thread") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
commit c171654 upstream. stub_probe() calls put_busid_priv() in an error path when device isn't found in the busid_table. Fix it by making put_busid_priv() safe to be called with null struct bus_id_priv pointer. This problem happens when "usbip bind" is run without loading usbip_host driver and then running modprobe. The first failed bind attempt unbinds the device from the original driver and when usbip_host is modprobed, stub_probe() runs and doesn't find the device in its busid table and calls put_busid_priv(0 with null bus_id_priv pointer. usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip! [ 367.359679] ===================================== [ 367.359681] WARNING: bad unlock balance detected! [ 367.359683] 4.17.0-rc4+ FireflyTeam#5 Not tainted [ 367.359685] ------------------------------------- [ 367.359688] modprobe/2768 is trying to release lock ( [ 367.359689] ================================================================== [ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110 [ 367.359699] Read of size 8 at addr 0000000000000058 by task modprobe/2768 [ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ FireflyTeam#5 Fixes: 2207655 ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
[ Upstream commit 2c0aa08 ] Scenario: 1. Port down and do fail over 2. Ap do rds_bind syscall PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6" #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9 FireflyTeam#1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3 FireflyTeam#2 [ffff898e35f15b30] oops_end at ffffffff8150f518 FireflyTeam#3 [ffff898e35f15b60] no_context at ffffffff8104854c FireflyTeam#4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675 FireflyTeam#5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3 FireflyTeam#6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8 FireflyTeam#7 [ffff898e35f15d10] page_fault at ffffffff8150ea95 [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282 RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88 RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00 RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0 R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 FireflyTeam#8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm] FireflyTeam#9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6 FireflyTeam#10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0 FireflyTeam#11 [ffff898e35f15ee8] kthread at ffffffff81090fe6 PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap" #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4 FireflyTeam#1 [ffff881024ccfd40] schedule at ffffffff8150c2cf FireflyTeam#2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7 FireflyTeam#3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb FireflyTeam#4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm] FireflyTeam#5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma] FireflyTeam#6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds] FireflyTeam#7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds] FireflyTeam#8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670 PID: 45659 PID: 47039 rds_ib_laddr_check /* create id_priv with a null event_handler */ rdma_create_id rdma_bind_addr cma_acquire_dev /* add id_priv to cma_dev->id_list */ cma_attach_to_dev cma_ndev_work_handler /* event_hanlder is null */ id_priv->id.event_handler Signed-off-by: Guanglei Li <guanglei.li@oracle.com> Signed-off-by: Honglei Wang <honglei.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
[ Upstream commit 2bbea6e ] when mounting an ISO filesystem sometimes (very rarely) the system hangs because of a race condition between two tasks. PID: 6766 TASK: ffff88007b2a6dd0 CPU: 0 COMMAND: "mount" #0 [ffff880078447ae0] __schedule at ffffffff8168d605 FireflyTeam#1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49 FireflyTeam#2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995 FireflyTeam#3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef FireflyTeam#4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod] FireflyTeam#5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50 FireflyTeam#6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3 FireflyTeam#7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs] FireflyTeam#8 [ffff880078447da8] mount_bdev at ffffffff81202570 FireflyTeam#9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs] FireflyTeam#10 [ffff880078447e28] mount_fs at ffffffff81202d09 FireflyTeam#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f FireflyTeam#12 [ffff880078447ea8] do_mount at ffffffff81220fee FireflyTeam#13 [ffff880078447f28] sys_mount at ffffffff812218d6 FireflyTeam#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49 RIP: 00007fd9ea914e9a RSP: 00007ffd5d9bf648 RFLAGS: 00010246 RAX: 00000000000000a5 RBX: ffffffff81698c49 RCX: 0000000000000010 RDX: 00007fd9ec2bc210 RSI: 00007fd9ec2bc290 RDI: 00007fd9ec2bcf30 RBP: 0000000000000000 R8: 0000000000000000 R9: 0000000000000010 R10: 00000000c0ed0001 R11: 0000000000000206 R12: 00007fd9ec2bc040 R13: 00007fd9eb6b2380 R14: 00007fd9ec2bc210 R15: 00007fd9ec2bcf30 ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b This task was trying to mount the cdrom. It allocated and configured a super_block struct and owned the write-lock for the super_block->s_umount rwsem. While exclusively owning the s_umount lock, it called sr_block_ioctl and waited to acquire the global sr_mutex lock. PID: 6785 TASK: ffff880078720fb0 CPU: 0 COMMAND: "systemd-udevd" #0 [ffff880078417898] __schedule at ffffffff8168d605 FireflyTeam#1 [ffff880078417900] schedule at ffffffff8168dc59 FireflyTeam#2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605 FireflyTeam#3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838 FireflyTeam#4 [ffff8800784179d0] down_read at ffffffff8168cde0 FireflyTeam#5 [ffff8800784179e8] get_super at ffffffff81201cc7 FireflyTeam#6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de FireflyTeam#7 [ffff880078417a40] flush_disk at ffffffff8123a94b FireflyTeam#8 [ffff880078417a88] check_disk_change at ffffffff8123ab50 FireflyTeam#9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom] FireflyTeam#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod] FireflyTeam#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86 FireflyTeam#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65 FireflyTeam#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b FireflyTeam#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7 FireflyTeam#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf FireflyTeam#16 [ffff880078417d00] do_last at ffffffff8120d53d rockchip-linux#17 [ffff880078417db0] path_openat at ffffffff8120e6b2 rockchip-linux#18 [ffff880078417e48] do_filp_open at ffffffff8121082b rockchip-linux#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33 rockchip-linux#20 [ffff880078417f70] sys_open at ffffffff811fde4e rockchip-linux#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49 RIP: 00007f29438b0c20 RSP: 00007ffc76624b78 RFLAGS: 00010246 RAX: 0000000000000002 RBX: ffffffff81698c49 RCX: 0000000000000000 RDX: 00007f2944a5fa70 RSI: 00000000000a0800 RDI: 00007f2944a5fa70 RBP: 00007f2944a5f540 R8: 0000000000000000 R9: 0000000000000020 R10: 00007f2943614c40 R11: 0000000000000246 R12: ffffffff811fde4e R13: ffff880078417f78 R14: 000000000000000c R15: 00007f2944a4b010 ORIG_RAX: 0000000000000002 CS: 0033 SS: 002b This task tried to open the cdrom device, the sr_block_open function acquired the global sr_mutex lock. The call to check_disk_change() then saw an event flag indicating a possible media change and tried to flush any cached data for the device. As part of the flush, it tried to acquire the super_block->s_umount lock associated with the cdrom device. This was the same super_block as created and locked by the previous task. The first task acquires the s_umount lock and then the sr_mutex_lock; the second task acquires the sr_mutex_lock and then the s_umount lock. This patch fixes the issue by moving check_disk_change() out of cdrom_open() and let the caller take care of it. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
commit b23220f upstream. The balloon.page field is used for two different purposes if batching is on or off. If batching is on, the field point to the page which is used to communicate with with the hypervisor. If it is off, balloon.page points to the page that is about to be (un)locked. Unfortunately, this dual-purpose of the field introduced a bug: when the balloon is popped (e.g., when the machine is reset or the balloon driver is explicitly removed), the balloon driver frees, unconditionally, the page that is held in balloon.page. As a result, if batching is disabled, this leads to double freeing the last page that is sent to the hypervisor. The following error occurs during rmmod when kernel checkers are on, and the balloon is not empty: [ 42.307653] ------------[ cut here ]------------ [ 42.307657] Kernel BUG at ffffffffba1e4b28 [verbose debug info unavailable] [ 42.307720] invalid opcode: 0000 [FireflyTeam#1] SMP DEBUG_PAGEALLOC [ 42.312512] Modules linked in: vmw_vsock_vmci_transport vsock ppdev joydev vmw_balloon(-) input_leds serio_raw vmw_vmci parport_pc shpchp parport i2c_piix4 nfit mac_hid autofs4 vmwgfx drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid ttm mptspi scsi_transport_spi ahci mptscsih drm psmouse vmxnet3 libahci mptbase pata_acpi [ 42.312766] CPU: 10 PID: 1527 Comm: rmmod Not tainted 4.12.0+ FireflyTeam#5 [ 42.312803] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2016 [ 42.313042] task: ffff9bf9680f8000 task.stack: ffffbfefc1638000 [ 42.313290] RIP: 0010:__free_pages+0x38/0x40 [ 42.313510] RSP: 0018:ffffbfefc163be98 EFLAGS: 00010246 [ 42.313731] RAX: 000000000000003e RBX: ffffffffc02b9720 RCX: 0000000000000006 [ 42.313972] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9bf97e08e0a0 [ 42.314201] RBP: ffffbfefc163be98 R08: 0000000000000000 R09: 0000000000000000 [ 42.314435] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc02b97e4 [ 42.314505] R13: ffffffffc02b9748 R14: ffffffffc02b9728 R15: 0000000000000200 [ 42.314550] FS: 00007f3af5fec700(0000) GS:ffff9bf97e080000(0000) knlGS:0000000000000000 [ 42.314599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 42.314635] CR2: 00007f44f6f4ab24 CR3: 00000003a7d12000 CR4: 00000000000006e0 [ 42.314864] Call Trace: [ 42.315774] vmballoon_pop+0x102/0x130 [vmw_balloon] [ 42.315816] vmballoon_exit+0x42/0xd64 [vmw_balloon] [ 42.315853] SyS_delete_module+0x1e2/0x250 [ 42.315891] entry_SYSCALL_64_fastpath+0x23/0xc2 [ 42.315924] RIP: 0033:0x7f3af5b0e8e7 [ 42.315949] RSP: 002b:00007fffe6ce0148 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 42.315996] RAX: ffffffffffffffda RBX: 000055be676401e0 RCX: 00007f3af5b0e8e7 [ 42.316951] RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055be67640248 [ 42.317887] RBP: 0000000000000003 R08: 0000000000000000 R09: 1999999999999999 [ 42.318845] R10: 0000000000000883 R11: 0000000000000206 R12: 00007fffe6cdf130 [ 42.319755] R13: 0000000000000000 R14: 0000000000000000 R15: 000055be676401e0 [ 42.320606] Code: c0 74 1c f0 ff 4f 1c 74 02 5d c3 85 f6 74 07 e8 0f d8 ff ff 5d c3 31 f6 e8 c6 fb ff ff 5d c3 48 c7 c6 c8 0f c5 ba e8 58 be 02 00 <0f> 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 85 ff 75 01 c3 55 48 [ 42.323462] RIP: __free_pages+0x38/0x40 RSP: ffffbfefc163be98 [ 42.325735] ---[ end trace 872e008e33f81508 ]--- To solve the bug, we eliminate the dual purpose of balloon.page. Fixes: f220a80 ("VMware balloon: add batching to the vmw_balloon.") Cc: stable@vger.kernel.org Reported-by: Oleksandr Natalenko <onatalen@redhat.com> Signed-off-by: Gil Kupfer <gilkup@gmail.com> Signed-off-by: Nadav Amit <namit@vmware.com> Reviewed-by: Xavier Deguillard <xdeguillard@vmware.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
commit ba062eb upstream. Three attributes are currently not verified, thus can trigger KMSAN warnings such as : BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline] BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268 CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ FireflyTeam#5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117 __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620 __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] __fswab32 include/uapi/linux/swab.h:59 [inline] nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268 nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212 netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448 nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x43fd59 RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680 R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: fdb694a ("netfilter: Add fail-open support") Fixes: 829e17a ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
commit 89da619 upstream. Kernel panic when with high memory pressure, calltrace looks like, PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java" #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb FireflyTeam#1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942 FireflyTeam#2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30 FireflyTeam#3 [ffff881ec7ed7778] oops_end at ffffffff816902c8 FireflyTeam#4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46 FireflyTeam#5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc FireflyTeam#6 [ffff881ec7ed7838] __node_set at ffffffff81680300 FireflyTeam#7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f FireflyTeam#8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5 FireflyTeam#9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8 [exception RIP: _raw_spin_lock_irqsave+47] RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8 RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008 RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098 R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 It happens in the pagefault and results in double pagefault during compacting pages when memory allocation fails. Analysed the vmcore, the page leads to second pagefault is corrupted with _mapcount=-256, but private=0. It's caused by the race between migration and ballooning, and lock missing in virtballoon_migratepage() of virtio_balloon driver. This patch fix the bug. Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages") Cc: stable@vger.kernel.org Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Huang Chong <huang.chong@zte.com.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
commit 17e8354 upstream. Fix the following kernel bug: kernel BUG at drivers/iommu/intel-iommu.c:3260! invalid opcode: 0000 [FireflyTeam#5] PREEMPT SMP Hardware name: Intel Corp. Harcuvar/Server, BIOS HAVLCRB0.X64.0013.D39.1608311820 08/31/2016 task: ffff880175389950 ti: ffff880176bec000 task.ti: ffff880176bec000 RIP: 0010:[<ffffffff8150a83b>] [<ffffffff8150a83b>] intel_unmap+0x25b/0x260 RSP: 0018:ffff880176bef5e8 EFLAGS: 00010296 RAX: 0000000000000024 RBX: ffff8800773c7c88 RCX: 000000000000ce04 RDX: 0000000080000000 RSI: 0000000000000000 RDI: 0000000000000009 RBP: ffff880176bef638 R08: 0000000000000010 R09: 0000000000000004 R10: ffff880175389c78 R11: 0000000000000a4f R12: ffff8800773c7868 R13: 00000000ffffac88 R14: ffff8800773c7818 R15: 0000000000000001 FS: 00007fef21258700(0000) GS:ffff88017b5c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000066d6d8 CR3: 000000007118c000 CR4: 00000000003406e0 Stack: 00000000ffffac88 ffffffff8199867f ffff880176bef5f8 ffff880100000030 ffff880176bef668 ffff8800773c7c88 ffff880178288098 ffff8800772c0010 ffff8800773c7818 0000000000000001 ffff880176bef648 ffffffff8150a86e Call Trace: [<ffffffff8199867f>] ? printk+0x46/0x48 [<ffffffff8150a86e>] intel_unmap_page+0xe/0x10 [<ffffffffa039d99b>] ismt_access+0x27b/0x8fa [i2c_ismt] [<ffffffff81554420>] ? __pm_runtime_suspend+0xa0/0xa0 [<ffffffff815544a0>] ? pm_suspend_timer_fn+0x80/0x80 [<ffffffff81554420>] ? __pm_runtime_suspend+0xa0/0xa0 [<ffffffff815544a0>] ? pm_suspend_timer_fn+0x80/0x80 [<ffffffff8143dfd0>] ? pci_bus_read_dev_vendor_id+0xf0/0xf0 [<ffffffff8172b36c>] i2c_smbus_xfer+0xec/0x4b0 [<ffffffff810aa4d5>] ? vprintk_emit+0x345/0x530 [<ffffffffa038936b>] i2cdev_ioctl_smbus+0x12b/0x240 [i2c_dev] [<ffffffff810aa829>] ? vprintk_default+0x29/0x40 [<ffffffffa0389b33>] i2cdev_ioctl+0x63/0x1ec [i2c_dev] [<ffffffff811b04c8>] do_vfs_ioctl+0x328/0x5d0 [<ffffffff8119d8ec>] ? vfs_write+0x11c/0x190 [<ffffffff8109d449>] ? rt_up_read+0x19/0x20 [<ffffffff811b07f1>] SyS_ioctl+0x81/0xa0 [<ffffffff819a351b>] system_call_fastpath+0x16/0x6e This happen When run "i2cdetect -y 0" detect SMBus iSMT adapter. After finished I2C block read/write, when unmap the data buffer, a wrong device address was pass to dma_unmap_single(). To fix this, give dma_unmap_single() the "dev" parameter, just like what dma_map_single() does, then unmap can find the right devices. Fixes: 13f35ac ("i2c: Adding support for Intel iSMT SMBus 2.0 host controller") Signed-off-by: Liwei Song <liwei.song@windriver.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
commit 45c180b upstream. struct xfrm_userpolicy_type has two holes, so we should not use C99 style initializer. KMSAN report: BUG: KMSAN: kernel-infoleak in copyout lib/iov_iter.c:140 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571 CPU: 1 PID: 4520 Comm: syz-executor841 Not tainted 4.17.0+ FireflyTeam#5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117 kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1211 kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1253 copyout lib/iov_iter.c:140 [inline] _copy_to_iter+0x1b14/0x2800 lib/iov_iter.c:571 copy_to_iter include/linux/uio.h:106 [inline] skb_copy_datagram_iter+0x422/0xfa0 net/core/datagram.c:431 skb_copy_datagram_msg include/linux/skbuff.h:3268 [inline] netlink_recvmsg+0x6f1/0x1900 net/netlink/af_netlink.c:1959 sock_recvmsg_nosec net/socket.c:802 [inline] sock_recvmsg+0x1d6/0x230 net/socket.c:809 ___sys_recvmsg+0x3fe/0x810 net/socket.c:2279 __sys_recvmmsg+0x58e/0xe30 net/socket.c:2391 do_sys_recvmmsg+0x2a6/0x3e0 net/socket.c:2472 __do_sys_recvmmsg net/socket.c:2485 [inline] __se_sys_recvmmsg net/socket.c:2481 [inline] __x64_sys_recvmmsg+0x15d/0x1c0 net/socket.c:2481 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x446ce9 RSP: 002b:00007fc307918db8 EFLAGS: 00000293 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00000000006dbc24 RCX: 0000000000446ce9 RDX: 000000000000000a RSI: 0000000020005040 RDI: 0000000000000003 RBP: 00000000006dbc20 R08: 0000000020004e40 R09: 0000000000000000 R10: 0000000040000000 R11: 0000000000000293 R12: 0000000000000000 R13: 00007ffc8d2df32f R14: 00007fc3079199c0 R15: 0000000000000001 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527 __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:413 __nla_put lib/nlattr.c:569 [inline] nla_put+0x276/0x340 lib/nlattr.c:627 copy_to_user_policy_type net/xfrm/xfrm_user.c:1678 [inline] dump_one_policy+0xbe1/0x1090 net/xfrm/xfrm_user.c:1708 xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013 xfrm_dump_policy+0x1c0/0x2a0 net/xfrm/xfrm_user.c:1749 netlink_dump+0x9b5/0x1550 net/netlink/af_netlink.c:2226 __netlink_dump_start+0x1131/0x1270 net/netlink/af_netlink.c:2323 netlink_dump_start include/linux/netlink.h:214 [inline] xfrm_user_rcv_msg+0x8a3/0x9b0 net/xfrm/xfrm_user.c:2577 netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448 xfrm_netlink_rcv+0xb2/0xf0 net/xfrm/xfrm_user.c:2598 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Local variable description: ----upt.i@dump_one_policy Variable was created at: dump_one_policy+0x78/0x1090 net/xfrm/xfrm_user.c:1689 xfrm_policy_walk+0x45a/0xd00 net/xfrm/xfrm_policy.c:1013 Byte 130 of 137 is uninitialized Memory access starts at ffff88019550407f Fixes: c0144be ("[XFRM] netlink: Use nla_put()/NLA_PUT() variantes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kraj
pushed a commit
to kraj/firefly-kernel
that referenced
this issue
Feb 28, 2021
…eadlock commit 0ee223b upstream. A long time ago the unfortunate decision was taken to add a self-deletion attribute to the sysfs SCSI device directory. That decision was unfortunate because self-deletion is really tricky. We can't drop that attribute because widely used user space software depends on it, namely the rescan-scsi-bus.sh script. Hence this patch that avoids that writing into that attribute triggers a deadlock. See also commit 7973cbd9fbd9 ("[PATCH] add sysfs attributes to scan and delete scsi_devices"). This patch avoids that self-removal triggers the following deadlock: ====================================================== WARNING: possible circular locking dependency detected 4.18.0-rc2-dbg+ FireflyTeam#5 Not tainted ------------------------------------------------------ modprobe/6539 is trying to acquire lock: 000000008323c4cd (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x45/0x90 but task is already holding lock: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> FireflyTeam#1 (&shost->scan_mutex){+.+.}: __mutex_lock+0xfe/0xc70 mutex_lock_nested+0x1b/0x20 scsi_remove_device+0x26/0x40 [scsi_mod] sdev_store_delete+0x27/0x30 [scsi_mod] dev_attr_store+0x3e/0x50 sysfs_kf_write+0x87/0xa0 kernfs_fop_write+0x190/0x230 __vfs_write+0xd2/0x3b0 vfs_write+0x101/0x270 ksys_write+0xab/0x120 __x64_sys_write+0x43/0x50 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&shost->scan_mutex); lock(kn->count#202); lock(&shost->scan_mutex); lock(kn->count#202); *** DEADLOCK *** 2 locks held by modprobe/6539: #0: 00000000efaf9298 (&dev->mutex){....}, at: device_release_driver_internal+0x68/0x360 FireflyTeam#1: 00000000a6ec2c69 (&shost->scan_mutex){+.+.}, at: scsi_remove_host+0x21/0x150 [scsi_mod] stack backtrace: CPU: 10 PID: 6539 Comm: modprobe Not tainted 4.18.0-rc2-dbg+ FireflyTeam#5 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xa4/0xf5 print_circular_bug.isra.34+0x213/0x221 __lock_acquire+0x1a7e/0x1b50 lock_acquire+0xd2/0x260 __kernfs_remove+0x424/0x4a0 kernfs_remove_by_name_ns+0x45/0x90 remove_files.isra.1+0x3a/0x90 sysfs_remove_group+0x5c/0xc0 sysfs_remove_groups+0x39/0x60 device_remove_attrs+0x82/0xb0 device_del+0x251/0x580 __scsi_remove_device+0x19f/0x1d0 [scsi_mod] scsi_forget_host+0x37/0xb0 [scsi_mod] scsi_remove_host+0x9b/0x150 [scsi_mod] sdebug_driver_remove+0x4b/0x150 [scsi_debug] device_release_driver_internal+0x241/0x360 device_release_driver+0x12/0x20 bus_remove_device+0x1bc/0x290 device_del+0x259/0x580 device_unregister+0x1a/0x70 sdebug_remove_adapter+0x8b/0xf0 [scsi_debug] scsi_debug_exit+0x76/0xe8 [scsi_debug] __x64_sys_delete_module+0x1c1/0x280 do_syscall_64+0x77/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe See also https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg54525.html. Fixes: ac0ece9 ("scsi: use device_remove_file_self() instead of device_schedule_callback()") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SquallATF
pushed a commit
to SquallATF/kernel
that referenced
this issue
May 31, 2022
commit 443f2d5 upstream. Observe a segmentation fault when 'perf stat' is asked to repeat forever with the interval option. Without fix: # perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.000211692 3,13,89,82,34,157 cycles 10.000380119 1,53,98,52,22,294 cycles 10.040467280 17,16,79,265 cycles Segmentation fault This problem was only observed when we use forever option aka -r 0 and works with limited repeats. Calling print_counter with ts being set to NULL, is not a correct option when interval is set. Hence avoid print_counter(NULL,..) if interval is set. With fix: # perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.019866622 3,15,14,43,08,697 cycles 10.039865756 3,15,16,31,95,261 cycles 10.059950628 1,26,05,47,158 cycles 5.009902655 3,14,52,62,33,932 cycles 10.019880228 3,14,52,22,89,154 cycles 10.030543876 66,90,18,333 cycles 5.009848281 3,14,51,98,25,437 cycles 10.029854402 3,15,14,93,04,918 cycles 5.009834177 3,14,51,95,92,316 cycles Committer notes: Did the 'git bisect' to find the cset introducing the problem to add the Fixes tag below, and at that time the problem reproduced as: (gdb) run stat -r0 -I500 sleep 1 <SNIP> Program received signal SIGSEGV, Segmentation fault. print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 866 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep); (gdb) bt #0 print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 FireflyTeam#1 0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938 FireflyTeam#2 0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411 FireflyTeam#3 0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370 FireflyTeam#4 0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429 FireflyTeam#5 0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473 FireflyTeam#6 0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588 (gdb) Mostly the same as just before this patch: Program received signal SIGSEGV, Segmentation fault. 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 964 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep); (gdb) bt #0 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 FireflyTeam#1 0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670) at util/stat-display.c:1172 FireflyTeam#2 0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656 FireflyTeam#3 0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960 FireflyTeam#4 0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310 FireflyTeam#5 0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362 FireflyTeam#6 0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406 FireflyTeam#7 0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531 (gdb) Fixes: d4f63a4 ("perf stat: Introduce print_counters function") Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v4.2+ Link: http://lore.kernel.org/lkml/20190904094738.9558-3-srikar@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SquallATF
pushed a commit
to SquallATF/kernel
that referenced
this issue
May 31, 2022
commit a8fabb3 upstream. In case a user process using xenbus has open transactions and is killed e.g. via ctrl-C the following cleanup of the allocated resources might result in a deadlock due to trying to end a transaction in the xenbus worker thread: [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds. [ 2551.492215] Tainted: P OE 5.0.0-29-generic FireflyTeam#5 [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2551.528585] xenbus D 0 37 2 0x80000080 [ 2551.528590] Call Trace: [ 2551.528603] __schedule+0x2c0/0x870 [ 2551.528606] ? _cond_resched+0x19/0x40 [ 2551.528632] schedule+0x2c/0x70 [ 2551.528637] xs_talkv+0x1ec/0x2b0 [ 2551.528642] ? wait_woken+0x80/0x80 [ 2551.528645] xs_single+0x53/0x80 [ 2551.528648] xenbus_transaction_end+0x3b/0x70 [ 2551.528651] xenbus_file_free+0x5a/0x160 [ 2551.528654] xenbus_dev_queue_reply+0xc4/0x220 [ 2551.528657] xenbus_thread+0x7de/0x880 [ 2551.528660] ? wait_woken+0x80/0x80 [ 2551.528665] kthread+0x121/0x140 [ 2551.528667] ? xb_read+0x1d0/0x1d0 [ 2551.528670] ? kthread_park+0x90/0x90 [ 2551.528673] ret_from_fork+0x35/0x40 Fix this by doing the cleanup via a workqueue instead. Reported-by: James Dingwall <james@dingwall.me.uk> Fixes: fd8aa90 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses") Cc: <stable@vger.kernel.org> # 4.11 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SquallATF
pushed a commit
to SquallATF/kernel
that referenced
this issue
May 31, 2022
[ Upstream commit 0216234 ] We release wrong pointer on error path in cpu_cache_level__read function, leading to segfault: (gdb) r record ls Starting program: /root/perf/tools/perf/perf record ls ... [ perf record: Woken up 1 times to write data ] double free or corruption (out) Thread 1 "perf" received signal SIGABRT, Aborted. 0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6 (gdb) bt #0 0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6 FireflyTeam#1 0x00007ffff7443bac in abort () from /lib64/power9/libc.so.6 FireflyTeam#2 0x00007ffff74af8bc in __libc_message () from /lib64/power9/libc.so.6 FireflyTeam#3 0x00007ffff74b92b8 in malloc_printerr () from /lib64/power9/libc.so.6 FireflyTeam#4 0x00007ffff74bb874 in _int_free () from /lib64/power9/libc.so.6 FireflyTeam#5 0x0000000010271260 in __zfree (ptr=0x7fffffffa0b0) at ../../lib/zalloc.. FireflyTeam#6 0x0000000010139340 in cpu_cache_level__read (cache=0x7fffffffa090, cac.. FireflyTeam#7 0x0000000010143c90 in build_caches (cntp=0x7fffffffa118, size=<optimiz.. ... Releasing the proper pointer. Fixes: 720e98b ("perf tools: Add perf data cache feature") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org: # v4.6+ Link: http://lore.kernel.org/lkml/20190912105235.10689-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Why I get 2 video devices(/dev/video0 & /dev/video1) when I plug in a usb camera ?
I get same result when run
cat /sys/class/video4linux/video*/name
The text was updated successfully, but these errors were encountered: