-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kernel crash caused by network setting. #17
Comments
Looks similar to issue #8; I suspect we have a similar cause here. |
It would be caused by a abnormal situation. From the beginning, we try to bring up the interface, but the NCSI interface is down for the attempt. The NCSI stack notifies the ftgmac100 driver about that and the driver shutdown itself down by freeing all BDs/buffers. However, the interface is still up. After that, the proceeding "ifconfig eth0 down" tries to close the NCSI dev. For that, the NCSI stack tries to send packets. Unfortunately, there're no available BDs/buffers any more. The solution would be: (1) Return error when there're no available BDs/buffer or (2) Don't send NCSI packets when closing NCSI dev, or (3) The NIC and NCSI interface should have consistent state ("down" for the case). |
The problem was reproduced successfully with private code: Force returning errcode from ncsi_select_active_channel(). The root cause (as last update) is still applied. I'll figure out the fix. |
Any news on the fix? I tried it today and no crash, but the network doesn't come back either |
We think that this is fixed with the lock fix. Please re-open if you see it again |
commit 236cf17 upstream. When we allocate bitmaps in vgic_vcpu_init_maps, we divide the number of bits we need by 8 to figure out how many bytes to allocate. However, bitmap elements are always accessed as unsigned longs, and if we didn't happen to allocate a size such that size % sizeof(unsigned long) == 0, bitmap accesses may go past the end of the allocation. When using KASAN (which does byte-granular access checks), this results in a continuous stream of BUGs whenever these bitmaps are accessed: ============================================================================= BUG kmalloc-128 (Tainted: G B ): kasan: bad access detected ----------------------------------------------------------------------------- INFO: Allocated in vgic_init.part.25+0x55c/0x990 age=7493 cpu=3 pid=1730 INFO: Slab 0xffffffbde6d5da40 objects=16 used=15 fp=0xffffffc935769700 flags=0x4000000000000080 INFO: Object 0xffffffc935769500 @offset=1280 fp=0x (null) Bytes b4 ffffffc9357694f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769550: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769560: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Object ffffffc935769570: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Padding ffffffc9357695b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Padding ffffffc9357695c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Padding ffffffc9357695d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Padding ffffffc9357695e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ Padding ffffffc9357695f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ CPU: 3 PID: 1740 Comm: kvm-vcpu-0 Tainted: G B 4.4.0+ openbmc#17 Hardware name: ARM Juno development board (r1) (DT) Call trace: [<ffffffc00008e770>] dump_backtrace+0x0/0x280 [<ffffffc00008ea04>] show_stack+0x14/0x20 [<ffffffc000726360>] dump_stack+0x100/0x188 [<ffffffc00030d324>] print_trailer+0xfc/0x168 [<ffffffc000312294>] object_err+0x3c/0x50 [<ffffffc0003140fc>] kasan_report_error+0x244/0x558 [<ffffffc000314548>] __asan_report_load8_noabort+0x48/0x50 [<ffffffc000745688>] __bitmap_or+0xc0/0xc8 [<ffffffc0000d9e44>] kvm_vgic_flush_hwstate+0x1bc/0x650 [<ffffffc0000c514c>] kvm_arch_vcpu_ioctl_run+0x2ec/0xa60 [<ffffffc0000b9a6c>] kvm_vcpu_ioctl+0x474/0xa68 [<ffffffc00036b7b0>] do_vfs_ioctl+0x5b8/0xcb0 [<ffffffc00036bf34>] SyS_ioctl+0x8c/0xa0 [<ffffffc000086cb0>] el0_svc_naked+0x24/0x28 Memory state around the buggy address: ffffffc935769400: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffffc935769480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffffffc935769500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffffffc935769580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffffc935769600: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fix the issue by always allocating a multiple of sizeof(unsigned long), as we do elsewhere in the vgic code. Fixes: c1bfb57 ("arm/arm64: KVM: vgic: switch to dynamic allocation") Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The EFI firmware on Macs contains a full-fledged network stack for downloading OS X images from osrecovery.apple.com. Unfortunately on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 wireless card on every boot and leaves it enabled even after ExitBootServices has been called. The card continues to assert its IRQ line, causing spurious interrupts if the IRQ is shared. It also corrupts memory by DMAing received packets, allowing for remote code execution over the air. This only stops when a driver is loaded for the wireless card, which may be never if the driver is not installed or blacklisted. The issue seems to be constrained to the Broadcom 4331. Chris Milsted has verified that the newer Broadcom 4360 built into the MacBookPro11,3 (2013/2014) does not exhibit this behaviour. The chances that Apple will ever supply a firmware fix for the older machines appear to be zero. The solution is to reset the card on boot by writing to a reset bit in its mmio space. This must be done as an early quirk and not as a plain vanilla PCI quirk to successfully combat memory corruption by DMAed packets: Matthew Garrett found out in 2012 that the packets are written to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html). This type of memory is made available to the page allocator by efi_free_boot_services(). Plain vanilla PCI quirks run much later, in subsys initcall level. In-between a time window would be open for memory corruption. Random crashes occurring in this time window and attributed to DMAed packets have indeed been observed in the wild by Chris Bainbridge. When Matthew Garrett analyzed the memory corruption issue in 2012, he sought to fix it with a grub quirk which transitions the card to D3hot: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56 This approach does not help users with other bootloaders and while it may prevent DMAed packets, it does not cure the spurious interrupts emanating from the card. Unfortunately the card's mmio space is inaccessible in D3hot, so to reset it, we have to undo the effect of Matthew's grub patch and transition the card back to D0. Note that the quirk takes a few shortcuts to reduce the amount of code: The size of BAR 0 and the location of the PM capability is identical on all affected machines and therefore hardcoded. Only the address of BAR 0 differs between models. Also, it is assumed that the BCMA core currently mapped is the 802.11 core. The EFI driver seems to always take care of this. Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback towards finding the best solution to this problem. The following should be a comprehensive list of affected models: iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16] iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16] Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12] Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12] MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12] MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] For posterity, spurious interrupts caused by the Broadcom 4331 wireless card resulted in splats like this (stacktrace omitted): irq 17: nobody cared (try booting with the "irqpoll" option) handlers: [<ffffffff81374370>] pcie_isr [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci] [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec] Disabling IRQ openbmc#17 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732 Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1] Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1] Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2] Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1] Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2] Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris Milsted <cmilsted@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Michael Buesch <m@bues.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: b43-dev@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de [ Did minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
commit abb2baf upstream. The EFI firmware on Macs contains a full-fledged network stack for downloading OS X images from osrecovery.apple.com. Unfortunately on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 wireless card on every boot and leaves it enabled even after ExitBootServices has been called. The card continues to assert its IRQ line, causing spurious interrupts if the IRQ is shared. It also corrupts memory by DMAing received packets, allowing for remote code execution over the air. This only stops when a driver is loaded for the wireless card, which may be never if the driver is not installed or blacklisted. The issue seems to be constrained to the Broadcom 4331. Chris Milsted has verified that the newer Broadcom 4360 built into the MacBookPro11,3 (2013/2014) does not exhibit this behaviour. The chances that Apple will ever supply a firmware fix for the older machines appear to be zero. The solution is to reset the card on boot by writing to a reset bit in its mmio space. This must be done as an early quirk and not as a plain vanilla PCI quirk to successfully combat memory corruption by DMAed packets: Matthew Garrett found out in 2012 that the packets are written to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html). This type of memory is made available to the page allocator by efi_free_boot_services(). Plain vanilla PCI quirks run much later, in subsys initcall level. In-between a time window would be open for memory corruption. Random crashes occurring in this time window and attributed to DMAed packets have indeed been observed in the wild by Chris Bainbridge. When Matthew Garrett analyzed the memory corruption issue in 2012, he sought to fix it with a grub quirk which transitions the card to D3hot: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56 This approach does not help users with other bootloaders and while it may prevent DMAed packets, it does not cure the spurious interrupts emanating from the card. Unfortunately the card's mmio space is inaccessible in D3hot, so to reset it, we have to undo the effect of Matthew's grub patch and transition the card back to D0. Note that the quirk takes a few shortcuts to reduce the amount of code: The size of BAR 0 and the location of the PM capability is identical on all affected machines and therefore hardcoded. Only the address of BAR 0 differs between models. Also, it is assumed that the BCMA core currently mapped is the 802.11 core. The EFI driver seems to always take care of this. Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback towards finding the best solution to this problem. The following should be a comprehensive list of affected models: iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16] iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16] Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12] Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12] MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12] MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] For posterity, spurious interrupts caused by the Broadcom 4331 wireless card resulted in splats like this (stacktrace omitted): irq 17: nobody cared (try booting with the "irqpoll" option) handlers: [<ffffffff81374370>] pcie_isr [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci] [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec] Disabling IRQ #17 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732 Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1] Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1] Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2] Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1] Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2] Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Acked-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris Milsted <cmilsted@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Michael Buesch <m@bues.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: b43-dev@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linux-wireless@vger.kernel.org Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de [ Did minor readability edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 45caeaa ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: openbmc#8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . openbmc#9 [] tcp_rcv_established at ffffffff81580b64 openbmc#10 [] tcp_v4_do_rcv at ffffffff8158b54a openbmc#11 [] tcp_v4_rcv at ffffffff8158cd02 openbmc#12 [] ip_local_deliver_finish at ffffffff815668f4 openbmc#13 [] ip_local_deliver at ffffffff81566bd9 openbmc#14 [] ip_rcv_finish at ffffffff8156656d openbmc#15 [] ip_rcv at ffffffff81566f06 openbmc#16 [] __netif_receive_skb_core at ffffffff8152b3a2 openbmc#17 [] __netif_receive_skb at ffffffff8152b608 openbmc#18 [] netif_receive_skb at ffffffff8152b690 openbmc#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] openbmc#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] openbmc#21 [] net_rx_action at ffffffff8152bac2 openbmc#22 [] __do_softirq at ffffffff81084b4f openbmc#23 [] call_softirq at ffffffff8164845c openbmc#24 [] do_softirq at ffffffff81016fc5 openbmc#25 [] irq_exit at ffffffff81084ee5 openbmc#26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
------------[ cut here ]------------ WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel] CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G W OE 4.13.0+ openbmc#17 RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel] Call Trace: ? emulator_read_emulated+0x15/0x20 [kvm] ? segmented_read+0xae/0xf0 [kvm] vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel] ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel] x86_emulate_instruction+0x733/0x810 [kvm] vmx_handle_exit+0x2f4/0xda0 [kvm_intel] ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm] kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm] ? kvm_arch_vcpu_load+0x62/0x230 [kvm] kvm_vcpu_ioctl+0x340/0x700 [kvm] ? kvm_vcpu_ioctl+0x340/0x700 [kvm] ? __fget+0xfc/0x210 do_vfs_ioctl+0xa4/0x6a0 ? __fget+0x11d/0x210 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc2 A nested #PF is triggered during L0 emulating instruction for L2. However, it doesn't consider we should not break L1's vmlauch/vmresme. This patch fixes it by queuing the #PF exception instead ,requesting an immediate VM exit from L2 and keeping the exception for L1 pending for a subsequent nested VM exit. This should actually work all the time, making vmx_inject_page_fault_nested totally unnecessary. However, that's not working yet, so this patch can work around the issue in the meanwhile. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit 36eb8ff upstream. Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 #1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 #2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c #3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a #4 [ffff891ee00039e0] no_context at ffffffffbd074643 #5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e #6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 #7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a #8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 #9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] #11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] #12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] #13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] #14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 #15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 #16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 #17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 #18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 #19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- #20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame #21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd #22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 #23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 #24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 #25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 #26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa #27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca #28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 #29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb #30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the warning below by calling rhashtable_lookup_fast. Also, make some code movements for better quality and human readability. [ 342.450870] WARNING: suspicious RCU usage [ 342.455856] 4.18.0-rc2+ openbmc#17 Tainted: G O [ 342.462210] ----------------------------- [ 342.467202] ./include/linux/rhashtable.h:481 suspicious rcu_dereference_check() usage! [ 342.476568] [ 342.476568] other info that might help us debug this: [ 342.476568] [ 342.486978] [ 342.486978] rcu_scheduler_active = 2, debug_locks = 1 [ 342.495211] 4 locks held by modprobe/3934: [ 342.500265] #0: 00000000e23116b2 (mlx5_intf_mutex){+.+.}, at: mlx5_unregister_interface+0x18/0x90 [mlx5_core] [ 342.511953] openbmc#1: 00000000ca16db96 (rtnl_mutex){+.+.}, at: unregister_netdev+0xe/0x20 [ 342.521109] openbmc#2: 00000000a46e2c4b (&priv->state_lock){+.+.}, at: mlx5e_close+0x29/0x60 [mlx5_core] [ 342.531642] openbmc#3: 0000000060c5bde3 (mem_id_lock){+.+.}, at: xdp_rxq_info_unreg+0x93/0x6b0 [ 342.541206] [ 342.541206] stack backtrace: [ 342.547075] CPU: 12 PID: 3934 Comm: modprobe Tainted: G O 4.18.0-rc2+ openbmc#17 [ 342.556621] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015 [ 342.565606] Call Trace: [ 342.568861] dump_stack+0x78/0xb3 [ 342.573086] xdp_rxq_info_unreg+0x3f5/0x6b0 [ 342.578285] ? __call_rcu+0x220/0x300 [ 342.582911] mlx5e_free_rq+0x38/0xc0 [mlx5_core] [ 342.588602] mlx5e_close_channel+0x20/0x120 [mlx5_core] [ 342.594976] mlx5e_close_channels+0x26/0x40 [mlx5_core] [ 342.601345] mlx5e_close_locked+0x44/0x50 [mlx5_core] [ 342.607519] mlx5e_close+0x42/0x60 [mlx5_core] [ 342.613005] __dev_close_many+0xb1/0x120 [ 342.617911] dev_close_many+0xa2/0x170 [ 342.622622] rollback_registered_many+0x148/0x460 [ 342.628401] ? __lock_acquire+0x48d/0x11b0 [ 342.633498] ? unregister_netdev+0xe/0x20 [ 342.638495] rollback_registered+0x56/0x90 [ 342.643588] unregister_netdevice_queue+0x7e/0x100 [ 342.649461] unregister_netdev+0x18/0x20 [ 342.654362] mlx5e_remove+0x2a/0x50 [mlx5_core] [ 342.659944] mlx5_remove_device+0xe5/0x110 [mlx5_core] [ 342.666208] mlx5_unregister_interface+0x39/0x90 [mlx5_core] [ 342.673038] cleanup+0x5/0xbfc [mlx5_core] [ 342.678094] __x64_sys_delete_module+0x16b/0x240 [ 342.683725] ? do_syscall_64+0x1c/0x210 [ 342.688476] do_syscall_64+0x5a/0x210 [ 342.693025] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 8d5d885 ("xdp: rhashtable with allocator ID to pointer mapping") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
commit a5ba1d9 upstream. We have reports of the following crash: PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0" #0 [ffff88085c6db710] machine_kexec at ffffffff81046239 #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248 #2 [ffff88085c6db830] oops_end at ffffffff81008ae7 #3 [ffff88085c6db860] no_context at ffffffff81050b8f #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75 #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83 #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122 [exception RIP: uart_put_char+149] RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006 RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081 RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120 RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320 R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000 R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544 #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2 #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016 #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194 #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2 #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d #20 [ffff88085c6dbeb0] kthread at ffffffff81096384 #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f after slogging through some dissasembly: ffffffff814b6720 <uart_put_char>: ffffffff814b6720: 55 push %rbp ffffffff814b6721: 48 89 e5 mov %rsp,%rbp ffffffff814b6724: 48 83 ec 20 sub $0x20,%rsp ffffffff814b6728: 48 89 1c 24 mov %rbx,(%rsp) ffffffff814b672c: 4c 89 64 24 08 mov %r12,0x8(%rsp) ffffffff814b6731: 4c 89 6c 24 10 mov %r13,0x10(%rsp) ffffffff814b6736: 4c 89 74 24 18 mov %r14,0x18(%rsp) ffffffff814b673b: e8 b0 8e 58 00 callq ffffffff81a3f5f0 <mcount> ffffffff814b6740: 4c 8b a7 88 02 00 00 mov 0x288(%rdi),%r12 ffffffff814b6747: 45 31 ed xor %r13d,%r13d ffffffff814b674a: 41 89 f6 mov %esi,%r14d ffffffff814b674d: 49 83 bc 24 70 01 00 cmpq $0x0,0x170(%r12) ffffffff814b6754: 00 00 ffffffff814b6756: 49 8b 9c 24 80 01 00 mov 0x180(%r12),%rbx ffffffff814b675d: 00 ffffffff814b675e: 74 2f je ffffffff814b678f <uart_put_char+0x6f> ffffffff814b6760: 48 89 df mov %rbx,%rdi ffffffff814b6763: e8 a8 67 58 00 callq ffffffff81a3cf10 <_raw_spin_lock_irqsave> ffffffff814b6768: 41 8b 8c 24 78 01 00 mov 0x178(%r12),%ecx ffffffff814b676f: 00 ffffffff814b6770: 89 ca mov %ecx,%edx ffffffff814b6772: f7 d2 not %edx ffffffff814b6774: 41 03 94 24 7c 01 00 add 0x17c(%r12),%edx ffffffff814b677b: 00 ffffffff814b677c: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b6782: 75 23 jne ffffffff814b67a7 <uart_put_char+0x87> ffffffff814b6784: 48 89 c6 mov %rax,%rsi ffffffff814b6787: 48 89 df mov %rbx,%rdi ffffffff814b678a: e8 e1 64 58 00 callq ffffffff81a3cc70 <_raw_spin_unlock_irqrestore> ffffffff814b678f: 44 89 e8 mov %r13d,%eax ffffffff814b6792: 48 8b 1c 24 mov (%rsp),%rbx ffffffff814b6796: 4c 8b 64 24 08 mov 0x8(%rsp),%r12 ffffffff814b679b: 4c 8b 6c 24 10 mov 0x10(%rsp),%r13 ffffffff814b67a0: 4c 8b 74 24 18 mov 0x18(%rsp),%r14 ffffffff814b67a5: c9 leaveq ffffffff814b67a6: c3 retq ffffffff814b67a7: 49 8b 94 24 70 01 00 mov 0x170(%r12),%rdx ffffffff814b67ae: 00 ffffffff814b67af: 48 63 c9 movslq %ecx,%rcx ffffffff814b67b2: 41 b5 01 mov $0x1,%r13b ffffffff814b67b5: 44 88 34 0a mov %r14b,(%rdx,%rcx,1) ffffffff814b67b9: 41 8b 94 24 78 01 00 mov 0x178(%r12),%edx ffffffff814b67c0: 00 ffffffff814b67c1: 83 c2 01 add $0x1,%edx ffffffff814b67c4: 81 e2 ff 0f 00 00 and $0xfff,%edx ffffffff814b67ca: 41 89 94 24 78 01 00 mov %edx,0x178(%r12) ffffffff814b67d1: 00 ffffffff814b67d2: eb b0 jmp ffffffff814b6784 <uart_put_char+0x64> ffffffff814b67d4: 66 66 66 2e 0f 1f 84 data32 data32 nopw %cs:0x0(%rax,%rax,1) ffffffff814b67db: 00 00 00 00 00 for our build, this is crashing at: circ->buf[circ->head] = c; Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf) protected by the "per-port mutex", which based on uart_port_check() is state->port.mutex. Indeed, the lock acquired in uart_put_char() is uport->lock, i.e. not the same lock. Anyway, since the lock is not acquired, if uart_shutdown() is called, the last chunk of that function may release state->xmit.buf before its assigned to null, and cause the race above. To fix it, let's lock uport->lock when allocating/deallocating state->xmit.buf in addition to the per-port mutex. v2: switch to locking uport->lock on allocation/deallocation instead of locking the per-port mutex in uart_put_char. Note that since uport->lock is a spin lock, we have to switch the allocation to GFP_ATOMIC. v3: move the allocation outside the lock, so we can switch back to GFP_KERNEL Signed-off-by: Tycho Andersen <tycho@tycho.ws> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 21b172e ] Fix the warning below by calling rhashtable_lookup_fast. Also, make some code movements for better quality and human readability. [ 342.450870] WARNING: suspicious RCU usage [ 342.455856] 4.18.0-rc2+ #17 Tainted: G O [ 342.462210] ----------------------------- [ 342.467202] ./include/linux/rhashtable.h:481 suspicious rcu_dereference_check() usage! [ 342.476568] [ 342.476568] other info that might help us debug this: [ 342.476568] [ 342.486978] [ 342.486978] rcu_scheduler_active = 2, debug_locks = 1 [ 342.495211] 4 locks held by modprobe/3934: [ 342.500265] #0: 00000000e23116b2 (mlx5_intf_mutex){+.+.}, at: mlx5_unregister_interface+0x18/0x90 [mlx5_core] [ 342.511953] #1: 00000000ca16db96 (rtnl_mutex){+.+.}, at: unregister_netdev+0xe/0x20 [ 342.521109] #2: 00000000a46e2c4b (&priv->state_lock){+.+.}, at: mlx5e_close+0x29/0x60 [mlx5_core] [ 342.531642] #3: 0000000060c5bde3 (mem_id_lock){+.+.}, at: xdp_rxq_info_unreg+0x93/0x6b0 [ 342.541206] [ 342.541206] stack backtrace: [ 342.547075] CPU: 12 PID: 3934 Comm: modprobe Tainted: G O 4.18.0-rc2+ #17 [ 342.556621] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015 [ 342.565606] Call Trace: [ 342.568861] dump_stack+0x78/0xb3 [ 342.573086] xdp_rxq_info_unreg+0x3f5/0x6b0 [ 342.578285] ? __call_rcu+0x220/0x300 [ 342.582911] mlx5e_free_rq+0x38/0xc0 [mlx5_core] [ 342.588602] mlx5e_close_channel+0x20/0x120 [mlx5_core] [ 342.594976] mlx5e_close_channels+0x26/0x40 [mlx5_core] [ 342.601345] mlx5e_close_locked+0x44/0x50 [mlx5_core] [ 342.607519] mlx5e_close+0x42/0x60 [mlx5_core] [ 342.613005] __dev_close_many+0xb1/0x120 [ 342.617911] dev_close_many+0xa2/0x170 [ 342.622622] rollback_registered_many+0x148/0x460 [ 342.628401] ? __lock_acquire+0x48d/0x11b0 [ 342.633498] ? unregister_netdev+0xe/0x20 [ 342.638495] rollback_registered+0x56/0x90 [ 342.643588] unregister_netdevice_queue+0x7e/0x100 [ 342.649461] unregister_netdev+0x18/0x20 [ 342.654362] mlx5e_remove+0x2a/0x50 [mlx5_core] [ 342.659944] mlx5_remove_device+0xe5/0x110 [mlx5_core] [ 342.666208] mlx5_unregister_interface+0x39/0x90 [mlx5_core] [ 342.673038] cleanup+0x5/0xbfc [mlx5_core] [ 342.678094] __x64_sys_delete_module+0x16b/0x240 [ 342.683725] ? do_syscall_64+0x1c/0x210 [ 342.688476] do_syscall_64+0x5a/0x210 [ 342.693025] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 8d5d885 ("xdp: rhashtable with allocator ID to pointer mapping") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 34043d2 ] Matteo reported the following splat, testing the datapath of TC 'sample': BUG: KASAN: null-ptr-deref in tcf_sample_act+0xc4/0x310 Read of size 8 at addr 0000000000000000 by task nc/433 CPU: 0 PID: 433 Comm: nc Not tainted 4.19.0-rc3-kvm #17 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 Call Trace: kasan_report.cold.6+0x6c/0x2fa tcf_sample_act+0xc4/0x310 ? dev_hard_start_xmit+0x117/0x180 tcf_action_exec+0xa3/0x160 tcf_classify+0xdd/0x1d0 htb_enqueue+0x18e/0x6b0 ? deref_stack_reg+0x7a/0xb0 ? htb_delete+0x4b0/0x4b0 ? unwind_next_frame+0x819/0x8f0 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 __dev_queue_xmit+0x722/0xca0 ? unwind_get_return_address_ptr+0x50/0x50 ? netdev_pick_tx+0xe0/0xe0 ? save_stack+0x8c/0xb0 ? kasan_kmalloc+0xbe/0xd0 ? __kmalloc_track_caller+0xe4/0x1c0 ? __kmalloc_reserve.isra.45+0x24/0x70 ? __alloc_skb+0xdd/0x2e0 ? sk_stream_alloc_skb+0x91/0x3b0 ? tcp_sendmsg_locked+0x71b/0x15a0 ? tcp_sendmsg+0x22/0x40 ? __sys_sendto+0x1b0/0x250 ? __x64_sys_sendto+0x6f/0x80 ? do_syscall_64+0x5d/0x150 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ? __sys_sendto+0x1b0/0x250 ? __x64_sys_sendto+0x6f/0x80 ? do_syscall_64+0x5d/0x150 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 ip_finish_output2+0x495/0x590 ? ip_copy_metadata+0x2e0/0x2e0 ? skb_gso_validate_network_len+0x6f/0x110 ? ip_finish_output+0x174/0x280 __tcp_transmit_skb+0xb17/0x12b0 ? __tcp_select_window+0x380/0x380 tcp_write_xmit+0x913/0x1de0 ? __sk_mem_schedule+0x50/0x80 tcp_sendmsg_locked+0x49d/0x15a0 ? tcp_rcv_established+0x8da/0xa30 ? tcp_set_state+0x220/0x220 ? clear_user+0x1f/0x50 ? iov_iter_zero+0x1ae/0x590 ? __fget_light+0xa0/0xe0 tcp_sendmsg+0x22/0x40 __sys_sendto+0x1b0/0x250 ? __ia32_sys_getpeername+0x40/0x40 ? _copy_to_user+0x58/0x70 ? poll_select_copy_remaining+0x176/0x200 ? __pollwait+0x1c0/0x1c0 ? ktime_get_ts64+0x11f/0x140 ? kern_select+0x108/0x150 ? core_sys_select+0x360/0x360 ? vfs_read+0x127/0x150 ? kernel_write+0x90/0x90 __x64_sys_sendto+0x6f/0x80 do_syscall_64+0x5d/0x150 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fefef2b129d Code: ff ff ff ff eb b6 0f 1f 80 00 00 00 00 48 8d 05 51 37 0c 00 41 89 ca 8b 00 85 c0 75 20 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6b f3 c3 66 0f 1f 84 00 00 00 00 00 41 56 41 RSP: 002b:00007fff2f5350c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000056118d60c120 RCX: 00007fefef2b129d RDX: 0000000000002000 RSI: 000056118d629320 RDI: 0000000000000003 RBP: 000056118d530370 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000 R13: 000056118d5c2a10 R14: 000056118d5c2a10 R15: 000056118d5303b8 tcf_sample_act() tried to update its per-cpu stats, but tcf_sample_init() forgot to allocate them, because tcf_idr_create() was called with a wrong value of 'cpustats'. Setting it to true proved to fix the reported crash. Reported-by: Matteo Croce <mcroce@redhat.com> Fixes: 65a206c ("net/sched: Change act_api and act_xxx modules to use IDR") Fixes: 5c5670f ("net/sched: Introduce sample tc action") Tested-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit dd7ef7b ] In a low-memory situation, cc->fast_search_fail can keep increasing as it is unable to find an available page to isolate in fast_isolate_freepages(). As the result, it could trigger an error below, so just compare with the maximum bits can be shifted first. UBSAN: Undefined behaviour in mm/compaction.c:1160:30 shift exponent 64 is too large for 64-bit type 'unsigned long' CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G W L 5.0.0+ #17 Call trace: dump_backtrace+0x0/0x450 show_stack+0x20/0x2c dump_stack+0xc8/0x14c __ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4 compaction_alloc+0x2344/0x2484 unmap_and_move+0xdc/0x1dbc migrate_pages+0x274/0x1310 compact_zone+0x26ec/0x43bc kcompactd+0x15b8/0x1a24 kthread+0x374/0x390 ret_from_fork+0x10/0x18 [akpm@linux-foundation.org: code cleanup] Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw Fixes: 70b4459 ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d8e3e9 ] During early system resume on Exynos5422 with performance counters enabled the following kernel oops happens: Internal error: Oops - undefined instruction: 0 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 1433 Comm: bash Tainted: G W 5.0.0-rc5-next-20190208-00023-gd5fb5a8a13e6-dirty #5480 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) ... Flags: nZCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 4451006a DAC: 00000051 Process bash (pid: 1433, stack limit = 0xb7e0e22f) ... (reset_ctrl_regs) from [<c0112ad0>] (dbg_cpu_pm_notify+0x1c/0x24) (dbg_cpu_pm_notify) from [<c014c840>] (notifier_call_chain+0x44/0x84) (notifier_call_chain) from [<c014cbc0>] (__atomic_notifier_call_chain+0x7c/0x128) (__atomic_notifier_call_chain) from [<c01ffaac>] (cpu_pm_notify+0x30/0x54) (cpu_pm_notify) from [<c055116c>] (syscore_resume+0x98/0x3f4) (syscore_resume) from [<c0189350>] (suspend_devices_and_enter+0x97c/0xe74) (suspend_devices_and_enter) from [<c0189fb8>] (pm_suspend+0x770/0xc04) (pm_suspend) from [<c0187740>] (state_store+0x6c/0xcc) (state_store) from [<c09fa698>] (kobj_attr_store+0x14/0x20) (kobj_attr_store) from [<c030159c>] (sysfs_kf_write+0x4c/0x50) (sysfs_kf_write) from [<c0300620>] (kernfs_fop_write+0xfc/0x1e0) (kernfs_fop_write) from [<c0282be8>] (__vfs_write+0x2c/0x160) (__vfs_write) from [<c0282ea4>] (vfs_write+0xa4/0x16c) (vfs_write) from [<c0283080>] (ksys_write+0x40/0x8c) (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28) Undefined instruction is triggered during CP14 reset, because bits: #16 (Secure privileged invasive debug disabled) and #17 (Secure privileged noninvasive debug disable) are set in DSCR. Those bits depend on SPNIDEN and SPIDEN lines, which are provided by Secure JTAG hardware block. That block in turn is powered from cluster 0 (big/Eagle), but the Exynos5422 boots on cluster 1 (LITTLE/KFC). To fix this issue it is enough to turn on the power on the cluster 0 for a while. This lets the Secure JTAG block to propagate the needed signals to LITTLE/KFC cores and change their DSCR. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mapper may be NULL when called from register_ftrace_function_probe() with probe->data == NULL. This issue can be reproduced as follow (it may be covered by compiler optimization sometime): / # cat /sys/kernel/debug/tracing/set_ftrace_filter #### all functions enabled #### / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter [ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 206.952402] Mem abort info: [ 206.952819] ESR = 0x96000006 [ 206.955326] Exception class = DABT (current EL), IL = 32 bits [ 206.955844] SET = 0, FnV = 0 [ 206.956272] EA = 0, S1PTW = 0 [ 206.956652] Data abort info: [ 206.957320] ISV = 0, ISS = 0x00000006 [ 206.959271] CM = 0, WnR = 0 [ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000 [ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000 [ 206.964953] Internal error: Oops: 96000006 [openbmc#1] SMP [ 206.971122] Dumping ftrace buffer: [ 206.973677] (ftrace buffer empty) [ 206.975258] Modules linked in: [ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____)) [ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ openbmc#17 [ 206.978955] Hardware name: linux,dummy-virt (DT) [ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO) [ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118 [ 206.980874] lr : ftrace_count_free+0x68/0x80 [ 206.982539] sp : ffff0000182f3ab0 [ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700 [ 206.983632] x27: ffff000013054b40 x26: 0000000000000001 [ 206.984000] x25: ffff00001385f000 x24: 0000000000000000 [ 206.984394] x23: ffff000013453000 x22: ffff000013054000 [ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28 [ 206.986575] x19: ffff000013872c30 x18: 0000000000000000 [ 206.987111] x17: 0000000000000000 x16: 0000000000000000 [ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000 [ 206.987850] x13: 000000000017430e x12: 0000000000000580 [ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc [ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550 [ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000 [ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001 [ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28 [ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000 [ 206.992557] Call trace: [ 206.993101] free_ftrace_func_mapper+0x2c/0x118 [ 206.994827] ftrace_count_free+0x68/0x80 [ 206.995238] release_probe+0xfc/0x1d0 [ 206.995555] register_ftrace_function_probe+0x4a8/0x868 [ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180 [ 206.996330] ftrace_dump_callback+0x50/0x70 [ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8 [ 206.997157] ftrace_filter_write+0x44/0x60 [ 206.998971] __vfs_write+0x64/0xf0 [ 206.999285] vfs_write+0x14c/0x2f0 [ 206.999591] ksys_write+0xbc/0x1b0 [ 206.999888] __arm64_sys_write+0x3c/0x58 [ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0 [ 207.000607] el0_svc_handler+0x144/0x1c8 [ 207.000916] el0_svc+0x8/0xc [ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303) [ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]--- [ 207.010126] Kernel panic - not syncing: Fatal exception [ 207.011322] SMP: stopping secondary CPUs [ 207.013956] Dumping ftrace buffer: [ 207.014595] (ftrace buffer empty) [ 207.015632] Kernel Offset: disabled [ 207.017187] CPU features: 0x002,20006008 [ 207.017985] Memory Limit: none [ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]--- Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
commit d0a255e upstream. A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d00ee64 ] Eric reported a syzbot warning: BUG: KMSAN: uninit-value in nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 CPU: 0 PID: 11812 Comm: syz-executor444 Not tainted 5.3.0-rc3+ #17 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294 nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 rtm_del_nexthop+0x1b1/0x610 net/ipv4/nexthop.c:1543 rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5223 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf6c/0x1050 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmmsg+0x53a/0xae0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg+0xbd/0xe0 net/socket.c:2439 __x64_sys_sendmmsg+0x56/0x70 net/socket.c:2439 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297 entry_SYSCALL_64_after_hwframe+0x63/0xe7 The root cause is nlmsg_parse calling __nla_parse which means the header struct size is not checked. nlmsg_parse should be a wrapper around __nlmsg_parse with NL_VALIDATE_STRICT for the validate argument very much like nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL. Fixes: 3de6440 ("netlink: re-add parse/validate functions in strict mode") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02d715b upstream. dmar_drhd_units is traversed using list_for_each_entry_rcu() outside of an RCU read side critical section but under the protection of dmar_global_lock. Hence add corresponding lockdep expression to silence the following false-positive warnings: [ 1.603975] ============================= [ 1.603976] WARNING: suspicious RCU usage [ 1.603977] 5.5.4-stable #17 Not tainted [ 1.603978] ----------------------------- [ 1.603980] drivers/iommu/intel-iommu.c:4769 RCU-list traversed in non-reader section!! [ 1.603869] ============================= [ 1.603870] WARNING: suspicious RCU usage [ 1.603872] 5.5.4-stable #17 Not tainted [ 1.603874] ----------------------------- [ 1.603875] drivers/iommu/dmar.c:293 RCU-list traversed in non-reader section!! Tested-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: Amol Grover <frextrite@gmail.com> Cc: stable@vger.kernel.org Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1bc7896 ] When experimenting with bpf_send_signal() helper in our production environment (5.2 based), we experienced a deadlock in NMI mode: #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24 #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012 #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55 #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602 #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227 #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140 #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09 #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47 #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d #17 [ffffc9002219fc90] schedule at ffffffff81a3e219 #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9 #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529 #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602 #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068 The above call stack is actually very similar to an issue reported by Commit eac9153 ("bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()") by Song Liu. The only difference is bpf_send_signal() helper instead of bpf_get_stack() helper. The above deadlock is triggered with a perf_sw_event. Similar to Commit eac9153, the below almost identical reproducer used tracepoint point sched/sched_switch so the issue can be easily caught. /* stress_test.c */ #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <pthread.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #define THREAD_COUNT 1000 char *filename; void *worker(void *p) { void *ptr; int fd; char *pptr; fd = open(filename, O_RDONLY); if (fd < 0) return NULL; while (1) { struct timespec ts = {0, 1000 + rand() % 2000}; ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0); usleep(1); if (ptr == MAP_FAILED) { printf("failed to mmap\n"); break; } munmap(ptr, 4096 * 64); usleep(1); pptr = malloc(1); usleep(1); pptr[0] = 1; usleep(1); free(pptr); usleep(1); nanosleep(&ts, NULL); } close(fd); return NULL; } int main(int argc, char *argv[]) { void *ptr; int i; pthread_t threads[THREAD_COUNT]; if (argc < 2) return 0; filename = argv[1]; for (i = 0; i < THREAD_COUNT; i++) { if (pthread_create(threads + i, NULL, worker, NULL)) { fprintf(stderr, "Error creating thread\n"); return 0; } } for (i = 0; i < THREAD_COUNT; i++) pthread_join(threads[i], NULL); return 0; } and the following command: 1. run `stress_test /bin/ls` in one windown 2. hack bcc trace.py with the following change: # --- a/tools/trace.py # +++ b/tools/trace.py @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s); __data.tgid = __tgid; __data.pid = __pid; bpf_get_current_comm(&__data.comm, sizeof(__data.comm)); + bpf_send_signal(10); %s %s %s.perf_submit(%s, &__data, sizeof(__data)); 3. in a different window run ./trace.py -p $(pidof stress_test) t:sched:sched_switch The deadlock can be reproduced in our production system. Similar to Song's fix, the fix is to delay sending signal if irqs is disabled to avoid deadlocks involving with rq_lock. With this change, my above stress-test in our production system won't cause deadlock any more. I also implemented a scale-down version of reproducer in the selftest (a subsequent commit). With latest bpf-next, it complains for the following potential deadlock. [ 32.832450] -> #1 (&p->pi_lock){-.-.}: [ 32.833100] _raw_spin_lock_irqsave+0x44/0x80 [ 32.833696] task_rq_lock+0x2c/0xa0 [ 32.834182] task_sched_runtime+0x59/0xd0 [ 32.834721] thread_group_cputime+0x250/0x270 [ 32.835304] thread_group_cputime_adjusted+0x2e/0x70 [ 32.835959] do_task_stat+0x8a7/0xb80 [ 32.836461] proc_single_show+0x51/0xb0 ... [ 32.839512] -> #0 (&(&sighand->siglock)->rlock){....}: [ 32.840275] __lock_acquire+0x1358/0x1a20 [ 32.840826] lock_acquire+0xc7/0x1d0 [ 32.841309] _raw_spin_lock_irqsave+0x44/0x80 [ 32.841916] __lock_task_sighand+0x79/0x160 [ 32.842465] do_send_sig_info+0x35/0x90 [ 32.842977] bpf_send_signal+0xa/0x10 [ 32.843464] bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000 [ 32.844301] trace_call_bpf+0x115/0x270 [ 32.844809] perf_trace_run_bpf_submit+0x4a/0xc0 [ 32.845411] perf_trace_sched_switch+0x10f/0x180 [ 32.846014] __schedule+0x45d/0x880 [ 32.846483] schedule+0x5f/0xd0 ... [ 32.853148] Chain exists of: [ 32.853148] &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock [ 32.853148] [ 32.854451] Possible unsafe locking scenario: [ 32.854451] [ 32.855173] CPU0 CPU1 [ 32.855745] ---- ---- [ 32.856278] lock(&rq->lock); [ 32.856671] lock(&p->pi_lock); [ 32.857332] lock(&rq->lock); [ 32.857999] lock(&(&sighand->siglock)->rlock); Deadlock happens on CPU0 when it tries to acquire &sighand->siglock but it has been held by CPU1 and CPU1 tries to grab &rq->lock and cannot get it. This is not exactly the callstack in our production environment, but sympotom is similar and both locks are using spin_lock_irqsave() to acquire the lock, and both involves rq_lock. The fix to delay sending signal when irq is disabled also fixed this issue. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b12eea5 ] The evsel->unit borrows a pointer of pmu event or alias instead of owns a string. But tool event (duration_time) passes a result of strdup() caused a leak. It was found by ASAN during metric test: Direct leak of 210 byte(s) in 70 object(s) allocated from: #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414 #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414 #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141 #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406 #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393 #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415 #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498 #10 0x559fbbc0109b in run_test tests/builtin-test.c:410 #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bc62d68 ] exceptions may be traversed using list_for_each_entry_rcu() outside of an RCU read side critical section BUT under the protection of decgroup_mutex. Hence add the corresponding lockdep expression to fix the following false-positive warning: [ 2.304417] ============================= [ 2.304418] WARNING: suspicious RCU usage [ 2.304420] 5.5.4-stable #17 Tainted: G E [ 2.304422] ----------------------------- [ 2.304424] security/device_cgroup.c:355 RCU-list traversed in non-reader section!! Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8ffd778 ] Mimic the pre-existing ACPI and Device Tree event log behavior by not creating the binary_bios_measurements file when the EFI TPM event log is empty. This fixes the following NULL pointer dereference that can occur when reading /sys/kernel/security/tpm0/binary_bios_measurements after the kernel received an empty event log from the firmware: BUG: kernel NULL pointer dereference, address: 000000000000002c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [openbmc#1] SMP PTI CPU: 2 PID: 3932 Comm: fwupdtpmevlog Not tainted 5.9.0-00003-g629990edad62 openbmc#17 Hardware name: LENOVO 20LCS03L00/20LCS03L00, BIOS N27ET38W (1.24 ) 11/28/2019 RIP: 0010:tpm2_bios_measurements_start+0x3a/0x550 Code: 54 53 48 83 ec 68 48 8b 57 70 48 8b 1e 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8b 82 c0 06 00 00 48 8b 8a c8 06 00 00 <44> 8b 60 1c 48 89 4d a0 4c 89 e2 49 83 c4 20 48 83 fb 00 75 2a 49 RSP: 0018:ffffa9c901203db0 EFLAGS: 00010246 RAX: 0000000000000010 RBX: 0000000000000000 RCX: 0000000000000010 RDX: ffff8ba1eb99c000 RSI: ffff8ba1e4ce8280 RDI: ffff8ba1e4ce8258 RBP: ffffa9c901203e40 R08: ffffa9c901203dd8 R09: ffff8ba1ec443300 R10: ffffa9c901203e50 R11: 0000000000000000 R12: ffff8ba1e4ce8280 R13: ffffa9c901203ef0 R14: ffffa9c901203ef0 R15: ffff8ba1e4ce8258 FS: 00007f6595460880(0000) GS:ffff8ba1ef880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002c CR3: 00000007d8d18003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? __kmalloc_node+0x113/0x320 ? kvmalloc_node+0x31/0x80 seq_read+0x94/0x420 vfs_read+0xa7/0x190 ksys_read+0xa7/0xe0 __x64_sys_read+0x1a/0x20 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In this situation, the bios_event_log pointer in the tpm_bios_log struct was not NULL but was equal to the ZERO_SIZE_PTR (0x10) value. This was due to the following kmemdup() in tpm_read_log_efi(): int tpm_read_log_efi(struct tpm_chip *chip) { ... /* malloc EventLog space */ log->bios_event_log = kmemdup(log_tbl->log, log_size, GFP_KERNEL); if (!log->bios_event_log) { ret = -ENOMEM; goto out; } ... } When log_size is zero, due to an empty event log from firmware, ZERO_SIZE_PTR is returned from kmemdup(). Upon a read of the binary_bios_measurements file, the tpm2_bios_measurements_start() function does not perform a ZERO_OR_NULL_PTR() check on the bios_event_log pointer before dereferencing it. Rather than add a ZERO_OR_NULL_PTR() check in functions that make use of the bios_event_log pointer, simply avoid creating the binary_bios_measurements_file as is done in other event log retrieval backends. Explicitly ignore all of the events in the final event log when the main event log is empty. The list of events in the final event log cannot be accurately parsed without referring to the first event in the main event log (the event log header) so the final event log is useless in such a situation. Fixes: 58cc1e4 ("tpm: parse TPM event logs based on EFI table") Link: https://lore.kernel.org/linux-integrity/E1FDCCCB-CA51-4AEE-AC83-9CDE995EAE52@canonical.com/ Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reported-by: Kenneth R. Crudup <kenny@panix.com> Reported-by: Mimi Zohar <zohar@linux.ibm.com> Cc: Thiébaud Weksteen <tweek@google.com> Cc: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4d14c5c upstream Calling btrfs_qgroup_reserve_meta_prealloc from btrfs_delayed_inode_reserve_metadata can result in flushing delalloc while holding a transaction and delayed node locks. This is deadlock prone. In the past multiple commits: * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're already holding a transaction") * 6f23277 ("btrfs: qgroup: don't commit transaction when we already hold the handle") Tried to solve various aspects of this but this was always a whack-a-mole game. Unfortunately those 2 fixes don't solve a deadlock scenario involving btrfs_delayed_node::mutex. Namely, one thread can call btrfs_dirty_inode as a result of reading a file and modifying its atime: PID: 6963 TASK: ffff8c7f3f94c000 CPU: 2 COMMAND: "test" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_timeout at ffffffffa52a1bdd #3 wait_for_completion at ffffffffa529eeea <-- sleeps with delayed node mutex held #4 start_delalloc_inodes at ffffffffc0380db5 #5 btrfs_start_delalloc_snapshot at ffffffffc0393836 #6 try_flush_qgroup at ffffffffc03f04b2 #7 __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6 <-- tries to reserve space and starts delalloc inodes. #8 btrfs_delayed_update_inode at ffffffffc03e31aa <-- acquires delayed node mutex #9 btrfs_update_inode at ffffffffc0385ba8 #10 btrfs_dirty_inode at ffffffffc038627b <-- TRANSACTIION OPENED #11 touch_atime at ffffffffa4cf0000 #12 generic_file_read_iter at ffffffffa4c1f123 #13 new_sync_read at ffffffffa4ccdc8a #14 vfs_read at ffffffffa4cd0849 #15 ksys_read at ffffffffa4cd0bd1 #16 do_syscall_64 at ffffffffa4a052eb #17 entry_SYSCALL_64_after_hwframe at ffffffffa540008c This will cause an asynchronous work to flush the delalloc inodes to happen which can try to acquire the same delayed_node mutex: PID: 455 TASK: ffff8c8085fa4000 CPU: 5 COMMAND: "kworker/u16:30" #0 __schedule at ffffffffa529e07d #1 schedule at ffffffffa529e4ff #2 schedule_preempt_disabled at ffffffffa529e80a #3 __mutex_lock at ffffffffa529fdcb <-- goes to sleep, never wakes up. #4 btrfs_delayed_update_inode at ffffffffc03e3143 <-- tries to acquire the mutex #5 btrfs_update_inode at ffffffffc0385ba8 <-- this is the same inode that pid 6963 is holding #6 cow_file_range_inline.constprop.78 at ffffffffc0386be7 #7 cow_file_range at ffffffffc03879c1 #8 btrfs_run_delalloc_range at ffffffffc038894c #9 writepage_delalloc at ffffffffc03a3c8f #10 __extent_writepage at ffffffffc03a4c01 #11 extent_write_cache_pages at ffffffffc03a500b #12 extent_writepages at ffffffffc03a6de2 #13 do_writepages at ffffffffa4c277eb #14 __filemap_fdatawrite_range at ffffffffa4c1e5bb #15 btrfs_run_delalloc_work at ffffffffc0380987 <-- starts running delayed nodes #16 normal_work_helper at ffffffffc03b706c #17 process_one_work at ffffffffa4aba4e4 #18 worker_thread at ffffffffa4aba6fd #19 kthread at ffffffffa4ac0a3d #20 ret_from_fork at ffffffffa54001ff To fully address those cases the complete fix is to never issue any flushing while holding the transaction or the delayed node lock. This patch achieves it by calling qgroup_reserve_meta directly which will either succeed without flushing or will fail and return -EDQUOT. In the latter case that return value is going to be propagated to btrfs_dirty_inode which will fallback to start a new transaction. That's fine as the majority of time we expect the inode will have BTRFS_DELAYED_NODE_INODE_DIRTY flag set which will result in directly copying the in-memory state. Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> [sudip: adjust context] Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 41d5854 upstream. I got several memory leak reports from Asan with a simple command. It was because VDSO is not released due to the refcount. Like in __dsos_addnew_id(), it should put the refcount after adding to the list. $ perf record true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ] ================================================================= ==692599==ERROR: LeakSanitizer: detected memory leaks Direct leak of 439 byte(s) in 1 object(s) allocated from: #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256 #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132 #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347 #4 0x559bce50826c in map__new util/map.c:175 #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787 #6 0x559bce512f6b in machines__deliver_event util/session.c:1481 #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551 #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244 #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323 #10 0x559bce519bea in __perf_session__process_events util/session.c:2268 #11 0x559bce519bea in perf_session__process_events util/session.c:2297 #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017 #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234 #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026 #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858 #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308 Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154 #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169 #2 0x559bce50821b in map__new util/map.c:168 #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787 #4 0x559bce512f6b in machines__deliver_event util/session.c:1481 #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551 #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244 #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323 #8 0x559bce519bea in __perf_session__process_events util/session.c:2268 #9 0x559bce519bea in perf_session__process_events util/session.c:2297 #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017 #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234 #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026 #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858 #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313 #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365 #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409 #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539 #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308 SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57f0ff0 upstream. It's later supposed to be either a correct address or NULL. Without the initialization, it may contain an undefined value which results in the following segmentation fault: # perf top --sort comm -g --ignore-callees=do_idle terminates with: #0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6 #2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489 #3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564 #4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657 #5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288 #6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056 #7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056 #8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231 #9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842 #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202 #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244 #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323 #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339 #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341 #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339 #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114 #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0 #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6 If you look at the frame #2, the code is: 488 if (he->srcline) { 489 he->srcline = strdup(he->srcline); 490 if (he->srcline == NULL) 491 goto err_rawdata; 492 } If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish), it gets strdupped and strdupping a rubbish random string causes the problem. Also, if you look at the commit 1fb7d06, it adds the srcline property into the struct, but not initializing it everywhere needed. Committer notes: Now I see, when using --ignore-callees=do_idle we end up here at line 2189 in add_callchain_ip(): 2181 if (al.sym != NULL) { 2182 if (perf_hpp_list.parent && !*parent && 2183 symbol__match_regex(al.sym, &parent_regex)) 2184 *parent = al.sym; 2185 else if (have_ignore_callees && root_al && 2186 symbol__match_regex(al.sym, &ignore_callees_regex)) { 2187 /* Treat this symbol as the root, 2188 forgetting its callees. */ 2189 *root_al = al; 2190 callchain_cursor_reset(cursor); 2191 } 2192 } And the al that doesn't have the ->srcline field initialized will be copied to the root_al, so then, back to: 1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al, 1212 int max_stack_depth, void *arg) 1213 { 1214 int err, err2; 1215 struct map *alm = NULL; 1216 1217 if (al) 1218 alm = map__get(al->map); 1219 1220 err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent, 1221 iter->evsel, al, max_stack_depth); 1222 if (err) { 1223 map__put(alm); 1224 return err; 1225 } 1226 1227 err = iter->ops->prepare_entry(iter, al); 1228 if (err) 1229 goto out; 1230 1231 err = iter->ops->add_single_entry(iter, al); 1232 if (err) 1233 goto out; 1234 That al at line 1221 is what hist_entry_iter__add() (called from sample__resolve_callchain()) saw as 'root_al', and then: iter->ops->add_single_entry(iter, al); will go on with al->srcline with a bogus value, I'll add the above sequence to the cset and apply, thanks! Signed-off-by: Michael Petlan <mpetlan@redhat.com> CC: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 1fb7d06 ("perf report Use srcline from callchain for hist entries") Link: https //lore.kernel.org/r/20210719145332.29747-1-mpetlan@redhat.com Reported-by: Juri Lelli <jlelli@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f22881d ] In calipso_map_cat_ntoh(), in the for loop, if the return value of netlbl_bitmap_walk() is equal to (net_clen_bits - 1), when netlbl_bitmap_walk() is called next time, out-of-bounds memory accesses of bitmap[byte_offset] occurs. The bug was found during fuzzing. The following is the fuzzing report BUG: KASAN: slab-out-of-bounds in netlbl_bitmap_walk+0x3c/0xd0 Read of size 1 at addr ffffff8107bf6f70 by task err_OH/252 CPU: 7 PID: 252 Comm: err_OH Not tainted 5.17.0-rc7+ #17 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x21c/0x230 show_stack+0x1c/0x60 dump_stack_lvl+0x64/0x7c print_address_description.constprop.0+0x70/0x2d0 __kasan_report+0x158/0x16c kasan_report+0x74/0x120 __asan_load1+0x80/0xa0 netlbl_bitmap_walk+0x3c/0xd0 calipso_opt_getattr+0x1a8/0x230 calipso_sock_getattr+0x218/0x340 calipso_sock_getattr+0x44/0x60 netlbl_sock_getattr+0x44/0x80 selinux_netlbl_socket_setsockopt+0x138/0x170 selinux_socket_setsockopt+0x4c/0x60 security_socket_setsockopt+0x4c/0x90 __sys_setsockopt+0xbc/0x2b0 __arm64_sys_setsockopt+0x6c/0x84 invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x88/0x200 do_el0_svc+0x88/0xa0 el0_svc+0x128/0x1b0 el0t_64_sync_handler+0x9c/0x120 el0t_64_sync+0x16c/0x170 Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 67d7d8a ] Hulk Robot reported a issue: ================================================================== BUG: KASAN: use-after-free in ext4_xattr_set_entry+0x18ab/0x3500 Write of size 4105 at addr ffff8881675ef5f4 by task syz-executor.0/7092 CPU: 1 PID: 7092 Comm: syz-executor.0 Not tainted 4.19.90-dirty #17 Call Trace: [...] memcpy+0x34/0x50 mm/kasan/kasan.c:303 ext4_xattr_set_entry+0x18ab/0x3500 fs/ext4/xattr.c:1747 ext4_xattr_ibody_inline_set+0x86/0x2a0 fs/ext4/xattr.c:2205 ext4_xattr_set_handle+0x940/0x1300 fs/ext4/xattr.c:2386 ext4_xattr_set+0x1da/0x300 fs/ext4/xattr.c:2498 __vfs_setxattr+0x112/0x170 fs/xattr.c:149 __vfs_setxattr_noperm+0x11b/0x2a0 fs/xattr.c:180 __vfs_setxattr_locked+0x17b/0x250 fs/xattr.c:238 vfs_setxattr+0xed/0x270 fs/xattr.c:255 setxattr+0x235/0x330 fs/xattr.c:520 path_setxattr+0x176/0x190 fs/xattr.c:539 __do_sys_lsetxattr fs/xattr.c:561 [inline] __se_sys_lsetxattr fs/xattr.c:557 [inline] __x64_sys_lsetxattr+0xc2/0x160 fs/xattr.c:557 do_syscall_64+0xdf/0x530 arch/x86/entry/common.c:298 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x459fe9 RSP: 002b:00007fa5e54b4c08 EFLAGS: 00000246 ORIG_RAX: 00000000000000bd RAX: ffffffffffffffda RBX: 000000000051bf60 RCX: 0000000000459fe9 RDX: 00000000200003c0 RSI: 0000000020000180 RDI: 0000000020000140 RBP: 000000000051bf60 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000001009 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc73c93fc0 R14: 000000000051bf60 R15: 00007fa5e54b4d80 [...] ================================================================== Above issue may happen as follows: ------------------------------------- ext4_xattr_set ext4_xattr_set_handle ext4_xattr_ibody_find >> s->end < s->base >> no EXT4_STATE_XATTR >> xattr_check_inode is not executed ext4_xattr_ibody_set ext4_xattr_set_entry >> size_t min_offs = s->end - s->base >> UAF in memcpy we can easily reproduce this problem with the following commands: mkfs.ext4 -F /dev/sda mount -o debug_want_extra_isize=128 /dev/sda /mnt touch /mnt/file setfattr -n user.cat -v `seq -s z 4096|tr -d '[:digit:]'` /mnt/file In ext4_xattr_ibody_find, we have the following assignment logic: header = IHDR(inode, raw_inode) = raw_inode + EXT4_GOOD_OLD_INODE_SIZE + i_extra_isize is->s.base = IFIRST(header) = header + sizeof(struct ext4_xattr_ibody_header) is->s.end = raw_inode + s_inode_size In ext4_xattr_set_entry min_offs = s->end - s->base = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize - sizeof(struct ext4_xattr_ibody_header) last = s->first free = min_offs - ((void *)last - s->base) - sizeof(__u32) = s_inode_size - EXT4_GOOD_OLD_INODE_SIZE - i_extra_isize - sizeof(struct ext4_xattr_ibody_header) - sizeof(__u32) In the calculation formula, all values except s_inode_size and i_extra_size are fixed values. When i_extra_size is the maximum value s_inode_size - EXT4_GOOD_OLD_INODE_SIZE, min_offs is -4 and free is -8. The value overflows. As a result, the preceding issue is triggered when memcpy is executed. Therefore, when finding xattr or setting xattr, check whether there is space for storing xattr in the inode to resolve this issue. Cc: stable@kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220616021358.2504451-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb5edce ] Upon physical link change, firmware reports to the kernel about the change along with the details like speed, lmac_type_id, etc. Kernel derives lmac_type based on lmac_type_id received from firmware. In a few scenarios, firmware returns an invalid lmac_type_id, which is resulting in below kernel panic. This patch adds the missing validation of the lmac_type_id field. Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 35.321595] Modules linked in: [ 35.328982] CPU: 0 PID: 31 Comm: kworker/0:1 Not tainted 5.4.210-g2e3169d8e1bc-dirty #17 [ 35.337014] Hardware name: Marvell CN103XX board (DT) [ 35.344297] Workqueue: events work_for_cpu_fn [ 35.352730] pstate: 40400089 (nZcv daIf +PAN -UAO) [ 35.360267] pc : strncpy+0x10/0x30 [ 35.366595] lr : cgx_link_change_handler+0x90/0x180 Fixes: 61071a8 ("octeontx2-af: Forward CGX link notifications to PFs") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0b0747d upstream. The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a154f5f ] The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8678b10 upstream. An errant disk backup on my desktop got into debugfs and triggered the following deadlock scenario in the amdgpu debugfs files. The machine also hard-resets immediately after those lines are printed (although I wasn't able to reproduce that part when reading by hand): [ 1318.016074][ T1082] ====================================================== [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted [ 1318.017598][ T1082] ------------------------------------------------------ [ 1318.018096][ T1082] tar/1082 is trying to acquire lock: [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80 [ 1318.019084][ T1082] [ 1318.019084][ T1082] but task is already holding lock: [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.020607][ T1082] [ 1318.020607][ T1082] which lock already depends on the new lock. [ 1318.020607][ T1082] [ 1318.022081][ T1082] [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is: [ 1318.023083][ T1082] [ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0 [ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90 [ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330 [ 1318.025683][ T1082] do_one_initcall+0x6a/0x350 [ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.026728][ T1082] kernel_init+0x15/0x1a0 [ 1318.027242][ T1082] ret_from_fork+0x2c/0x40 [ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.028281][ T1082] [ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}: [ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330 [ 1318.029790][ T1082] do_one_initcall+0x6a/0x350 [ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.030722][ T1082] kernel_init+0x15/0x1a0 [ 1318.031168][ T1082] ret_from_fork+0x2c/0x40 [ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.032011][ T1082] [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}: [ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.033487][ T1082] __might_fault+0x58/0x80 [ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu] [ 1318.034181][ T1082] full_proxy_read+0x55/0x80 [ 1318.034487][ T1082] vfs_read+0xa7/0x360 [ 1318.034788][ T1082] ksys_read+0x70/0xf0 [ 1318.035085][ T1082] do_syscall_64+0x94/0x180 [ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.035664][ T1082] [ 1318.035664][ T1082] other info that might help us debug this: [ 1318.035664][ T1082] [ 1318.036487][ T1082] Chain exists of: [ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [ 1318.036487][ T1082] [ 1318.037310][ T1082] Possible unsafe locking scenario: [ 1318.037310][ T1082] [ 1318.037838][ T1082] CPU0 CPU1 [ 1318.038101][ T1082] ---- ---- [ 1318.038350][ T1082] lock(reservation_ww_class_mutex); [ 1318.038590][ T1082] lock(reservation_ww_class_acquire); [ 1318.038839][ T1082] lock(reservation_ww_class_mutex); [ 1318.039083][ T1082] rlock(&mm->mmap_lock); [ 1318.039328][ T1082] [ 1318.039328][ T1082] *** DEADLOCK *** [ 1318.039328][ T1082] [ 1318.040029][ T1082] 1 lock held by tar/1082: [ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.040560][ T1082] [ 1318.040560][ T1082] stack backtrace: [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023 [ 1318.041614][ T1082] Call Trace: [ 1318.041895][ T1082] <TASK> [ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80 [ 1318.042460][ T1082] check_noncircular+0x145/0x160 [ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.043301][ T1082] ? __might_fault+0x40/0x80 [ 1318.043580][ T1082] ? __might_fault+0x40/0x80 [ 1318.043856][ T1082] __might_fault+0x58/0x80 [ 1318.044131][ T1082] ? __might_fault+0x40/0x80 [ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab] [ 1318.044749][ T1082] full_proxy_read+0x55/0x80 [ 1318.045042][ T1082] vfs_read+0xa7/0x360 [ 1318.045333][ T1082] ksys_read+0x70/0xf0 [ 1318.045623][ T1082] do_syscall_64+0x94/0x180 [ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100 [ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800 [ 1318.050638][ T1082] </TASK> amdgpu_debugfs_mqd_read() holds a reservation when it calls put_user(), which may fault and acquire the mmap_sem. This violates the established locking order. Bounce the mqd data through a kernel buffer to get put_user() out of the illegal section. Fixes: 445d85e ("drm/amdgpu: add debugfs interface for reading MQDs") Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An errant disk backup on my desktop got into debugfs and triggered the following deadlock scenario in the amdgpu debugfs files. The machine also hard-resets immediately after those lines are printed (although I wasn't able to reproduce that part when reading by hand): [ 1318.016074][ T1082] ====================================================== [ 1318.016607][ T1082] WARNING: possible circular locking dependency detected [ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 openbmc#17 Not tainted [ 1318.017598][ T1082] ------------------------------------------------------ [ 1318.018096][ T1082] tar/1082 is trying to acquire lock: [ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80 [ 1318.019084][ T1082] [ 1318.019084][ T1082] but task is already holding lock: [ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.020607][ T1082] [ 1318.020607][ T1082] which lock already depends on the new lock. [ 1318.020607][ T1082] [ 1318.022081][ T1082] [ 1318.022081][ T1082] the existing dependency chain (in reverse order) is: [ 1318.023083][ T1082] [ 1318.023083][ T1082] -> openbmc#2 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0 [ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90 [ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330 [ 1318.025683][ T1082] do_one_initcall+0x6a/0x350 [ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.026728][ T1082] kernel_init+0x15/0x1a0 [ 1318.027242][ T1082] ret_from_fork+0x2c/0x40 [ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.028281][ T1082] [ 1318.028281][ T1082] -> openbmc#1 (reservation_ww_class_acquire){+.+.}-{0:0}: [ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330 [ 1318.029790][ T1082] do_one_initcall+0x6a/0x350 [ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310 [ 1318.030722][ T1082] kernel_init+0x15/0x1a0 [ 1318.031168][ T1082] ret_from_fork+0x2c/0x40 [ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20 [ 1318.032011][ T1082] [ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}: [ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.033487][ T1082] __might_fault+0x58/0x80 [ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu] [ 1318.034181][ T1082] full_proxy_read+0x55/0x80 [ 1318.034487][ T1082] vfs_read+0xa7/0x360 [ 1318.034788][ T1082] ksys_read+0x70/0xf0 [ 1318.035085][ T1082] do_syscall_64+0x94/0x180 [ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.035664][ T1082] [ 1318.035664][ T1082] other info that might help us debug this: [ 1318.035664][ T1082] [ 1318.036487][ T1082] Chain exists of: [ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex [ 1318.036487][ T1082] [ 1318.037310][ T1082] Possible unsafe locking scenario: [ 1318.037310][ T1082] [ 1318.037838][ T1082] CPU0 CPU1 [ 1318.038101][ T1082] ---- ---- [ 1318.038350][ T1082] lock(reservation_ww_class_mutex); [ 1318.038590][ T1082] lock(reservation_ww_class_acquire); [ 1318.038839][ T1082] lock(reservation_ww_class_mutex); [ 1318.039083][ T1082] rlock(&mm->mmap_lock); [ 1318.039328][ T1082] [ 1318.039328][ T1082] *** DEADLOCK *** [ 1318.039328][ T1082] [ 1318.040029][ T1082] 1 lock held by tar/1082: [ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu] [ 1318.040560][ T1082] [ 1318.040560][ T1082] stack backtrace: [ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 openbmc#17 3316c85d50e282c5643b075d1f01a4f6365e39c2 [ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023 [ 1318.041614][ T1082] Call Trace: [ 1318.041895][ T1082] <TASK> [ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80 [ 1318.042460][ T1082] check_noncircular+0x145/0x160 [ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680 [ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0 [ 1318.043301][ T1082] ? __might_fault+0x40/0x80 [ 1318.043580][ T1082] ? __might_fault+0x40/0x80 [ 1318.043856][ T1082] __might_fault+0x58/0x80 [ 1318.044131][ T1082] ? __might_fault+0x40/0x80 [ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab] [ 1318.044749][ T1082] full_proxy_read+0x55/0x80 [ 1318.045042][ T1082] vfs_read+0xa7/0x360 [ 1318.045333][ T1082] ksys_read+0x70/0xf0 [ 1318.045623][ T1082] do_syscall_64+0x94/0x180 [ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100 [ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180 [ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e [ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d [ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83 [ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d [ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008 [ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007 [ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00 [ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800 [ 1318.050638][ T1082] </TASK> amdgpu_debugfs_mqd_read() holds a reservation when it calls put_user(), which may fault and acquire the mmap_sem. This violates the established locking order. Bounce the mqd data through a kernel buffer to get put_user() out of the illegal section. Fixes: 445d85e ("drm/amdgpu: add debugfs interface for reading MQDs") Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 openbmc#1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 openbmc#2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e openbmc#3 [fffffe00003fced0] do_nmi at ffffffff8922660d openbmc#4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 openbmc#5 [ffffa655314979e8] io_serial_in at ffffffff89792594 openbmc#6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 openbmc#7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 openbmc#8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 openbmc#9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 openbmc#10 [ffffa65531497ac8] console_unlock at ffffffff89316124 openbmc#11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 openbmc#12 [ffffa65531497b68] printk at ffffffff89318306 openbmc#13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 openbmc#14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] openbmc#15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] openbmc#16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] openbmc#17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] openbmc#18 [ffffa65531497f10] kthread at ffffffff892d2e72 openbmc#19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 #5 0x55c94045c776 in ui_browser__show ui/browser.c:288 #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 #20 0x55c9400e53ad in main tools/perf/perf.c:561 #21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a699781 ] A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 openbmc#1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 openbmc#2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 openbmc#3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 openbmc#4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 openbmc#5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 openbmc#6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 openbmc#7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 openbmc#8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 openbmc#9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 openbmc#10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 openbmc#11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 openbmc#12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 openbmc#13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 openbmc#14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 openbmc#15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 openbmc#16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 openbmc#17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
During noirq suspend phase the Raspberry Pi power driver suffer of firmware property timeouts. The reason is that the IRQ of the underlying BCM2835 mailbox is disabled and rpi_firmware_property_list() will always run into a timeout [1]. Since the VideoCore side isn't consider as a wakeup source, set the IRQF_NO_SUSPEND flag for the mailbox IRQ in order to keep it enabled during suspend-resume cycle. [1] PM: late suspend of devices complete after 1.754 msecs WARNING: CPU: 0 PID: 438 at drivers/firmware/raspberrypi.c:128 rpi_firmware_property_list+0x204/0x22c Firmware transaction 0x00028001 timeout Modules linked in: CPU: 0 PID: 438 Comm: bash Tainted: G C 6.9.3-dirty openbmc#17 Hardware name: BCM2835 Call trace: unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x34/0x44 dump_stack_lvl from __warn+0x88/0xec __warn from warn_slowpath_fmt+0x7c/0xb0 warn_slowpath_fmt from rpi_firmware_property_list+0x204/0x22c rpi_firmware_property_list from rpi_firmware_property+0x68/0x8c rpi_firmware_property from rpi_firmware_set_power+0x54/0xc0 rpi_firmware_set_power from _genpd_power_off+0xe4/0x148 _genpd_power_off from genpd_sync_power_off+0x7c/0x11c genpd_sync_power_off from genpd_finish_suspend+0xcc/0xe0 genpd_finish_suspend from dpm_run_callback+0x78/0xd0 dpm_run_callback from device_suspend_noirq+0xc0/0x238 device_suspend_noirq from dpm_suspend_noirq+0xb0/0x168 dpm_suspend_noirq from suspend_devices_and_enter+0x1b8/0x5ac suspend_devices_and_enter from pm_suspend+0x254/0x2e4 pm_suspend from state_store+0xa8/0xd4 state_store from kernfs_fop_write_iter+0x154/0x1a0 kernfs_fop_write_iter from vfs_write+0x12c/0x184 vfs_write from ksys_write+0x78/0xc0 ksys_write from ret_fast_syscall+0x0/0x54 Exception stack(0xcc93dfa8 to 0xcc93dff0) [...] PM: noirq suspend of devices complete after 3095.584 msecs Link: raspberrypi/firmware#1894 Fixes: 0bae6af ("mailbox: Enable BCM2835 mailbox support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
[ Upstream commit dc09f00 ] During noirq suspend phase the Raspberry Pi power driver suffer of firmware property timeouts. The reason is that the IRQ of the underlying BCM2835 mailbox is disabled and rpi_firmware_property_list() will always run into a timeout [1]. Since the VideoCore side isn't consider as a wakeup source, set the IRQF_NO_SUSPEND flag for the mailbox IRQ in order to keep it enabled during suspend-resume cycle. [1] PM: late suspend of devices complete after 1.754 msecs WARNING: CPU: 0 PID: 438 at drivers/firmware/raspberrypi.c:128 rpi_firmware_property_list+0x204/0x22c Firmware transaction 0x00028001 timeout Modules linked in: CPU: 0 PID: 438 Comm: bash Tainted: G C 6.9.3-dirty #17 Hardware name: BCM2835 Call trace: unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x34/0x44 dump_stack_lvl from __warn+0x88/0xec __warn from warn_slowpath_fmt+0x7c/0xb0 warn_slowpath_fmt from rpi_firmware_property_list+0x204/0x22c rpi_firmware_property_list from rpi_firmware_property+0x68/0x8c rpi_firmware_property from rpi_firmware_set_power+0x54/0xc0 rpi_firmware_set_power from _genpd_power_off+0xe4/0x148 _genpd_power_off from genpd_sync_power_off+0x7c/0x11c genpd_sync_power_off from genpd_finish_suspend+0xcc/0xe0 genpd_finish_suspend from dpm_run_callback+0x78/0xd0 dpm_run_callback from device_suspend_noirq+0xc0/0x238 device_suspend_noirq from dpm_suspend_noirq+0xb0/0x168 dpm_suspend_noirq from suspend_devices_and_enter+0x1b8/0x5ac suspend_devices_and_enter from pm_suspend+0x254/0x2e4 pm_suspend from state_store+0xa8/0xd4 state_store from kernfs_fop_write_iter+0x154/0x1a0 kernfs_fop_write_iter from vfs_write+0x12c/0x184 vfs_write from ksys_write+0x78/0xc0 ksys_write from ret_fast_syscall+0x0/0x54 Exception stack(0xcc93dfa8 to 0xcc93dff0) [...] PM: noirq suspend of devices complete after 3095.584 msecs Link: raspberrypi/firmware#1894 Fixes: 0bae6af ("mailbox: Enable BCM2835 mailbox support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 #5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 #6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 #7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 #8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 #9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 826cc42 ] My colleague Wupeng found the following problems during fault injection: BUG: unable to handle page fault for address: fffffbfff809d073 PGD 6e648067 P4D 123ec8067 PUD 123ec4067 PMD 100e38067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 5 UID: 0 PID: 755 Comm: modprobe Not tainted 6.12.0-rc3+ #17 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:__asan_load8+0x4c/0xa0 ... Call Trace: <TASK> blkdev_put_whole+0x41/0x70 bdev_release+0x1a3/0x250 blkdev_release+0x11/0x20 __fput+0x1d7/0x4a0 task_work_run+0xfc/0x180 syscall_exit_to_user_mode+0x1de/0x1f0 do_syscall_64+0x6b/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e loop_init() is calling loop_add() after __register_blkdev() succeeds and is ignoring disk_add() failure from loop_add(), for loop_add() failure is not fatal and successfully created disks are already visible to bdev_open(). brd_init() is currently calling brd_alloc() before __register_blkdev() succeeds and is releasing successfully created disks when brd_init() returns an error. This can cause UAF for the latter two case: case 1: T1: modprobe brd brd_init brd_alloc(0) // success add_disk disk_scan_partitions bdev_file_open_by_dev // alloc file fput // won't free until back to userspace brd_alloc(1) // failed since mem alloc error inject // error path for modprobe will release code segment // back to userspace __fput blkdev_release bdev_release blkdev_put_whole bdev->bd_disk->fops->release // fops is freed now, UAF! case 2: T1: T2: modprobe brd brd_init brd_alloc(0) // success open(/dev/ram0) brd_alloc(1) // fail // error path for modprobe close(/dev/ram0) ... /* UAF! */ bdev->bd_disk->fops->release Fix this problem by following what loop_init() does. Besides, reintroduce brd_devices_mutex to help serialize modifications to brd_list. Fixes: 7f9b348 ("brd: convert to blk_alloc_disk/blk_cleanup_disk") Reported-by: Wupeng Ma <mawupeng1@huawei.com> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241030034914.907829-1-yangerkun@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 17af2b3 upstream. The workqueue should be destroyed in mtk_jpeg_core.c since commit 09aea13 ("media: mtk-jpeg: refactor some variables"), otherwise the below calltrace can be easily triggered. [ 677.862514] Unable to handle kernel paging request at virtual address dfff800000000023 [ 677.863633] KASAN: null-ptr-deref in range [0x0000000000000118-0x000000000000011f] ... [ 677.879654] CPU: 6 PID: 1071 Comm: modprobe Tainted: G O 6.8.12-mtk+gfa1a78e5d24b+ #17 ... [ 677.882838] pc : destroy_workqueue+0x3c/0x770 [ 677.883413] lr : mtk_jpegdec_destroy_workqueue+0x70/0x88 [mtk_jpeg_dec_hw] [ 677.884314] sp : ffff80008ad974f0 [ 677.884744] x29: ffff80008ad974f0 x28: ffff0000d7115580 x27: ffff0000dd691070 [ 677.885669] x26: ffff0000dd691408 x25: ffff8000844af3e0 x24: ffff80008ad97690 [ 677.886592] x23: ffff0000e051d400 x22: ffff0000dd691010 x21: dfff800000000000 [ 677.887515] x20: 0000000000000000 x19: 0000000000000000 x18: ffff800085397ac0 [ 677.888438] x17: 0000000000000000 x16: ffff8000801b87c8 x15: 1ffff000115b2e10 [ 677.889361] x14: 00000000f1f1f1f1 x13: 0000000000000000 x12: ffff7000115b2e4d [ 677.890285] x11: 1ffff000115b2e4c x10: ffff7000115b2e4c x9 : ffff80000aa43e90 [ 677.891208] x8 : 00008fffeea4d1b4 x7 : ffff80008ad97267 x6 : 0000000000000001 [ 677.892131] x5 : ffff80008ad97260 x4 : ffff7000115b2e4d x3 : 0000000000000000 [ 677.893054] x2 : 0000000000000023 x1 : dfff800000000000 x0 : 0000000000000118 [ 677.893977] Call trace: [ 677.894297] destroy_workqueue+0x3c/0x770 [ 677.894826] mtk_jpegdec_destroy_workqueue+0x70/0x88 [mtk_jpeg_dec_hw] [ 677.895677] devm_action_release+0x50/0x90 [ 677.896211] release_nodes+0xe8/0x170 [ 677.896688] devres_release_all+0xf8/0x178 [ 677.897219] device_unbind_cleanup+0x24/0x170 [ 677.897785] device_release_driver_internal+0x35c/0x480 [ 677.898461] device_release_driver+0x20/0x38 ... [ 677.912665] ---[ end trace 0000000000000000 ]--- Fixes: 09aea13 ("media: mtk-jpeg: refactor some variables") Cc: <stable@vger.kernel.org> Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 openbmc#5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 openbmc#6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 openbmc#7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 openbmc#8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 openbmc#9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 openbmc#10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 openbmc#11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 openbmc#12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 openbmc#13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 openbmc#14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 openbmc#15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 openbmc#16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 openbmc#17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reproduce step:
1. Do not plug in LAN cable
2. AC on
3. Log in BMC
4. plug in LAN cable
5. Set ifconfig IP
6. ifconfig eth0 down
7. ifconfig eth0 up
8. kernel crash
The text was updated successfully, but these errors were encountered: