-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Most recent kernel update breaks intel wifi driver #17
Comments
Had the same issue with my WiFi Network controller: Intel Corporation Wireless 8260 (rev 3a) Rolled back from kernel-uek-5.15.0-200.131.27.el9uek.x86_64 to non UEK kernel kernel-5.14.0-284.30.1.el9_2.x86_64 |
We're looking into this now. |
[ Upstream commit a154f5f ] The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit eae6aabc48eea9b75cf50d98ddf9a19401d87adf)
i can confirm that |
Please try the latest update, 5.15.0-201.135.6. We believe this fixed the issue (verified on Dell-branded laptops inside the company). |
confirmed ! |
Thanks, we'll close this issue. (And I appreciate the |
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 0511a9c56e5854958021de15967d879ceac5d821) Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
systems, using igb driver, crash while executing poweroff command as per following call stack: crash> bt -a PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff" #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a #4 [ffffa7adcd64fa78] die at ffffffffa6033c32 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a [exception RIP: free_msi_irqs+408] RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286 RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00 RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720 R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb] #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb] #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb] #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1 This happens because igb_shutdown has not yet freed up allocated irqs and free_msi_irqs finds irq_has_action true for involved msi irqs here and this condition triggers BUG_ON. Freeing irqs before proceeding further in igb_clear_interrupt_scheme, fixes this problem. This issue does not happen in v5.17 or later kernel versions because 'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")', explicitly frees up MSI based irqs and hence indirectly fixes this issue as well. But this change is dependent on framework for runtime extension of MSI-X irqs and including these changes will break the kABI because it changes some core data structures like pci_device, device and others. So in kernels prior to v5.17 we need to have this change to fix this issue, without breaking the kABI. Orabug: 36547249 Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
systems, using igb driver, crash while executing poweroff command as per following call stack: crash> bt -a PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff" #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a #4 [ffffa7adcd64fa78] die at ffffffffa6033c32 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a [exception RIP: free_msi_irqs+408] RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286 RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00 RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720 R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb] #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb] #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb] #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1 This happens because igb_shutdown has not yet freed up allocated irqs and free_msi_irqs finds irq_has_action true for involved msi irqs here and this condition triggers BUG_ON. Freeing irqs before proceeding further in igb_clear_interrupt_scheme, fixes this problem. This issue does not happen in v5.17 or later kernel versions because 'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")', explicitly frees up MSI based irqs and hence indirectly fixes this issue as well. But this change is dependent on framework for runtime extension of MSI-X irqs and including these changes will break the kABI because it changes some core data structures like pci_device, device and others. So in kernels prior to v5.17 we need to have this change to fix this issue, without breaking the kABI. Orabug: 36547250 Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
systems, using igb driver, crash while executing poweroff command as per following call stack: crash> bt -a PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff" #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a #4 [ffffa7adcd64fa78] die at ffffffffa6033c32 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a [exception RIP: free_msi_irqs+408] RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286 RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00 RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720 R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb] #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb] #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb] #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1 This happens because igb_shutdown has not yet freed up allocated irqs and free_msi_irqs finds irq_has_action true for involved msi irqs here and this condition triggers BUG_ON. Freeing irqs before proceeding further in igb_clear_interrupt_scheme, fixes this problem. This issue does not happen in v5.17 or later kernel versions because 'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")', explicitly frees up MSI based irqs and hence indirectly fixes this issue as well. But this change is dependent on framework for runtime extension of MSI-X irqs and including these changes will break the kABI because it changes some core data structures like pci_device, device and others. So in kernels prior to v5.17 we need to have this change to fix this issue, without breaking the kABI. Orabug: 36547251 Signed-off-by: Imran Khan <imran.f.khan@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit a50dbeca28acf7051dfa92786b85f704c75db6eb) Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 4b0dcae5c4797bf31c63011ed62917210d3fdac3) Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e #3 [fffffe00003fced0] do_nmi at ffffffff8922660d #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 #12 [ffffa65531497b68] printk at ffffffff89318306 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] #18 [ffffa65531497f10] kthread at ffffffff892d2e72 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 68459b8e3ee554ce71878af9eb69659b9462c588) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> (cherry picked from commit eaa8c23a83b5a719ac9bc795481595bbfc02fc18) Signed-off-by: Yifei Liu <yifei.l.liu@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879156 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879157 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> (cherry picked from commit e7fd2c2) Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879158 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879159 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> In UEK4 stats is not a pointer, change the dropped code. Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 320273b5649bbcee87f9e65343077189699d2a7a) Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
Upon physical link change, firmware reports to the kernel about the change along with the details like speed, lmac_type_id, etc. Kernel derives lmac_type based on lmac_type_id received from firmware. In a few scenarios, firmware returns an invalid lmac_type_id, which is resulting in below kernel panic. This patch adds the missing validation of the lmac_type_id field. Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 35.321595] Modules linked in: [ 35.328982] CPU: 0 PID: 31 Comm: kworker/0:1 Not tainted 5.4.210-g2e3169d8e1bc-dirty #17 [ 35.337014] Hardware name: Marvell CN103XX board (DT) [ 35.344297] Workqueue: events work_for_cpu_fn [ 35.352730] pstate: 40400089 (nZcv daIf +PAN -UAO) [ 35.360267] pc : strncpy+0x10/0x30 [ 35.366595] lr : cgx_link_change_handler+0x90/0x180 Fixes: 61071a8 ("octeontx2-af: Forward CGX link notifications to PFs") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit cb5edce) Orabug: 36725601 Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Brian Maly <brian.maly@oracle.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
[ Upstream commit a699781 ] A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit ec7b4f7f644018ac293cb1b02528a40a32917e62) Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36879156 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com> (cherry picked from commit 0dd4b99) Orabug: 36879126 CVE: CVE-2024-41090 CVE: CVE-2024-41091 Signed-off-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com> Reviewed-by: Vijayendra Suman <vijayendra.suman@oracle.com>
[ Upstream commit dc09f00 ] During noirq suspend phase the Raspberry Pi power driver suffer of firmware property timeouts. The reason is that the IRQ of the underlying BCM2835 mailbox is disabled and rpi_firmware_property_list() will always run into a timeout [1]. Since the VideoCore side isn't consider as a wakeup source, set the IRQF_NO_SUSPEND flag for the mailbox IRQ in order to keep it enabled during suspend-resume cycle. [1] PM: late suspend of devices complete after 1.754 msecs WARNING: CPU: 0 PID: 438 at drivers/firmware/raspberrypi.c:128 rpi_firmware_property_list+0x204/0x22c Firmware transaction 0x00028001 timeout Modules linked in: CPU: 0 PID: 438 Comm: bash Tainted: G C 6.9.3-dirty #17 Hardware name: BCM2835 Call trace: unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x34/0x44 dump_stack_lvl from __warn+0x88/0xec __warn from warn_slowpath_fmt+0x7c/0xb0 warn_slowpath_fmt from rpi_firmware_property_list+0x204/0x22c rpi_firmware_property_list from rpi_firmware_property+0x68/0x8c rpi_firmware_property from rpi_firmware_set_power+0x54/0xc0 rpi_firmware_set_power from _genpd_power_off+0xe4/0x148 _genpd_power_off from genpd_sync_power_off+0x7c/0x11c genpd_sync_power_off from genpd_finish_suspend+0xcc/0xe0 genpd_finish_suspend from dpm_run_callback+0x78/0xd0 dpm_run_callback from device_suspend_noirq+0xc0/0x238 device_suspend_noirq from dpm_suspend_noirq+0xb0/0x168 dpm_suspend_noirq from suspend_devices_and_enter+0x1b8/0x5ac suspend_devices_and_enter from pm_suspend+0x254/0x2e4 pm_suspend from state_store+0xa8/0xd4 state_store from kernfs_fop_write_iter+0x154/0x1a0 kernfs_fop_write_iter from vfs_write+0x12c/0x184 vfs_write from ksys_write+0x78/0xc0 ksys_write from ret_fast_syscall+0x0/0x54 Exception stack(0xcc93dfa8 to 0xcc93dff0) [...] PM: noirq suspend of devices complete after 3095.584 msecs Link: raspberrypi/firmware#1894 Fixes: 0bae6af ("mailbox: Enable BCM2835 mailbox support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit b0de20de29b13950493a36bd4cf531200eb0e807) Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
…ress ACPICA commit c14708336bd18552b28643575de7b5beb9b864e9 Before this change we see the following UBSAN stack trace in Fuchsia: #0 0x0000220c98288eba in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:331 <platform-bus-x86.so>+0x8f6eba #1.2 0x000023625f46077f in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x3d77f #1.1 0x000023625f46077f in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x3d77f #1 0x000023625f46077f in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:387 <libclang_rt.asan.so>+0x3d77f #2 0x000023625f461385 in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x3e385 #3 0x000023625f460ead in compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x3dead #4 0x0000220c98288eba in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:331 <platform-bus-x86.so>+0x8f6eba #5 0x0000220c9828ea57 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:352 <platform-bus-x86.so>+0x8fca57 #6 0x0000220c9828992c in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:132 <platform-bus-x86.so>+0x8f792c #7 0x0000220c982d1cfc in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:234 <platform-bus-x86.so>+0x93fcfc #8 0x0000220c98281e46 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x8efe46 #9 0x0000220c98293b51 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x901b51 #10 0x0000220c9829438d in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x90238d #11 0x0000220c97db272b in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x42072b #12 0x0000220c97dcec59 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:52 <platform-bus-x86.so>+0x43cc59 #13 0x0000220c97f94a3f in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x602a3f #14 0x0000220c97c642c7 in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:102 <platform-bus-x86.so>+0x2d22c7 #15 0x0000220c97caf3e6 in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:65 <platform-bus-x86.so>+0x31d3e6 #16 0x0000220c97cd72ae in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:82 <platform-bus-x86.so>+0x3452ae #17 0x0000220c97cd7223 in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:81:19), false, false, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:181 <platform-bus-x86.so>+0x345223 #18 0x0000220c97f48eb0 in fit::internal::function_base<16UL, false, void()>::invoke(const fit::internal::function_base<16UL, false, void ()>*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <platform-bus-x86.so>+0x5b6eb0 #19 0x0000220c97f48d2a in fit::function_impl<16UL, false, void()>::operator()(const fit::function_impl<16UL, false, void ()>*) ../../sdk/lib/fit/include/lib/fit/function.h:300 <platform-bus-x86.so>+0x5b6d2a #20 0x0000220c982f9245 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../zircon/system/ulib/async/task.cc:25 <platform-bus-x86.so>+0x967245 #21 0x000022e2aa1cd91e in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:715 <libdriver_runtime.so>+0xed91e #22 0x000022e2aa1cd621 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:714:7), true, false, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xed621 #23 0x000022e2aa1a8482 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int)>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int)>*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <libdriver_runtime.so>+0xc8482 #24 0x000022e2aa1a80f8 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int)>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int)>*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:451 <libdriver_runtime.so>+0xc80f8 #25 0x000022e2aa17fc76 in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:67 <libdriver_runtime.so>+0x9fc76 #26 0x000022e2aa18c7ef in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1093 <libdriver_runtime.so>+0xac7ef #27 0x000022e2aa18fd67 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1169 <libdriver_runtime.so>+0xafd67 #28 0x000022e2aa1bc9a2 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:338 <libdriver_runtime.so>+0xdc9a2 #29 0x000022e2aa1bc6d2 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:337:7), true, false, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xdc6d2 #30 0x000022e2aa1aa1e5 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>)>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>)>*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <libdriver_runtime.so>+0xca1e5 #31 0x000022e2aa1a9e32 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>)>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>)>*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:300 <libdriver_runtime.so>+0xc9e32 #32 0x000022e2aa193444 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:299 <libdriver_runtime.so>+0xb3444 #33 0x000022e2aa192feb in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1259 <libdriver_runtime.so>+0xb2feb #34 0x000022e2aa1bcf74 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xdcf74 #35 0x000022e2aa1bd1cb in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../zircon/system/ulib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xdd1cb #36 0x000022e2aa2303a9 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../zircon/system/ulib/async-loop/loop.c:381 <libdriver_runtime.so>+0x1503a9 #37 0x000022e2aa229a82 in async_loop_run_once(async_loop_t*, zx_time_t) ../../zircon/system/ulib/async-loop/loop.c:330 <libdriver_runtime.so>+0x149a82 #38 0x000022e2aa229102 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../zircon/system/ulib/async-loop/loop.c:288 <libdriver_runtime.so>+0x149102 #39 0x000022e2aa22aeb7 in async_loop_run_thread(void*) ../../zircon/system/ulib/async-loop/loop.c:840 <libdriver_runtime.so>+0x14aeb7 #40 0x000041a874980f1c in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:55 <libc.so>+0xd7f1c #41 0x000041a874aabe8d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x202e8d Link: acpica/acpica@c1470833 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (cherry picked from commit 24d9609) Orabug: 37311726 Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
[ Upstream commit 826cc42adf44930a633d11a5993676d85ddb0842 ] My colleague Wupeng found the following problems during fault injection: BUG: unable to handle page fault for address: fffffbfff809d073 PGD 6e648067 P4D 123ec8067 PUD 123ec4067 PMD 100e38067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 5 UID: 0 PID: 755 Comm: modprobe Not tainted 6.12.0-rc3+ #17 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:__asan_load8+0x4c/0xa0 ... Call Trace: <TASK> blkdev_put_whole+0x41/0x70 bdev_release+0x1a3/0x250 blkdev_release+0x11/0x20 __fput+0x1d7/0x4a0 task_work_run+0xfc/0x180 syscall_exit_to_user_mode+0x1de/0x1f0 do_syscall_64+0x6b/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e loop_init() is calling loop_add() after __register_blkdev() succeeds and is ignoring disk_add() failure from loop_add(), for loop_add() failure is not fatal and successfully created disks are already visible to bdev_open(). brd_init() is currently calling brd_alloc() before __register_blkdev() succeeds and is releasing successfully created disks when brd_init() returns an error. This can cause UAF for the latter two case: case 1: T1: modprobe brd brd_init brd_alloc(0) // success add_disk disk_scan_partitions bdev_file_open_by_dev // alloc file fput // won't free until back to userspace brd_alloc(1) // failed since mem alloc error inject // error path for modprobe will release code segment // back to userspace __fput blkdev_release bdev_release blkdev_put_whole bdev->bd_disk->fops->release // fops is freed now, UAF! case 2: T1: T2: modprobe brd brd_init brd_alloc(0) // success open(/dev/ram0) brd_alloc(1) // fail // error path for modprobe close(/dev/ram0) ... /* UAF! */ bdev->bd_disk->fops->release Fix this problem by following what loop_init() does. Besides, reintroduce brd_devices_mutex to help serialize modifications to brd_list. Fixes: 7f9b348 ("brd: convert to blk_alloc_disk/blk_cleanup_disk") Reported-by: Wupeng Ma <mawupeng1@huawei.com> Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241030034914.907829-1-yangerkun@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 41219c147df8bbd6591f59af5d695fb6c9a1cbff) Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
The cited commit adds a compeletion to remove dependency on rtnl lock. But it causes a deadlock for multiple encapsulations: crash> bt ffff8aece8a64000 PID: 1514557 TASK: ffff8aece8a64000 CPU: 3 COMMAND: "tc" #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418 #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898 #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8 #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core] #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core] #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core] #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core] #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core] #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core] #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core] #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower] #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower] #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c crash> bt 0xffff8aeb07544000 PID: 1110766 TASK: ffff8aeb07544000 CPU: 0 COMMAND: "kworker/u20:9" #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45 #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418 #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88 #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core] #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core] #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core] #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012 #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f After the first encap is attached, flow will be added to encap entry's flows list. If neigh update is running at this time, the following encaps of the flow can't hold the encap_tbl_lock and sleep. If neigh update thread is waiting for that flow's init_done, deadlock happens. Fix it by holding lock outside of the for loop. If neigh update is running, prevent encap flows from offloading. Since the lock is held outside of the for loop, concurrent creation of encap entries is not allowed. So remove unnecessary wait_for_completion call for res_ready. Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Orabug: 35383105 (cherry picked from commit 37c3b9f) cherry-pick-repo=kernel/git/torvalds/linux.git unmodified-from-upstream: 37c3b9f Signed-off-by: Mikhael Goikhman <migo@nvidia.com> Signed-off-by: Qing Huang <qing.huang@oracle.com> Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com> Signed-off-by: Brian Maly <brian.maly@oracle.com>
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed packet, with shorter length can eventually cause system panic further down in the code path. Avoid it by validating the length and dropping it at the earliest. Following is seen in our env with shorter skb->len crash> bt PID: 76981 TASK: ff19828cfe508000 CPU: 106 COMMAND: "vhost-76942" #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd [exception RIP: memcpy_erms+6] RIP: ffffffffae261ab6 RSP: ff2d20159b39f700 RFLAGS: 00010293 RAX: ff198291741ecf2e RBX: ff19828e70d6a100 RCX: fffffffffea1af2b RDX: fffffffffffffffd RSI: ff19828eba6d7e5e RDI: ff198291757d2000 RBP: ff2d20159b39f760 R8: ff198291741ecf00 R9: 000000000000037c R10: 000000000000003c R11: ff19828ffe953940 R12: ff198291741ecf20 R13: ff198267dcb1b600 R14: ff19828eeebb09c0 R15: ff198291741ecf00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core] #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core] #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q] #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan] #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap] #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net] #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net] #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net] #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net] #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost] #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364 This change was discussed with Nvidia and they are in agreement. Orabug: 36660755 Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration") Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Lost wifi on Dell XPS 13 laptop after installing latest kernel on OL 9.2.
Offending kernel version: vmlinuz-5.15.0-200.131.27.el9uek.x86_64
Broken network adapter:
*-network
description: Wireless interface
product: Wireless-AC 9260
vendor: Intel Corporation
physical id: 0
bus info: pci@0000:3a:00.0
logical name: wlp58s0
version: 29
serial: 34:13:e8:99:2b:c2
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress msix bus_master cap_list ethernet physical wireless
configuration: broadcast=yes driver=iwlwifi driverversion=5.15.0-106.131.4.el9uek.x86_64 firmware=46.ff18e32a.0 9260-th-b0-jf-b0- ip= latency=0 link=yes multicast=yes wireless=IEEE 802.11
resources: irq:16 memory:dc200000-dc203fff
Reverting to previous kernel fixed issue.
The text was updated successfully, but these errors were encountered: