
MADV_DOEXEC and MADV_DONTEXEC should be redefined within a separate numerical range #23

Closed
cnqpzhang opened this issue Feb 23, 2024 · 3 comments

Comments

@cnqpzhang

Description:

MADV_DOEXEC and MADV_DONTEXEC should be redefined within a separate numerical range; otherwise they break binary compatibility with mainline kernels.

Diagnostic Info:

The kernel v5.4 build from uek6/u3 carries commit a91ae4f, which defines two madvise mode numbers as follows:

#define MADV_DOEXEC	22		/* do inherit across exec */
#define MADV_DONTEXEC	23		/* don't inherit across exec */

In comparison, Linux mainline v5.4 has no such definitions; its highest mode number was 21 (see v5.4 commit 219d54332).

Moving forward, Linux mainline started using numbers 22 and 23 in the 5.14-rc1 timeframe; see torvalds/linux@4ca9b38.

#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */

According to the Linux man-pages, the way to tell whether MADV_POPULATE_WRITE is supported on a given system is:

MADV_POPULATE_WRITE (since Linux 5.14)
madvise(0, 0, advice) will return zero iff advice is supported by the kernel and can be relied on to probe for support.

As a result, calling madvise(0, 0, 23) on a UEKR6 v5.4.17 kernel returns 0, meaning supported, while mainline Linux v5.4 returns -1, meaning not supported. The duplicate definition breaks binary compatibility.
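For illustration, a minimal userspace probe following the man-pages recipe (the hard-coded 23 stands in for MADV_POPULATE_WRITE, since older glibc headers may not define the constant):

#include <stdio.h>
#include <sys/mman.h>

/* 23 is MADV_POPULATE_WRITE on mainline >= 5.14 but MADV_DONTEXEC on the
 * affected UEK6 kernels; hard-coded here because older headers lack the
 * macro. */
#define ADVICE_UNDER_TEST 23

int main(void)
{
	/* Per madvise(2), madvise(0, 0, advice) returns 0 iff the advice
	 * value is supported by the running kernel. */
	if (madvise(NULL, 0, ADVICE_UNDER_TEST) == 0)
		printf("advice %d: supported\n", ADVICE_UNDER_TEST);
	else
		printf("advice %d: not supported\n", ADVICE_UNDER_TEST);
	return 0;
}

On an affected UEKR6 kernel this prints "supported" even though MADV_POPULATE_WRITE itself is not implemented there.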

This is currently causing a practical failure in OpenJDK; see ticket JDK-8324776 and the discussion there for details.

Related issue:

Kernel v5.15 on uek7/u2 has a similar problem. Commit 4693c5d integrated the definitions of 22 and 23 from Linux mainline, while using 24 and 25 for the two customized mode numbers.

#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
#define MADV_DOEXEC	24		/* do inherit across exec */
#define MADV_DONTEXEC	25		/* don't inherit across exec */

This creates another incompatibility with Linux mainline, which uses 24 for MADV_DONTNEED_LOCKED (torvalds/linux@9457056, since v5.18-rc1) and 25 for MADV_COLLAPSE (torvalds/linux@7d8faaf, since v6.1-rc1).

See details at mainline b401b621:

#define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
#define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */

Proposed changes:

Redefine the customized modes MADV_DOEXEC and MADV_DONTEXEC within a separate numerical range, for example 101, 102.
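A sketch of what that could look like in the uapi header (101 and 102 are just the example values above; the exact numbers are open):

#define MADV_DOEXEC	101		/* do inherit across exec */
#define MADV_DONTEXEC	102		/* don't inherit across exec */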

This avoids breaking binary compatibility, lets UEK6 and UEK7 share the same definitions of these two modes, and spares future UEK releases from moving them to new numbers again, which is better for maintenance.

Any other workable solution would also be acceptable.

@YoderExMachina
Member

Thanks for reporting this. I have created an internal ticket to track the issue.

@almaclang

almaclang commented Feb 24, 2024 via email

@YoderExMachina
Member

This is now fixed in UEK7U2, UEK6U3, and UEK5U5.

oraclelinuxkernel pushed a commit that referenced this issue Jun 12, 2024
The cited commit adds a completion to remove the dependency on the rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, the flow is added to the encap
entry's flows list. If a neigh update is running at this time, the
following encaps of the flow can't take the encap_tbl_lock and
sleep. If the neigh update thread is then waiting for that flow's
init_done, a deadlock happens.

Fix it by holding the lock outside of the for loop. If a neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed, so remove the now-unnecessary wait_for_completion call for
res_ready.
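Schematically, the locking change looks something like this (a hypothetical simplification, not the actual mlx5 code; attach_one_encap() and the loop bounds are made up, while encap_tbl_lock and res_ready are the names from the message above):

/* before: lock and completion handled per encap, inside the loop */
for (i = 0; i < num_encaps; i++) {
	mutex_lock(&encap_tbl_lock);
	err = attach_one_encap(flow, i);
	mutex_unlock(&encap_tbl_lock);
	wait_for_completion(&e->res_ready);	/* can deadlock vs. neigh update */
}

/* after: one critical section around the whole loop; the res_ready wait
 * is gone because concurrent creation of encap entries is no longer
 * possible */
mutex_lock(&encap_tbl_lock);
for (i = 0; i < num_encaps; i++)
	err = attach_one_encap(flow, i);
mutex_unlock(&encap_tbl_lock);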

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
mark-nicholson pushed a commit that referenced this issue Jul 2, 2024
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
Add a check to mlx5e_xmit() for short frames. A corrupted/malformed
packet with a short length can eventually cause a system panic further
down the code path. Avoid this by validating the length and dropping the
packet as early as possible.
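The shape of such a check, as a hypothetical sketch (the real patch's minimum-length threshold and drop accounting may differ):

netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev)
{
	/* Hypothetical early validation: drop malformed frames before any
	 * header parsing or inline header copy can read past the buffer. */
	if (unlikely(skb->len < ETH_HLEN)) {
		dev_kfree_skb_any(skb);
		return NETDEV_TX_OK;
	}
	/* ... normal transmit path ... */
}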

The following is seen in our environment with a short skb->len:

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879156
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
oraclelinuxkernel pushed a commit that referenced this issue Nov 12, 2024
oraclelinuxkernel pushed a commit that referenced this issue Jan 17, 2025
…ress

ACPICA commit c14708336bd18552b28643575de7b5beb9b864e9

Before this change we see the following UBSAN stack trace in Fuchsia:

  #0    0x0000220c98288eba in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:331 <platform-bus-x86.so>+0x8f6eba
  #1.2  0x000023625f46077f in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x3d77f
  #1.1  0x000023625f46077f in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x3d77f
  #1    0x000023625f46077f in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:387 <libclang_rt.asan.so>+0x3d77f
  #2    0x000023625f461385 in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x3e385
  #3    0x000023625f460ead in compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x3dead
  #4    0x0000220c98288eba in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:331 <platform-bus-x86.so>+0x8f6eba
  #5    0x0000220c9828ea57 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:352 <platform-bus-x86.so>+0x8fca57
  #6    0x0000220c9828992c in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:132 <platform-bus-x86.so>+0x8f792c
  #7    0x0000220c982d1cfc in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:234 <platform-bus-x86.so>+0x93fcfc
  #8    0x0000220c98281e46 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x8efe46
  #9    0x0000220c98293b51 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x901b51
  #10   0x0000220c9829438d in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x90238d
  #11   0x0000220c97db272b in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x42072b
  #12   0x0000220c97dcec59 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:52 <platform-bus-x86.so>+0x43cc59
  #13   0x0000220c97f94a3f in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x602a3f
  #14   0x0000220c97c642c7 in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:102 <platform-bus-x86.so>+0x2d22c7
  #15   0x0000220c97caf3e6 in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:65 <platform-bus-x86.so>+0x31d3e6
  #16   0x0000220c97cd72ae in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:82 <platform-bus-x86.so>+0x3452ae
  #17   0x0000220c97cd7223 in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:81:19), false, false, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:181 <platform-bus-x86.so>+0x345223
  #18   0x0000220c97f48eb0 in fit::internal::function_base<16UL, false, void()>::invoke(const fit::internal::function_base<16UL, false, void ()>*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <platform-bus-x86.so>+0x5b6eb0
  #19   0x0000220c97f48d2a in fit::function_impl<16UL, false, void()>::operator()(const fit::function_impl<16UL, false, void ()>*) ../../sdk/lib/fit/include/lib/fit/function.h:300 <platform-bus-x86.so>+0x5b6d2a
  #20   0x0000220c982f9245 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../zircon/system/ulib/async/task.cc:25 <platform-bus-x86.so>+0x967245
  #21   0x000022e2aa1cd91e in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:715 <libdriver_runtime.so>+0xed91e
  #22   0x000022e2aa1cd621 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:714:7), true, false, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xed621
  #23   0x000022e2aa1a8482 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int)>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int)>*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <libdriver_runtime.so>+0xc8482
  #24   0x000022e2aa1a80f8 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int)>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int)>*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:451 <libdriver_runtime.so>+0xc80f8
  #25   0x000022e2aa17fc76 in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:67 <libdriver_runtime.so>+0x9fc76
  #26   0x000022e2aa18c7ef in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1093 <libdriver_runtime.so>+0xac7ef
  #27   0x000022e2aa18fd67 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1169 <libdriver_runtime.so>+0xafd67
  #28   0x000022e2aa1bc9a2 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:338 <libdriver_runtime.so>+0xdc9a2
  #29   0x000022e2aa1bc6d2 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:337:7), true, false, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xdc6d2
  #30   0x000022e2aa1aa1e5 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>)>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>)>*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:505 <libdriver_runtime.so>+0xca1e5
  #31   0x000022e2aa1a9e32 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>)>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>)>*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:300 <libdriver_runtime.so>+0xc9e32
  #32   0x000022e2aa193444 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:299 <libdriver_runtime.so>+0xb3444
  #33   0x000022e2aa192feb in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1259 <libdriver_runtime.so>+0xb2feb
  #34   0x000022e2aa1bcf74 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xdcf74
  #35   0x000022e2aa1bd1cb in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../zircon/system/ulib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xdd1cb
  #36   0x000022e2aa2303a9 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../zircon/system/ulib/async-loop/loop.c:381 <libdriver_runtime.so>+0x1503a9
  #37   0x000022e2aa229a82 in async_loop_run_once(async_loop_t*, zx_time_t) ../../zircon/system/ulib/async-loop/loop.c:330 <libdriver_runtime.so>+0x149a82
  #38   0x000022e2aa229102 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../zircon/system/ulib/async-loop/loop.c:288 <libdriver_runtime.so>+0x149102
  #39   0x000022e2aa22aeb7 in async_loop_run_thread(void*) ../../zircon/system/ulib/async-loop/loop.c:840 <libdriver_runtime.so>+0x14aeb7
  #40   0x000041a874980f1c in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:55 <libc.so>+0xd7f1c
  #41   0x000041a874aabe8d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x202e8d

Link: acpica/acpica@c1470833
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 24d9609)

Orabug: 37311726

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reviewed-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jan 24, 2025
[ Upstream commit 146b6f1112eb30a19776d6c323c994e9d67790db ]

Under certain kernel configurations, when building with Clang/LLVM the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning at build time:

  vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()

At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.

Digging deeper into both LLVM and the kernel code reveals this to be an
undefined-behavior problem. ip_vs_protocol_init() uses an on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE,
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f89 ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, causes the following
IR to be generated:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 {
    %1 = alloca [64 x i8], align 16
    ...

  14:                                               ; preds = %11
    %15 = getelementptr inbounds i8, ptr %1, i64 63
    %16 = load i8, ptr %15, align 1
    %17 = tail call i1 @llvm.is.constant.i8(i8 %16)
    %18 = icmp eq i8 %16, 0
    %19 = select i1 %17, i1 %18, i1 false
    br i1 %19, label %20, label %23

  20:                                               ; preds = %14
    %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
    ...

  23:                                               ; preds = %14, %11, %20
    %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24
    ...
  }

The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:

  %13 = getelementptr inbounds i8, ptr %1, i64 63
  br i1 undef, label %14, label %17

This gives later passes (SCCP, in particular) more DCE opportunities by
propagating the undef value further, and eventually removes everything
after the load on the uninitialized stack location:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 {
    %1 = alloca [64 x i8], align 16
    ...

  12:                                               ; preds = %11
    %13 = getelementptr inbounds i8, ptr %1, i64 63
    unreachable
  }

In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.

Zero the on-stack buffer to avoid this possible UB.
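As a sketch, the fix amounts to initializing the buffer at its definition (a hypothetical fragment; the real change is in ip_vs_protocol_init() and the surrounding registration logic is elided):

int __init ip_vs_protocol_init(void)
{
	/* was: char str[64]; -- contents indeterminate */
	char str[64] = "";	/* zeroed, so the FORTIFY strnlen()/strlen()
				 * checks never read an uninitialized byte */
	/* ... register protocols, concatenating their names into str ... */
}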

Fixes: 1da177e ("Linux-2.6.12-rc2")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/
Co-developed-by: Ruowen Qin <ruqin@redhat.com>
Signed-off-by: Ruowen Qin <ruqin@redhat.com>
Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d6e1776f51c95827142f1d7064118e255e2deec1)
Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jan 24, 2025
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jan 28, 2025
Add a check to mlx5e_xmit() for short frames. A corrupted or malformed
packet with a shorter-than-expected length can eventually cause a
system panic further down the code path. Avoid this by validating the
length and dropping such packets as early as possible.
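
Conceptually the fix amounts to the sketch below; the ETH_HLEN
threshold and the exact placement inside mlx5e_xmit() are assumptions
for illustration, not the literal patch:

static netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev)
{
        /* A frame shorter than a minimal Ethernet header would make
         * the header-copy length arithmetic in the WQE build go
         * negative (the memcpy_erms fault in the trace below), so
         * drop it before any descriptor work is done. */
        if (unlikely(skb->len < ETH_HLEN)) {
                dev_kfree_skb_any(skb);
                return NETDEV_TX_OK;
        }

        /* ... normal send-queue WQE construction continues here ... */
        return NETDEV_TX_OK;
}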

The following panic was seen in our environment with a short skb->len:

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia, and they are in agreement.

Orabug: 36660755

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>