selftests: diag: failing: found msk after flush while expected none #339

matttbe · 2023-01-25T11:08:24Z

With this debug patch provided by @pabeni:

diff --git a/include/net/sock.h b/include/net/sock.h
index dcd72e6285b2..23425610177f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1534,11 +1534,8 @@ struct prot_inuse {
        int val[PROTO_INUSE_NR];
 };
 
-static inline void sock_prot_inuse_add(const struct net *net,
-                                      const struct proto *prot, int val)
-{
-       this_cpu_add(net->core.prot_inuse->val[prot->inuse_idx], val);
-}
+void sock_prot_inuse_add(const struct net *net,
+                        const struct proto *prot, int val);
 
 static inline void sock_inuse_add(const struct net *net, int val)
 {
diff --git a/net/core/sock.c b/net/core/sock.c
index f954d5893e79..7a884b8b323e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3744,6 +3744,16 @@ int sock_inuse_get(struct net *net)
 
 EXPORT_SYMBOL_GPL(sock_inuse_get);
 
+void sock_prot_inuse_add(const struct net *net,
+                                      const struct proto *prot, int val)
+{
+       static int cnt;
+
+       this_cpu_add(net->core.prot_inuse->val[prot->inuse_idx], val);
+       cnt = this_cpu_read(net->core.prot_inuse->val[prot->inuse_idx]);
+}
+EXPORT_SYMBOL(sock_prot_inuse_add);
+
 static int __net_init sock_inuse_init_net(struct net *net)
 {
        net->core.prot_inuse = alloc_percpu(struct prot_inuse);
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index ef628b16fe9b..69d203704f3e 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -178,6 +178,18 @@ chk_msk_inuse()
        done
 
        __chk_nr get_msk_inuse $expected $*
+       if [ $((ret + 1)) == $test_cnt ]; then
+               ip netns exec $ns grep MPTCP /proc/net/protocols
+               ss -pinmHMN $ns
+               ip netns exec $ns nstat -as Tcp* MPTcp*
+
+               echo "let mptcp timeout elapse"
+               sleep 70
+               ss -pinmHMN $ns
+               ip netns exec $ns nstat -as Tcp* MPTcp*
+               ip netns exec $ns grep MPTCP /proc/net/protocols
+               exit $ret
+       fi
 }
 
 # $1: ns, $2: port

I got:

++ perf record -ag -e probe:sock_prot_inuse_add -e probe:__mptcp_destroy_sock ./diag.sh
no msk on netns creation                          [  ok  ]
listen match for dport 10000                      [  ok  ]
listen match for sport 10000                      [  ok  ]
listen match for saddr and sport                  [  ok  ]
all listen sockets                                [  ok  ]
after MPC handshake                               [  ok  ]
....chk remote_key                                [  ok  ]
....chk no fallback                               [  ok  ]
....chk 2 msk in use                              [  ok  ]
....chk 0 msk in use after flush                  [  ok  ]
check fallback                                    [  ok  ]
....chk 1 msk in use                              [  ok  ]
....chk 0 msk in use after flush                  [ fail ] expected 0 found 1
MPTCPv6   2936      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
MPTCP     2784      1       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
#kernel
TcpActiveOpens                  2                  0.0
TcpPassiveOpens                 2                  0.0
TcpInSegs                       25                 0.0
TcpOutSegs                      25                 0.0
TcpOutRsts                      1                  0.0
TcpExtTW                        1                  0.0
TcpExtTCPPureAcks               11                 0.0
TcpExtTCPHPAcks                 1                  0.0
TcpExtTCPAbortOnClose           1                  0.0
TcpExtTCPOrigDataSent           8                  0.0
TcpExtTCPDelivered              8                  0.0
MPTcpExtMPCapableSYNRX          1                  0.0
MPTcpExtMPCapableSYNTX          2                  0.0
MPTcpExtMPCapableSYNACKRX       1                  0.0
MPTcpExtMPCapableACKRX          1                  0.0
MPTcpExtMPCapableFallbackSYNACK 1                  0.0
let mptcp timeout elapse
#kernel
TcpActiveOpens                  2                  0.0
TcpPassiveOpens                 2                  0.0
TcpInSegs                       25                 0.0
TcpOutSegs                      25                 0.0
TcpOutRsts                      1                  0.0
TcpExtTW                        1                  0.0
TcpExtTCPPureAcks               11                 0.0
TcpExtTCPHPAcks                 1                  0.0
TcpExtTCPAbortOnClose           1                  0.0
TcpExtTCPOrigDataSent           8                  0.0
TcpExtTCPDelivered              8                  0.0
MPTcpExtMPCapableSYNRX          1                  0.0
MPTcpExtMPCapableSYNTX          2                  0.0
MPTcpExtMPCapableSYNACKRX       1                  0.0
MPTcpExtMPCapableACKRX          1                  0.0
MPTcpExtMPCapableFallbackSYNACK 1                  0.0
MPTCPv6   2936      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
MPTCP     2784      1       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
[ perf record: Woken up 1 times to write data ]
failed to mmap file
[ perf record: Captured and wrote 0.260 MB perf.data ]
++ rc=1

So the result is still the same after 70 seconds: one MPTCP socket is still reported as in use.

The perf data was recorded with:
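For reference, the value the check compares is the third column (`sockets`) of the `MPTCP` row in `/proc/net/protocols`, which is the `1` visible in the output above. A minimal sketch of extracting it, assuming the column layout shown in this log; `msk_inuse_from_protocols` is a hypothetical helper name used for illustration and is not claimed to be the function `diag.sh` uses:

```shell
# Hypothetical helper: print the "sockets" column (3rd field) of the
# MPTCP row from /proc/net/protocols-formatted text read on stdin.
# The exact match on "MPTCP" skips the separate MPTCPv6 row.
msk_inuse_from_protocols() {
	awk '$1 == "MPTCP" { print $3 }'
}

# Possible usage inside a namespace (assumes $ns is set, as in diag.sh):
#   ip netns exec "$ns" cat /proc/net/protocols | msk_inuse_from_protocols
```

Reading from stdin keeps the helper usable both on a live namespace and on a saved log.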

perf probe -a sock_prot_inuse_add
perf probe -a __mptcp_destroy_sock
cd tools/testing/selftests/net/mptcp
run_loop perf record -ag -e probe:sock_prot_inuse_add -e probe:__mptcp_destroy_sock ./diag.sh
perf script > perf.data.txt

perf.data.txt.zip

With more probes:

perf probe -a sock_prot_inuse_add
perf probe -a __mptcp_destroy_sock
perf probe -k .virtme/build/vmlinux -a 'subflow_state_change sk sk->__sk_common.skc_state'
perf probe -k .virtme/build/vmlinux -a '__mptcp_close:8 msk=sk sk=msk->first sk->__sk_common.skc_state'
cd tools/testing/selftests/net/mptcp
run_loop _tap ${RESULTS_DIR}/perf.tap perf record -ag -e probe:sock_prot_inuse_add -e probe:__mptcp_destroy_sock -e probe:subflow_state_change -e probe:__mptcp_close_L8 -o perf.data ./diag.sh
perf script > perf.data.txt

perf.data_202301241347.zip

Reproducible using:

cd <kernel source code>

cat <<'EOF' > .virtme-exec-run
perf probe -a sock_prot_inuse_add
perf probe -a __mptcp_destroy_sock
perf probe -k .virtme/build/vmlinux -a 'subflow_state_change sk sk->__sk_common.skc_state'
perf probe -k .virtme/build/vmlinux -a '__mptcp_close:8 msk=sk sk=msk->first sk->__sk_common.skc_state'
cd tools/testing/selftests/net/mptcp
run_loop _tap ${RESULTS_DIR}/perf.tap perf record -ag -e probe:sock_prot_inuse_add -e probe:__mptcp_destroy_sock -e probe:subflow_state_change -e probe:__mptcp_close_L8 -o perf.data ./diag.sh
perf script > perf.data.txt
EOF

docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it --pull always mptcp/mptcp-upstream-virtme-docker:latest auto-debug
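Because the failure is sporadic, the test has to be repeated until it triggers. The `run_loop` helper used above comes from the docker image and its implementation is not shown in this thread. The following is only a sketch of the general idea, with a hypothetical function name and an assumed "stop at first failure" behaviour:

```shell
# Hypothetical stand-in for a "run in a loop" helper: re-run "$@" until
# it fails or until max_attempts is reached, then print how many
# attempts were made. Returns the exit status of the last run.
run_until_failure() {
	max_attempts=$1
	shift
	attempt=0
	rc=0
	while [ "$attempt" -lt "$max_attempts" ]; do
		attempt=$((attempt + 1))
		if "$@"; then
			rc=0
		else
			rc=$?
		fi
		if [ "$rc" -ne 0 ]; then
			break
		fi
	done
	echo "attempts: $attempt rc: $rc"
	return "$rc"
}

# Possible usage: run_until_failure 1000 ./diag.sh
```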

matttbe · 2023-01-25T18:58:01Z

@pabeni: I managed to reproduce the same warning but a different issue on top of your patches from #320 and the fix you sent for this #339:

# no msk on netns creation                          [  ok  ]
# listen match for dport 10000                      [  ok  ]
# listen match for sport 10000                      [  ok  ]
# listen match for saddr and sport                  [  ok  ]
# all listen sockets                                [  ok  ]
# after MPC handshake                               [  ok  ]
# ....chk remote_key                                [  ok  ]
# ....chk no fallback                               [  ok  ]
# ....chk 2 msk in use                              [  ok  ]
# ....chk 0 msk in use after flush                  [  ok  ]
# check fallback                                    [  ok  ]
# ....chk 1 msk in use                              [  ok  ]
# ....chk 0 msk in use after flush                  [  ok  ]
# many msk socket present                           [  ok  ]
# ....chk many msk in use                           [  ok  ]
# ....chk 0 msk in use after flush                  [ fail ] expected 0 found 2
# MPTCPv6   2936      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# MPTCP     2784      2       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# LAST-ACK 0      0      127.0.0.1:38114 127.0.0.1:10006
#        skmem:(r0,rb131072,t0,tb16384,f0,w0,o0,bl0,d0) subflows_max:2 remote_key token:2a999a38 write_seq:4bbde1055e01f245 snd_una:4bbde1055e01f244 rcv_nxt:d9795a810e41cfd4
# LAST-ACK 0      0      127.0.0.1:36714 127.0.0.1:10066
#        skmem:(r0,rb131072,t0,tb16384,f0,w0,o0,bl0,d0) subflows_max:2 remote_key token:9dec73d1 write_seq:ae1ec9a088d6f32f snd_una:ae1ec9a088d6f32e rcv_nxt:37fd3816906823f6
# #kernel
# TcpActiveOpens                  102                0.0
# TcpPassiveOpens                 102                0.0
# TcpEstabResets                  198                0.0
# TcpInSegs                       1016               0.0
# TcpOutSegs                      1016               0.0
# TcpOutRsts                      110                0.0
# TcpExtTW                        4                  0.0
# TcpExtTCPPureAcks               480                0.0
# TcpExtTCPHPAcks                 1                  0.0
# TcpExtTCPAbortOnData            99                 0.0
# TcpExtTCPOrigDataSent           210                0.0
# TcpExtTCPDelivered              312                0.0
# MPTcpExtMPCapableSYNRX          101                0.0
# MPTcpExtMPCapableSYNTX          102                0.0
# MPTcpExtMPCapableSYNACKRX       101                0.0
# MPTcpExtMPCapableACKRX          101                0.0
# MPTcpExtMPCapableFallbackSYNACK 1                  0.0
# MPTcpExtMPFastcloseTx           99                 0.0
# MPTcpExtMPFastcloseRx           94                 0.0
# MPTcpExtMPRstTx                 99                 0.0
# MPTcpExtMPRstRx                 94                 0.0
# let mptcp timeout elapse
# #kernel
# TcpActiveOpens                  102                0.0
# TcpPassiveOpens                 102                0.0
# TcpEstabResets                  198                0.0
# TcpInSegs                       1016               0.0
# TcpOutSegs                      1016               0.0
# TcpOutRsts                      110                0.0
# TcpExtTW                        4                  0.0
# TcpExtTCPPureAcks               480                0.0
# TcpExtTCPHPAcks                 1                  0.0
# TcpExtTCPAbortOnData            99                 0.0
# TcpExtTCPOrigDataSent           210                0.0
# TcpExtTCPDelivered              312                0.0
# MPTcpExtMPCapableSYNRX          101                0.0
# MPTcpExtMPCapableSYNTX          102                0.0
# MPTcpExtMPCapableSYNACKRX       101                0.0
# MPTcpExtMPCapableACKRX          101                0.0
# MPTcpExtMPCapableFallbackSYNACK 1                  0.0
# MPTcpExtMPFastcloseTx           99                 0.0
# MPTcpExtMPFastcloseRx           94                 0.0
# MPTcpExtMPRstTx                 99                 0.0
# MPTcpExtMPRstRx                 94                 0.0
# MPTCPv6   2936      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# MPTCP     2784      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# [ perf record: Woken up 6 times to write data ]
# failed to mmap file
# [ perf record: Captured and wrote 1.575 MB perf.data ]
++ rc=12

Here, we can see the sockets are removed after 70 seconds.
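In the log above the `MPTCP` in-use count goes from 2 to 0 across the fixed 70-second sleep, which does not tell how long the cleanup really takes. A minimal sketch of polling instead, assuming a command that prints the current count; `wait_for_zero` is a hypothetical helper, not part of `diag.sh`:

```shell
# Hypothetical helper: run "$@" about once per second until it prints 0
# or until roughly $1 seconds have elapsed.
# Returns 0 when the command printed 0, 1 on timeout.
wait_for_zero() {
	timeout_s=$1
	shift
	elapsed=0
	while [ "$elapsed" -le "$timeout_s" ]; do
		[ "$("$@")" = "0" ] && return 0
		sleep 1
		elapsed=$((elapsed + 1))
	done
	return 1
}

# Possible usage, with any command that prints the current in-use count:
#   wait_for_zero 70 some_command_printing_the_count
```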

perf.data_202301251954.zip

(I guess this is not related to your patches from #320 but a timing issue)

pabeni · 2023-01-25T21:13:53Z

> @pabeni: I managed to reproduce the same warning but a different issue on top of your patches from #320 and the fix you sent for this #339:
> [...]
> (I guess this is not related to your patches from #320 but a timing issue)

I think this is instead related to the patches from #320, as I could not reproduce the new issue without the changes from #320.

It is certainly a different issue from the one addressed by https://lore.kernel.org/mptcp/411cc0b4-af7f-5019-b2ac-a7361f3dcaa9@tessares.net/T/#t, as no fallback socket is involved here.

I think the fastclose is not generated at close time in some edge-case scenarios.
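The relevant counters in the log above are `MPTcpExtMPFastcloseTx` (99) and `MPTcpExtMPFastcloseRx` (94). Whether that gap is connected to the two leftover sockets is not established in this thread; the sketch below only shows one way to pull such counters out of a saved log for comparison. `nstat_counter` is a hypothetical helper, and the "name value rate" layout is assumed from the output shown:

```shell
# Hypothetical helper: print the value of counter $1 from nstat-style
# output ("Name  Value  Rate") read on stdin. A leading "# " prefix, as
# in the TAP-formatted log, is stripped first.
nstat_counter() {
	sed 's/^# //' | awk -v name="$1" '$1 == name { print $2 }'
}

# Possible usage on a saved log file:
#   tx=$(nstat_counter MPTcpExtMPFastcloseTx < saved.log)
#   rx=$(nstat_counter MPTcpExtMPFastcloseRx < saved.log)
#   echo "fastclose tx-rx: $((tx - rx))"
```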

matttbe · 2023-01-26T09:11:05Z

After 5 hours of tests and 757 attempts, I managed to reproduce it on top of the export branch + the patch you sent for this issue (#339 - mptcp: do not propagate fallback subflow status on error) and your debug commit from above:

# no msk on netns creation                          [  ok  ]
# listen match for dport 10000                      [  ok  ]
# listen match for sport 10000                      [  ok  ]
# listen match for saddr and sport                  [  ok  ]
# all listen sockets                                [  ok  ]
# after MPC handshake                               [  ok  ]
# ....chk remote_key                                [  ok  ]
# ....chk no fallback                               [  ok  ]
# ....chk 2 msk in use                              [  ok  ]
# ....chk 0 msk in use after flush                  [  ok  ]
# check fallback                                    [  ok  ]
# ....chk 1 msk in use                              [  ok  ]
# ....chk 0 msk in use after flush                  [  ok  ]
# many msk socket present                           [  ok  ]
# ....chk many msk in use                           [  ok  ]
# ....chk 0 msk in use after flush                  [ fail ] expected 0 found 2
# MPTCPv6   2936      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# MPTCP     2784      2       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# LAST-ACK 0      0      127.0.0.1:38114 127.0.0.1:10006
#        skmem:(r0,rb131072,t0,tb16384,f0,w0,o0,bl0,d0) subflows_max:2 remote_key token:2a999a38 write_seq:4bbde1055e01f245 snd_una:4bbde1055e01f244 rcv_nxt:d9795a810e41cfd4
# LAST-ACK 0      0      127.0.0.1:36714 127.0.0.1:10066
#        skmem:(r0,rb131072,t0,tb16384,f0,w0,o0,bl0,d0) subflows_max:2 remote_key token:9dec73d1 write_seq:ae1ec9a088d6f32f snd_una:ae1ec9a088d6f32e rcv_nxt:37fd3816906823f6
# #kernel
# TcpActiveOpens                  102                0.0
# TcpPassiveOpens                 102                0.0
# TcpEstabResets                  198                0.0
# TcpInSegs                       1016               0.0
# TcpOutSegs                      1016               0.0
# TcpOutRsts                      110                0.0
# TcpExtTW                        4                  0.0
# TcpExtTCPPureAcks               480                0.0
# TcpExtTCPHPAcks                 1                  0.0
# TcpExtTCPAbortOnData            99                 0.0
# TcpExtTCPOrigDataSent           210                0.0
# TcpExtTCPDelivered              312                0.0
# MPTcpExtMPCapableSYNRX          101                0.0
# MPTcpExtMPCapableSYNTX          102                0.0
# MPTcpExtMPCapableSYNACKRX       101                0.0
# MPTcpExtMPCapableACKRX          101                0.0
# MPTcpExtMPCapableFallbackSYNACK 1                  0.0
# MPTcpExtMPFastcloseTx           99                 0.0
# MPTcpExtMPFastcloseRx           94                 0.0
# MPTcpExtMPRstTx                 99                 0.0
# MPTcpExtMPRstRx                 94                 0.0
# let mptcp timeout elapse
# #kernel
# TcpActiveOpens                  102                0.0
# TcpPassiveOpens                 102                0.0
# TcpEstabResets                  198                0.0
# TcpInSegs                       1016               0.0
# TcpOutSegs                      1016               0.0
# TcpOutRsts                      110                0.0
# TcpExtTW                        4                  0.0
# TcpExtTCPPureAcks               480                0.0
# TcpExtTCPHPAcks                 1                  0.0
# TcpExtTCPAbortOnData            99                 0.0
# TcpExtTCPOrigDataSent           210                0.0
# TcpExtTCPDelivered              312                0.0
# MPTcpExtMPCapableSYNRX          101                0.0
# MPTcpExtMPCapableSYNTX          102                0.0
# MPTcpExtMPCapableSYNACKRX       101                0.0
# MPTcpExtMPCapableACKRX          101                0.0
# MPTcpExtMPCapableFallbackSYNACK 1                  0.0
# MPTcpExtMPFastcloseTx           99                 0.0
# MPTcpExtMPFastcloseRx           94                 0.0
# MPTcpExtMPRstTx                 99                 0.0
# MPTcpExtMPRstRx                 94                 0.0
# MPTCPv6   2936      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# MPTCP     2784      0       0   no       0   no   kernel      y  y  y  y  y  y  y  y  y  y  y  y  n  n  n  y  y  y  n
# [ perf record: Woken up 6 times to write data ]
# failed to mmap file
# [ perf record: Captured and wrote 1.575 MB perf.data ]
++ rc=12

perf.data_202301260100.zip

Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk. That is not needed as there is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed().

And in some circumstances - specifically if the msk is already orphaned - it prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup.

All the above causes increasing memory usage over time and sporadic self-tests failures.

Address the issue simply removing the unneeded state update.

Closes: multipath-tcp/mptcp_net-next#339
Fixes: 15cc104 ("mptcp: deliver ssk errors to msk")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk.

If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup.

All the above causes increasing memory usage over time and sporadic self-tests failures.

There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets.

Closes: multipath-tcp/mptcp_net-next#339
Fixes: 15cc104 ("mptcp: deliver ssk errors to msk")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
v1 -> v2:
 - propagate the status for non orphaned sockets

Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk.

If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup.

All the above causes increasing memory usage over time and sporadic self-tests failures.

There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets.

Closes: multipath-tcp/mptcp_net-next#339
Fixes: 15cc104 ("mptcp: deliver ssk errors to msk")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
--
v2 -> v3:
 - cleanup code comment
v1 -> v2:
 - propagate the status for non orphaned sockets

Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk.

If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup.

All the above causes increasing memory usage over time and sporadic self-tests failures.

There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets.

Closes: #339
Fixes: 15cc104 ("mptcp: deliver ssk errors to msk")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>

Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk.

If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup.

All the above causes increasing memory usage over time and sporadic self-tests failures.

There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets.

Closes: multipath-tcp/mptcp_net-next#339
Fixes: 15cc104 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit 1249db4 upstream.

Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk.

If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup.

All the above causes increasing memory usage over time and sporadic self-tests failures.

There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets.

Closes: multipath-tcp/mptcp_net-next#339
Fixes: 15cc104 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1249db4 upstream.

Currently the subflow error report callback unconditionally propagates the fallback subflow status to the owning msk.

If the msk is already orphaned, the above prevents the code from correctly tracking the msk moving to the TCP_CLOSE state and doing the appropriate cleanup.

All the above causes increasing memory usage over time and sporadic self-tests failures.

There is a great deal of infrastructure trying to propagate correctly the fallback subflow status to the owning mptcp socket, e.g. via mptcp_subflow_eof() and subflow_sched_work_if_closed(): in the error propagation path we need only to cope with unorphaned sockets.

Closes: multipath-tcp/mptcp_net-next#339
Fixes: 15cc104 ("mptcp: deliver ssk errors to msk")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8d13f2c3e2ba1d6e4d6daf5993de3d6d4e15693f)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>

matttbe mentioned this issue Jan 25, 2023

Broken SELinux/LSM labelling with MPTCP and accept(2) #320

Closed

matttbe added bug selftests labels Jan 26, 2023

matttbe assigned pabeni Jan 26, 2023

matttbe closed this as completed in 5c34d69 Jan 31, 2023
