bpf: permit map_ptr arithmetic with opcode add and offset 0 #20

Closed
wants to merge 3 commits into from


kernel-patches-bot

Pull request for series with
subject: bpf: permit map_ptr arithmetic with opcode add and offset 0
version: 3
url: https://patchwork.ozlabs.org/project/netdev/list/?series=200277

tsipa and others added 3 commits September 8, 2020 15:00
to bpf map fields") added support to access map fields
with CORE support. For example,

            struct bpf_map {
                    __u32 max_entries;
            } __attribute__((preserve_access_index));

            struct bpf_array {
                    struct bpf_map map;
                    __u32 elem_size;
            } __attribute__((preserve_access_index));

            struct {
                    __uint(type, BPF_MAP_TYPE_ARRAY);
                    __uint(max_entries, 4);
                    __type(key, __u32);
                    __type(value, __u32);
            } m_array SEC(".maps");

            SEC("cgroup_skb/egress")
            int cg_skb(void *ctx)
            {
                    struct bpf_array *array = (struct bpf_array *)&m_array;

                    /* .. array->map.max_entries .. */
            }

In the kernel, bpf_htab has a similar structure:

            struct bpf_htab {
                    struct bpf_map map;
                    ...
            };

In the above cg_skb(), to access array->map.max_entries with CORE, clang
generates two builtins:
            base = &m_array;
            /* access array.map */
            map_addr = __builtin_preserve_struct_access_info(base, 0, 0);
            /* access array.map.max_entries */
            max_entries_addr = __builtin_preserve_struct_access_info(map_addr, 0, 0);
            max_entries = *max_entries_addr;

With current LLVM, if the two builtins are in the same function, or end
up in the same function after inlining, the compiler is smart enough to
chain them together and generate code like the following:
            base = &m_array;
            max_entries = *(base + reloc_offset); /* reloc_offset = 0 in this case */
and we are fine.
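
The folded form can be illustrated in plain user-space C. This is only a
sketch: the two structs are the trimmed-down definitions quoted above,
and chained_offset() and load_max_entries() are hypothetical helper
names, not anything clang or the kernel provides. It shows that the two
member accesses compose into a single offset, which is 0 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed-down stand-ins for the structs quoted in the commit message. */
struct bpf_map {
	uint32_t max_entries;
};

struct bpf_array {
	struct bpf_map map;
	uint32_t elem_size;
};

/* Offset of array->map.max_entries, built up one member access at a time. */
static size_t chained_offset(void)
{
	size_t off_map = offsetof(struct bpf_array, map);       /* first access  */
	size_t off_max = offsetof(struct bpf_map, max_entries); /* second access */

	return off_map + off_max;
}

/* One folded load, i.e. max_entries = *(base + reloc_offset). */
static uint32_t load_max_entries(const struct bpf_array *array)
{
	const unsigned char *base = (const unsigned char *)array;
	uint32_t max_entries;

	assert(array != NULL);
	memcpy(&max_entries, base + chained_offset(), sizeof(max_entries));
	return max_entries;
}
```

Because both members sit at the start of their enclosing struct, the
combined offset is 0 and the load reads directly from the base address.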

But if we force no inlining for one of the functions in the
test_map_ptr() selftest, e.g., check_default(), the two
__builtin_preserve_* calls end up in two different functions. In this
case, we get code like:
   func check_hash():
            reloc_offset_map = 0;
            base = &m_array;
            map_base = base + reloc_offset_map;
            check_default(map_base, ...)
   func check_default(map_base, ...):
            max_entries = *(map_base + reloc_offset_max_entries);
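
The same split can be mimicked in ordinary C. This is a simplified
sketch, not the selftest itself: the second field of struct bpf_htab and
the function signatures are assumptions made for illustration. The
caller performs the first access (&hash->map, i.e. base + 0) and the
non-inlined callee performs the second.

```c
#include <assert.h>
#include <stdint.h>

struct bpf_map {
	uint32_t max_entries;
};

struct bpf_htab {
	struct bpf_map map;
	uint32_t n_buckets; /* hypothetical field, only here to pad the struct */
};

/* Callee: sees only a struct bpf_map pointer and does the second access. */
__attribute__((noinline))
static int check_default(const struct bpf_map *map, uint32_t expected)
{
	return map->max_entries == expected;
}

/* Caller: does the first access, map_base = base + reloc_offset_map. */
static int check_hash(const struct bpf_htab *hash, uint32_t expected)
{
	const struct bpf_map *map_base = &hash->map;

	return check_default(map_base, expected);
}

/* The embedded map is the first member, so map_base equals base. */
static int map_is_at_offset_zero(const struct bpf_htab *hash)
{
	return (const void *)&hash->map == (const void *)hash;
}
```

With the callee kept out of line, the "base + 0" computation has to be
materialized in the caller before the call instead of being folded into
the load.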

In the kernel, arithmetic on a map_ptr (CONST_PTR_TO_MAP) is not allowed
at all, so the above "map_base = base + reloc_offset_map" triggers a
verifier failure:
  ; VERIFY(check_default(&hash->map, map));
  0: (18) r7 = 0xffffb4fe8018a004
  2: (b4) w1 = 110
  3: (63) *(u32 *)(r7 +0) = r1
   R1_w=invP110 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
  ; VERIFY_TYPE(BPF_MAP_TYPE_HASH, check_hash);
  4: (18) r1 = 0xffffb4fe8018a000
  6: (b4) w2 = 1
  7: (63) *(u32 *)(r1 +0) = r2
   R1_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R2_w=invP1 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
  8: (b7) r2 = 0
  9: (18) r8 = 0xffff90bcb500c000
  11: (18) r1 = 0xffff90bcb500c000
  13: (0f) r1 += r2
  R1 pointer arithmetic on map_ptr prohibited

To fix the issue, permit map_ptr + 0 arithmetic, which results in
exactly the same map_ptr.
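
The rule can be modelled in a few lines of user-space C. This is a
sketch of the condition described above, not the code in
kernel/bpf/verifier.c: enum alu_op, struct scalar and
map_ptr_arith_allowed() are hypothetical names, and the model assumes
the verifier knows the scalar's exact value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
enum alu_op { OP_ADD, OP_SUB };

struct scalar {
	bool known;    /* the verifier knows the exact value */
	int64_t value; /* meaningful only when known is true */
};

/*
 * Returns true when "map_ptr <op>= scalar" provably leaves map_ptr
 * unchanged: the opcode is add and the scalar is the known constant 0.
 * Everything else stays prohibited.
 */
static bool map_ptr_arith_allowed(enum alu_op op, struct scalar s)
{
	return op == OP_ADD && s.known && s.value == 0;
}
```

An unknown scalar is rejected even if it happens to be 0 at run time,
because the check has to hold for every possible value.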

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/verifier.c | 4 ++++
 1 file changed, 4 insertions(+)
one of subtests, which will fail the test without previous
verifier change. Also added to verifier test for both
"map_ptr += scalar" and "scalar += map_ptr" arithmetic.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/progs/map_ptr_kern.c        | 10 +++++-
 .../testing/selftests/bpf/verifier/map_ptr.c  | 32 +++++++++++++++++++
 2 files changed, 41 insertions(+), 1 deletion(-)
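
The two operand orders named above, "map_ptr += scalar" and
"scalar += map_ptr", can be covered by one check once the scalar operand
is picked out. The following is a rough user-space model under that
assumption; struct reg, enum reg_kind and check_add() are hypothetical
and do not mirror the verifier's real data structures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register model for this sketch only. */
enum reg_kind { REG_SCALAR, REG_MAP_PTR };

struct reg {
	enum reg_kind kind;
	bool known;    /* scalar only: exact value is known */
	int64_t value; /* scalar only: meaningful when known */
};

/*
 * Models "dst += src" when exactly one operand is a map pointer.
 * Returns 0 if the add is accepted, -EACCES if it is rejected, and
 * -EINVAL for operand combinations this sketch does not cover.
 */
static int check_add(const struct reg *dst, const struct reg *src)
{
	const struct reg *scalar;

	if (dst->kind == REG_MAP_PTR && src->kind == REG_SCALAR)
		scalar = src; /* map_ptr += scalar */
	else if (dst->kind == REG_SCALAR && src->kind == REG_MAP_PTR)
		scalar = dst; /* scalar += map_ptr */
	else
		return -EINVAL;

	return (scalar->known && scalar->value == 0) ? 0 : -EACCES;
}
```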
@kernel-patches-bot

At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=200277 irrelevant now. Closing PR.

@kernel-patches-bot kernel-patches-bot deleted the series/199627 branch September 15, 2020 17:49
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
test_generic_metric() failed to release the entries in the pctx.  ASan
reported the following leak (and more):

  Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
    #2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
    #3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
    #4 0x55f7e7341667 in expr__add_ref util/expr.c:120
    #5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
    #6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
    #7 0x55f7e712390b in compute_single tests/parse-metric.c:128
    #8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
    #9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
    #10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
    #11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
    #12 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
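
The pattern behind this kind of fix is that a context which owns heap
entries must be cleared on every exit path of the function that fills
it. The sketch below is a self-contained miniature with hypothetical
names (struct ctx, ctx_add(), ctx_clear(), run_metric_test()); it is not
the perf expr or hashmap API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a parse context that owns heap entries. */
struct entry {
	char *name;
	struct entry *next;
};

struct ctx {
	struct entry *head;
	size_t nr;
};

static void ctx_init(struct ctx *c)
{
	c->head = NULL;
	c->nr = 0;
}

/* Adds a copy of name to the context; returns 0 on success, -1 on OOM. */
static int ctx_add(struct ctx *c, const char *name)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->name = malloc(strlen(name) + 1);
	if (!e->name) {
		free(e);
		return -1;
	}
	strcpy(e->name, name);
	e->next = c->head;
	c->head = e;
	c->nr++;
	return 0;
}

/* Releases every entry and leaves the context empty and reusable. */
static void ctx_clear(struct ctx *c)
{
	struct entry *e = c->head;

	while (e) {
		struct entry *next = e->next;

		free(e->name);
		free(e);
		e = next;
	}
	c->head = NULL;
	c->nr = 0;
}

/* Test-style helper: every exit path goes through ctx_clear(). */
static int run_metric_test(const char *const *names, size_t n)
{
	struct ctx c;
	int ret = 0;
	size_t i;

	ctx_init(&c);
	for (i = 0; i < n; i++) {
		if (ctx_add(&c, names[i]) < 0) {
			ret = -1;
			break;
		}
	}
	ctx_clear(&c); /* the release step whose absence causes the leak */
	return ret;
}
```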

Fixes: 6d432c4 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kernel-patches-bot pushed a commit that referenced this pull request Nov 20, 2020
This fix is for a failure that occurred in the DWARF unwind perf test.

Stack unwinders may probe memory when looking for frames.

Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.

This can lead to false memory sanitizer failures for the use of an
uninitialized value.

Avoid this problem by removing the poison on the copied stack.
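
A minimal sketch of that idea follows, assuming a clang build:
__msan_unpoison() from <sanitizer/msan_interface.h> is called on the
heap copy only when MemorySanitizer is enabled, and compiles to nothing
otherwise. The UNPOISON macro and copy_stack() are hypothetical names
for illustration, not the perf test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Use the real MSan interface only when building with -fsanitize=memory. */
#if defined(__has_feature)
# if __has_feature(memory_sanitizer)
#  include <sanitizer/msan_interface.h>
#  define UNPOISON(p, n) __msan_unpoison((p), (n))
# endif
#endif
#ifndef UNPOISON
# define UNPOISON(p, n) ((void)(p), (void)(n))
#endif

/*
 * Copies size bytes of stack starting at sp into a fresh heap buffer.
 * The copy inherits the "uninitialized" shadow state of any stack slot
 * that was never written, so the poison is removed from the copy before
 * an unwinder is allowed to probe it. Returns NULL on allocation failure.
 */
static unsigned char *copy_stack(const void *sp, size_t size)
{
	unsigned char *buf = malloc(size);

	assert(sp != NULL);
	if (!buf)
		return NULL;
	memcpy(buf, sp, size);
	UNPOISON(buf, size);
	return buf;
}
```

Only the copy is unpoisoned; the original stack keeps its shadow state,
so genuine uninitialized reads elsewhere are still reported.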

The full msan failure with track origins looks like:

==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
    #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
    #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #24 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
    #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
    #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
    #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #25 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
    #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
    #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
    #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #17 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
    #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445

SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kernel-patches-bot pushed a commit that referenced this pull request Mar 10, 2021
Calling btrfs_qgroup_reserve_meta_prealloc from
btrfs_delayed_inode_reserve_metadata can result in flushing delalloc
while holding a transaction and delayed node locks. This is deadlock
prone. In the past multiple commits:

 * ae5e070 ("btrfs: qgroup: don't try to wait flushing if we're
already holding a transaction")

 * 6f23277 ("btrfs: qgroup: don't commit transaction when we already
 hold the handle")

tried to solve various aspects of this, but it was always a
whack-a-mole game. Unfortunately, those two fixes don't solve a deadlock
scenario involving btrfs_delayed_node::mutex. Namely, one thread
can call btrfs_dirty_inode as a result of reading a file and modifying
its atime:

  PID: 6963   TASK: ffff8c7f3f94c000  CPU: 2   COMMAND: "test"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_timeout at ffffffffa52a1bdd
  #3  wait_for_completion at ffffffffa529eeea             <-- sleeps with delayed node mutex held
  #4  start_delalloc_inodes at ffffffffc0380db5
  #5  btrfs_start_delalloc_snapshot at ffffffffc0393836
  #6  try_flush_qgroup at ffffffffc03f04b2
  #7  __btrfs_qgroup_reserve_meta at ffffffffc03f5bb6     <-- tries to reserve space and starts delalloc inodes.
  #8  btrfs_delayed_update_inode at ffffffffc03e31aa      <-- acquires delayed node mutex
  #9  btrfs_update_inode at ffffffffc0385ba8
 #10  btrfs_dirty_inode at ffffffffc038627b               <-- TRANSACTION OPENED
 #11  touch_atime at ffffffffa4cf0000
 #12  generic_file_read_iter at ffffffffa4c1f123
 #13  new_sync_read at ffffffffa4ccdc8a
 #14  vfs_read at ffffffffa4cd0849
 #15  ksys_read at ffffffffa4cd0bd1
 #16  do_syscall_64 at ffffffffa4a052eb
 #17  entry_SYSCALL_64_after_hwframe at ffffffffa540008c

This causes asynchronous work to be queued to flush the delalloc inodes,
and that work can try to acquire the same delayed_node mutex:

  PID: 455    TASK: ffff8c8085fa4000  CPU: 5   COMMAND: "kworker/u16:30"
  #0  __schedule at ffffffffa529e07d
  #1  schedule at ffffffffa529e4ff
  #2  schedule_preempt_disabled at ffffffffa529e80a
  #3  __mutex_lock at ffffffffa529fdcb                    <-- goes to sleep, never wakes up.
  #4  btrfs_delayed_update_inode at ffffffffc03e3143      <-- tries to acquire the mutex
  #5  btrfs_update_inode at ffffffffc0385ba8              <-- this is the same inode that pid 6963 is holding
  #6  cow_file_range_inline.constprop.78 at ffffffffc0386be7
  #7  cow_file_range at ffffffffc03879c1
  #8  btrfs_run_delalloc_range at ffffffffc038894c
  #9  writepage_delalloc at ffffffffc03a3c8f
 #10  __extent_writepage at ffffffffc03a4c01
 #11  extent_write_cache_pages at ffffffffc03a500b
 #12  extent_writepages at ffffffffc03a6de2
 #13  do_writepages at ffffffffa4c277eb
 #14  __filemap_fdatawrite_range at ffffffffa4c1e5bb
 #15  btrfs_run_delalloc_work at ffffffffc0380987         <-- starts running delayed nodes
 #16  normal_work_helper at ffffffffc03b706c
 #17  process_one_work at ffffffffa4aba4e4
 #18  worker_thread at ffffffffa4aba6fd
 #19  kthread at ffffffffa4ac0a3d
 #20  ret_from_fork at ffffffffa54001ff

To fully address those cases, the complete fix is to never issue any
flushing while holding the transaction or the delayed node lock. This
patch achieves that by calling qgroup_reserve_meta directly, which will
either succeed without flushing or fail and return -EDQUOT. In the
latter case, the return value is propagated to btrfs_dirty_inode, which
falls back to starting a new transaction. That's fine, as most of the
time we expect the inode to have the BTRFS_DELAYED_NODE_INODE_DIRTY flag
set, which results in directly copying the in-memory state.
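
The control flow described above can be sketched in user-space C. This
is an illustrative model only: struct quota, reserve_noflush(),
dirty_inode() and the two path names are hypothetical and say nothing
about how btrfs actually accounts qgroup space.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EDQUOT
#define EDQUOT 122 /* Linux value, for platforms whose errno.h lacks it */
#endif

/* Hypothetical quota accounting; not btrfs's real data structures. */
struct quota {
	uint64_t limit;
	uint64_t reserved;
};

/*
 * Reservation attempt that never flushes and never sleeps: it either
 * succeeds immediately or fails with -EDQUOT.
 */
static int reserve_noflush(struct quota *q, uint64_t bytes)
{
	assert(q->reserved <= q->limit);
	if (bytes > q->limit - q->reserved)
		return -EDQUOT;
	q->reserved += bytes;
	return 0;
}

enum update_path { PATH_DELAYED, PATH_NEW_TRANSACTION };

/*
 * Caller that already holds a transaction: on -EDQUOT it does not try
 * to free space itself, it reports that the slower fallback is needed.
 */
static enum update_path dirty_inode(struct quota *q, uint64_t bytes)
{
	if (reserve_noflush(q, bytes) == -EDQUOT)
		return PATH_NEW_TRANSACTION;
	return PATH_DELAYED;
}
```

The point of the design is that the failing path returns to a level
where no lock is held, so any flushing can happen there safely.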

Fixes: c53e965 ("btrfs: qgroup: try to flush qgroup space when we get -EDQUOT")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
kernel-patches-bot pushed a commit that referenced this pull request Mar 10, 2021
The evlist and the cpu/thread maps should be released together.
Otherwise, ASan reports the following error.

Note that this test still has memory leaks in DSOs so it still fails
even after this change.  I'll take a look at that too.

  # perf test -v 26
  26: Object code reading                        :
  --- start ---
  test child forked, pid 154184
  Looking at the vmlinux_path (8 entries long)
  symsrc__init: build id mismatch for vmlinux.
  symsrc__init: cannot get elf header.
  Using /proc/kcore for kernel data
  Using /proc/kallsyms for symbols
  Parsing event 'cycles'
  mmap size 528384B
  ...
  =================================================================
  ==154184==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fcb66e77037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x55ad9b7e821e in dso__new_id util/dso.c:1256
    #2 0x55ad9b8cfd4a in __machine__addnew_vdso util/vdso.c:132
    #3 0x55ad9b8cfd4a in machine__findnew_vdso util/vdso.c:347
    #4 0x55ad9b845b7e in map__new util/map.c:176
    #5 0x55ad9b8415a2 in machine__process_mmap2_event util/machine.c:1787
    #6 0x55ad9b8fab16 in perf_tool__process_synth_event util/synthetic-events.c:64
    #7 0x55ad9b8fab16 in perf_event__synthesize_mmap_events util/synthetic-events.c:499
    #8 0x55ad9b8fbfdf in __event__synthesize_thread util/synthetic-events.c:741
    #9 0x55ad9b8ff3e3 in perf_event__synthesize_thread_map util/synthetic-events.c:833
    #10 0x55ad9b738585 in do_test_code_reading tests/code-reading.c:608
    #11 0x55ad9b73b25d in test__code_reading tests/code-reading.c:722
    #12 0x55ad9b6f28fb in run_test tests/builtin-test.c:428
    #13 0x55ad9b6f28fb in test_and_print tests/builtin-test.c:458
    #14 0x55ad9b6f4a53 in __cmd_test tests/builtin-test.c:679
    #15 0x55ad9b6f4a53 in cmd_test tests/builtin-test.c:825
    #16 0x55ad9b760cc4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x55ad9b5eaa88 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x55ad9b5eaa88 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x55ad9b5eaa88 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fcb669acd09 in __libc_start_main ../csu/libc-start.c:308

    ...
  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).
  test child finished with 1
  ---- end ----
  Object code reading: FAILED!

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210301140409.184570-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kernel-patches-bot pushed a commit that referenced this pull request Mar 26, 2021
Pablo Neira Ayuso says:

====================
netfilter: flowtable enhancements

[ This is v2 that includes documentation enhancements, including
  existing limitations. This is a rebase on top on net-next. ]

The following patchset augments the Netfilter flowtable fastpath to
support network topologies that combine IP forwarding, bridge,
classic VLAN devices, bridge VLAN filtering, DSA and PPPoE. This
includes support for the flowtable software and hardware datapaths.

The following picture provides an example scenario:

                        fast path!
                .------------------------.
               /                          \
               |           IP forwarding  |
               |          /             \ \/
               |       br0               wan ..... eth0
               .       / \                         host C
               -> veth1  veth2
                   .           switch/router
                   .
                   .
                 eth0
                host A

The bridge master device 'br0' has an IP address, and a DHCP server is
also assumed to be running to provide connectivity to host A, which
reaches the Internet through 'br0' as its default gateway. Packets then
enter the IP forwarding path, and Netfilter is used to NAT them
before they leave through the wan device.

The general idea is to accelerate forwarding by building a fast path
that takes packets from the ingress path of the bridge port and places
them in the egress path of the wan device (and vice versa), hence
skipping the classic bridge and IP stack paths.

** Patches from #1 to #6 add the infrastructure which describes the list of
   netdevice hops to reach a given destination MAC address in the local
   network topology.

Patch #1 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
         netdev_ops.

Patch #2 adds .ndo_fill_forward_path for vlan devices, which provides
         the next device hop via vlan->real_dev, the vlan ID and the
         protocol.

Patch #3 adds .ndo_fill_forward_path for bridge devices, which allows FDB
         lookups to locate the next device hop (bridge port) in the
         forwarding path.

Patch #4 extends bridge .ndo_fill_forward_path to support bridge VLAN
         filtering.

Patch #5 adds .ndo_fill_forward_path for PPPoE devices.

Patch #6 adds .ndo_fill_forward_path for DSA.
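
The hop-by-hop idea behind these patches can be sketched in user-space
C. This is a deliberately simplified model, not the kernel API: struct
net_dev, generic_fill() and walk_forward_path() are hypothetical, and
the real callbacks also carry per-hop data such as the VLAN ID and
protocol. Each virtual device reports its next lower device until a
device without a callback (the real ethernet device) is reached.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HOPS 8

struct net_dev;
typedef const struct net_dev *(*fill_fn)(const struct net_dev *dev);

/* Hypothetical device model for this sketch only. */
struct net_dev {
	const char *name;
	const struct net_dev *lower; /* next hop toward the wire, or NULL  */
	fill_fn fill_forward_path;   /* NULL for a real ethernet device    */
};

/* Callback shared by the virtual devices here: report the lower device. */
static const struct net_dev *generic_fill(const struct net_dev *dev)
{
	return dev->lower;
}

/*
 * Records every device from dev down to the first one without a
 * callback. Returns the number of hops stored in path, or -1 if the
 * chain is longer than MAX_HOPS.
 */
static int walk_forward_path(const struct net_dev *dev,
			     const struct net_dev *path[MAX_HOPS])
{
	int n = 0;

	while (dev) {
		if (n == MAX_HOPS)
			return -1;
		path[n++] = dev;
		if (!dev->fill_forward_path)
			break; /* reached the real device */
		dev = dev->fill_forward_path(dev);
	}
	return n;
}
```

For a bridge on top of a VLAN device on top of an ethernet port, the
walk yields three hops ending at the ethernet device, which is the
device a fast path would ultimately transmit on.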

** Patches from #7 to #14 update the flowtable software datapath:

Patch #7 adds the transmit path type field to the flow tuple. Two transmit
         paths are supported so far: the neighbour and the xfrm transmit
         paths.

Patch #8 and #9 update the flowtable datapath to use dev_fill_forward_path()
         to obtain the real ingress/egress device for the flowtable datapath.
         This adds the new ethernet xmit direct path to the flowtable.

Patch #10 adds native flowtable VLAN support (up to 2 VLAN tags) through
          dev_fill_forward_path(). The flowtable stores the VLAN id and
          protocol in the flow tuple.

Patch #11 adds native flowtable bridge VLAN filter support through
          dev_fill_forward_path().

Patch #12 adds native flowtable bridge PPPoE through dev_fill_forward_path().

Patch #13 adds DSA support through dev_fill_forward_path().

Patch #14 extends flowtable selftests to cover the flowtable software
          datapath enhancements.

** Patches from #15 to #20 update the flowtable hardware offload datapath:

Patch #15 extends the flowtable hardware offload to support the
          direct ethernet xmit path. This also includes VLAN support.

Patch #16 stores the egress real device in the flow tuple. The software
          flowtable datapath uses dev_hard_header() to transmit packets,
          hence it might refer to VLAN/DSA/PPPoE software device, not
          the real ethernet device.

Patch #17 deals with switchdev PVID hardware offload to skip it on
          egress.

Patch #18 adds FLOW_ACTION_PPPOE_PUSH to the flow_offload action API.

Patch #19 extends the flowtable hardware offload to support PPPoE.

Patch #20 adds TC_SETUP_FT support for DSA.

** Patches from #20 to #23: Felix Fietkau adds a new driver which supports
   hardware offload for the mtk PPE engine through the existing flow
   offload API, which supports the flowtable enhancements coming in
   this batch.

Patch #24 extends the documentation and describes existing limitations.

Please, apply, thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
kernel-patches-bot pushed a commit that referenced this pull request Apr 16, 2021
I got several memory leak reports from ASan with a simple command.  The
VDSO was never released because its refcount did not drop to zero.  Like
__dsos_addnew_id(), it should put the refcount after adding to the list.
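
The refcount pattern in question can be sketched as follows. This is a
generic user-space model with hypothetical names (struct obj,
list_add(), list_addnew(), list_purge()), not the perf dso code: the
list takes its own reference when an object is added, so the creator
has to drop the initial one or the object can never be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object kept on a singly linked list. */
struct obj {
	int refcnt;
	struct obj *next;
};

struct list {
	struct obj *head;
};

/* A new object starts with one reference, owned by its creator. */
static struct obj *obj_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refcnt = 1;
	return o;
}

static struct obj *obj_get(struct obj *o)
{
	o->refcnt++;
	return o;
}

/* Drops one reference; returns 1 if the object was freed. */
static int obj_put(struct obj *o)
{
	if (--o->refcnt == 0) {
		free(o);
		return 1;
	}
	return 0;
}

/* The list takes its own reference on everything it stores. */
static void list_add(struct list *l, struct obj *o)
{
	obj_get(o);
	o->next = l->head;
	l->head = o;
}

/* Creates an object, hands it to the list, drops the creator's ref. */
static struct obj *list_addnew(struct list *l)
{
	struct obj *o = obj_new();

	if (!o)
		return NULL;
	list_add(l, o);
	obj_put(o); /* without this put, the count never reaches zero */
	return o;   /* still valid: the list's reference keeps it alive */
}

/* Drops the list's references; returns how many objects were freed. */
static int list_purge(struct list *l)
{
	int freed = 0;

	while (l->head) {
		struct obj *o = l->head;

		l->head = o->next;
		freed += obj_put(o);
	}
	return freed;
}
```

If the put in list_addnew() is omitted, list_purge() only lowers the
count to 1 and the object is leaked, which matches the kind of report
shown here.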

  $ perf record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]

  =================================================================
  ==692599==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
    #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
    #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
    #4 0x559bce50826c in map__new util/map.c:175
    #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #10 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #11 0x559bce519bea in perf_session__process_events util/session.c:2297
    #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  Indirect leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
    #2 0x559bce50821b in map__new util/map.c:168
    #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #8 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #9 0x559bce519bea in perf_session__process_events util/session.c:2297
    #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kernel-patches-bot pushed a commit that referenced this pull request Jun 28, 2021
Rfkill blocking and unblocking an Intel USB Bluetooth device [8087:0026]
may make it stop working:
[  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
[  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
[  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[  545.660350] usb 3-10: device descriptor read/64, error -110
[  561.283530] usb 3-10: device descriptor read/64, error -110
[  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
[  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
[  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
[  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)

Or the kernel panics because other workqueues have already freed the skb:
[ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2048.663775] #PF: supervisor read access in kernel mode
[ 2048.663779] #PF: error_code(0x0000) - not-present page
[ 2048.663782] PGD 0 P4D 0
[ 2048.663787] Oops: 0000 [#1] SMP NOPTI
[ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
[ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
[ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
[ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 07 83 f8 01
[ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
[ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
[ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
[ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
[ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
[ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
[ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
[ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
[ 2048.663856] PKRU: 55555554
[ 2048.663859] Call Trace:
[ 2048.663865]  ? skb_release_head_state+0x5e/0x80
[ 2048.663873]  kfree_skb+0x2f/0xb0
[ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
[ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
[ 2048.663954]  ? __cond_resched+0x1a/0x50
[ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
[ 2048.664007]  rfkill_set_block+0x98/0x170
[ 2048.664016]  rfkill_fop_write+0x136/0x1e0
[ 2048.664022]  vfs_write+0xc7/0x260
[ 2048.664030]  ksys_write+0xb1/0xe0
[ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
[ 2048.664042]  __x64_sys_write+0x1a/0x20
[ 2048.664048]  do_syscall_64+0x40/0xb0
[ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2048.664060] RIP: 0033:0x7fe02ac23c27
[ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
[ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
[ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
[ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff

So move the shutdown callback to a place where workqueues are either
flushed or cancelled to resolve the issue.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
kernel-patches-bot pushed a commit that referenced this pull request Jul 29, 2021
When the kernel is built with CONFIG_KASAN_HW_TAGS and the CPU supports
MTE, memory accesses are checked at 16-byte granularity, and
out-of-bounds accesses can result in tag check faults. Our current
implementation of strlen() makes unaligned 16-byte accesses (within a
naturally aligned 4096-byte window), and can trigger tag check faults.

This can be seen at boot time, e.g.

| BUG: KASAN: invalid-access in __pi_strlen+0x14/0x150
| Read at addr f4ff0000c0028300 by task swapper/0/0
| Pointer tag: [f4], memory tag: [fe]
|
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.13.0-09550-g03c2813535a2-dirty #20
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x1b0
|  show_stack+0x1c/0x30
|  dump_stack_lvl+0x68/0x84
|  print_address_description+0x7c/0x2b4
|  kasan_report+0x138/0x38c
|  __do_kernel_fault+0x190/0x1c4
|  do_tag_check_fault+0x78/0x90
|  do_mem_abort+0x44/0xb4
|  el1_abort+0x40/0x60
|  el1h_64_sync_handler+0xb0/0xd0
|  el1h_64_sync+0x78/0x7c
|  __pi_strlen+0x14/0x150
|  __register_sysctl_table+0x7c4/0x890
|  register_leaf_sysctl_tables+0x1a4/0x210
|  register_leaf_sysctl_tables+0xc8/0x210
|  __register_sysctl_paths+0x22c/0x290
|  register_sysctl_table+0x2c/0x40
|  sysctl_init+0x20/0x30
|  proc_sys_init+0x3c/0x48
|  proc_root_init+0x80/0x9c
|  start_kernel+0x640/0x69c
|  __primary_switched+0xc0/0xc8

To fix this, we can reduce the (strlen-internal) MIN_PAGE_SIZE to 16
bytes when CONFIG_KASAN_HW_TAGS is selected. This will cause strlen() to
align the base pointer downwards to a 16-byte boundary, and to discard
the additional prefix bytes without counting them. All subsequent
accesses will be 16-byte aligned 16-byte LDPs. While the comments say
the body of the loop will access 32 bytes, this is performed as two
16-byte accesses, with the second made only if the first did not
encounter a NUL byte, so the body of the loop will not over-read across
a 16-byte boundary.
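
The align-down step described above can be illustrated with plain address arithmetic. This is only a minimal userspace sketch, not the arm64 assembly from the patch; the names GRANULE, align_down(), prefix_bytes() and within_one_granule() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag-check granule in bytes (16, as stated in the text above). */
#define GRANULE 16u

/* Round an address down to the start of its 16-byte granule. */
static uintptr_t align_down(uintptr_t addr)
{
	return addr & ~(uintptr_t)(GRANULE - 1);
}

/*
 * Number of bytes between the aligned base and the real start of the
 * string. These prefix bytes are loaded but must not be counted.
 */
static size_t prefix_bytes(uintptr_t addr)
{
	return (size_t)(addr - align_down(addr));
}

/* Non-zero if an access of len bytes at addr stays inside one granule. */
static int within_one_granule(uintptr_t addr, size_t len)
{
	assert(len > 0 && len <= GRANULE);
	return align_down(addr) == align_down(addr + len - 1);
}
```

With these helpers, an unaligned 16-byte access at 0x1008 is reported as crossing a granule boundary, while the same access at the aligned base 0x1000 is not.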

No other string routines are affected. The other str*() routines will
not make any access which straddles a 16-byte boundary, and the mem*()
routines will only make accesses which straddle a 16-byte boundary when
the access is entirely within the bounds of the relevant base and size
arguments.

Fixes: 325a1de ("arm64: Import updated version of Cortex Strings' strlen")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20210712090043.20847-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
kernel-patches-bot pushed a commit that referenced this pull request Sep 9, 2021
This patch adds a "-j" mode to test_progs, which executes tests in multiple
processes. The "-j" mode is optional and works with all existing test
selection mechanisms, as well as with "-v", "-l", etc.

In "-j" mode, the main process uses UDS/DGRAM to communicate with each forked
worker, commanding it to run tests and collect logs. After all tests have
finished, a summary is printed. The main process uses multiple competing
threads to dispatch work to the workers, trying to keep them all busy.
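
The dispatch model described above can be sketched in userspace C. This is a simplified illustration under stated assumptions, not the test_progs implementation: it forks a single worker, uses one socketpair() of Unix datagram sockets, and the message layout (struct msg) and the helpers run_test(), worker_loop() and dispatch_tests() are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message format; the real protocol may differ. */
struct msg {
	int test_num;	/* test to run; negative means "exit" */
	int result;	/* 0 = OK, filled in by the worker */
};

/* Stand-in for running a real test: every test "passes" here. */
static int run_test(int test_num)
{
	(void)test_num;
	return 0;
}

/* Worker: receive a command, run the test, send the result back. */
static void worker_loop(int sock)
{
	struct msg m;

	while (recv(sock, &m, sizeof(m), 0) == (ssize_t)sizeof(m)) {
		if (m.test_num < 0)
			break;
		m.result = run_test(m.test_num);
		if (send(sock, &m, sizeof(m), 0) != (ssize_t)sizeof(m))
			break;
	}
}

/*
 * Fork one worker, dispatch tests first..last to it over a Unix
 * datagram socket pair, and return the number of passing tests,
 * or -1 on error.
 */
static int dispatch_tests(int first, int last)
{
	int sv[2], passed = 0, status;
	struct msg m;
	pid_t pid;

	assert(first <= last);
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		close(sv[0]);
		worker_loop(sv[1]);
		_exit(0);
	}
	close(sv[1]);

	for (int t = first; t <= last; t++) {
		m.test_num = t;
		m.result = -1;
		if (send(sv[0], &m, sizeof(m), 0) != (ssize_t)sizeof(m) ||
		    recv(sv[0], &m, sizeof(m), 0) != (ssize_t)sizeof(m)) {
			passed = -1;
			break;
		}
		if (m.result == 0)
			passed++;
	}

	/* Tell the worker to exit, then reap it. */
	m.test_num = -1;
	m.result = 0;
	send(sv[0], &m, sizeof(m), 0);
	close(sv[0]);
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return passed;
}
```

Calling dispatch_tests(15, 20) runs six stand-in tests in the forked worker. A real dispatcher would start several workers and feed them from competing threads, as the text describes.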

Example output:

  > ./test_progs -n 15-20 -j
  [    8.584709] bpf_testmod: loading out-of-tree module taints kernel.
  Launching 2 workers.
  [0]: Running test 15.
  [1]: Running test 16.
  [1]: Running test 17.
  [1]: Running test 18.
  [1]: Running test 19.
  [1]: Running test 20.
  [1]: worker exit.
  [0]: worker exit.
  #15 btf_dump:OK
  #16 btf_endian:OK
  #17 btf_map_in_map:OK
  #18 btf_module:OK
  #19 btf_skc_cls_ingress:OK
  #20 btf_split:OK
  Summary: 6/20 PASSED, 0 SKIPPED, 0 FAILED

Known issue:

Some tests fail when run concurrently; a later patch will either fix these
tests or pin them to worker 0.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>

V3 -> V2: fix missing outputs in commit messages.
V2 -> V1: switch to UDS client/server model.
kernel-patches-bot pushed a commit that referenced this pull request Sep 10, 2021
kernel-patches-bot pushed a commit that referenced this pull request Sep 13, 2021
This patch adds a "-j" mode to test_progs, which executes tests in multiple
processes. The "-j" mode is optional and works with all existing test
selection mechanisms, as well as with "-v", "-l", etc.

In "-j" mode, the main process uses UDS/DGRAM to communicate with each forked
worker, commanding it to run tests and collect logs. After all tests have
finished, a summary is printed. The main process uses multiple competing
threads to dispatch work to the workers, trying to keep them all busy.

Example output:

  > ./test_progs -n 15-20 -j
  [    8.584709] bpf_testmod: loading out-of-tree module taints kernel.
  Launching 2 workers.
  [0]: Running test 15.
  [1]: Running test 16.
  [1]: Running test 17.
  [1]: Running test 18.
  [1]: Running test 19.
  [1]: Running test 20.
  [1]: worker exit.
  [0]: worker exit.
  #15 btf_dump:OK
  #16 btf_endian:OK
  #17 btf_map_in_map:OK
  #18 btf_module:OK
  #19 btf_skc_cls_ingress:OK
  #20 btf_split:OK
  Summary: 6/20 PASSED, 0 SKIPPED, 0 FAILED

Known issue:

Some tests fail when run concurrently; a later patch will either fix these
tests or pin them to worker 0.

Signed-off-by: Yucong Sun <sunyucong@gmail.com>

V4 -> V3: address style warnings.
V3 -> V2: fix missing outputs in commit messages.
V2 -> V1: switch to UDS client/server model.
kernel-patches-bot pushed a commit that referenced this pull request Sep 14, 2021
kernel-patches-bot pushed a commit that referenced this pull request Sep 16, 2021
This patch adds a "-j" mode to test_progs, which executes tests in multiple
processes. The "-j" mode is optional and works with all existing test
selection mechanisms, as well as with "-v", "-l", etc.

In "-j" mode, the main process uses UDS to communicate with each forked
worker, commanding it to run tests and collect logs. After all tests have
finished, a summary is printed. The main process uses multiple competing
threads to dispatch work to the workers, trying to keep them all busy.

The status of each test is printed as soon as the test finishes. If there are
error logs, they are printed after the final summary line.

By specifying "--debug", additional debug information on server/worker
communication will be printed.

Example output:
  > ./test_progs -n 15-20 -j
  [   12.801730] bpf_testmod: loading out-of-tree module taints kernel.
  Launching 8 workers.
  #20 btf_split:OK
  #16 btf_endian:OK
  #18 btf_module:OK
  #17 btf_map_in_map:OK
  #19 btf_skc_cls_ingress:OK
  #15 btf_dump:OK
  Summary: 6/20 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yucong Sun <sunyucong@gmail.com>
kernel-patches-bot pushed a commit that referenced this pull request Sep 17, 2021
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Sep 21, 2023
The following processes run into a deadlock. CPU 41 was waiting for CPU 29
to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29
was hung by that spinlock with IRQs disabled.

  PID: 17360    TASK: ffff95c1090c5c40  CPU: 41  COMMAND: "mrdiagd"
  !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0
  !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0
  !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0
   # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0
   # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0
   # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0
   # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0
   # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0
   # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0
   # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0
   #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0
   #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0
   #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0
   #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0
   #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0
   #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0
   #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0
   #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0
   #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0
   #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

  PID: 17355    TASK: ffff95c1090c3d80  CPU: 29  COMMAND: "mrdiagd"
  !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0
  !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0
   # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0
   # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0
   # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0
   # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0
   # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0
   # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0
   # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0
   # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
   #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
   #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
   #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
   #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
   #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
   #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
   #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
   #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0

The lock is used to synchronize different sysfs operations; it doesn't
protect any resource that will be touched by an interrupt. Consequently
it's not required to disable IRQs. Replace the spinlock with a mutex to fix
the deadlock.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Oct 6, 2023
When a fault is injected while probing mdpy.ko, if the kstrdup() in
create_dir() fails in kobject_add_internal(), called from
kobject_init_and_add() in mdev_type_add() in parent_create_sysfs_files(),
the error is ignored: 0 is returned and the probe succeeds. Later, on
rmmod mdpy.ko, mdpy_dev_exit() calls mdev_unregister_parent(), and
mdev_type_remove() may traverse an uninitialized parent->types[i] in
parent_remove_sysfs_files(), causing the null-ptr-deref below.

If mdev_type_add() fails, return the error code and call kset_unregister()
to fix the issue.
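
The general pattern behind this kind of fix (propagate the error and undo only what was actually set up) can be sketched in plain C. This is a hypothetical userspace illustration, not the mdev code: struct fake_parent, fake_type_add(), fake_type_remove() and fake_create_files() are stand-ins invented for this example, and the kset handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_TYPES 4

/* Hypothetical stand-ins for a parent device and its types. */
struct fake_type {
	int added;	/* 1 once the type has been registered */
};

struct fake_parent {
	struct fake_type types[NR_TYPES];
	int fail_at;	/* index whose add should fail, or -1 for none */
};

static int fake_type_add(struct fake_parent *p, int i)
{
	if (i == p->fail_at)
		return -ENOMEM;	/* simulate an allocation failure */
	p->types[i].added = 1;
	return 0;
}

static void fake_type_remove(struct fake_parent *p, int i)
{
	/* Never touch a type that was not registered. */
	assert(p->types[i].added);
	p->types[i].added = 0;
}

/*
 * Add all types. On failure, undo the ones already added and
 * propagate the error instead of returning 0.
 */
static int fake_create_files(struct fake_parent *p)
{
	int i, ret;

	for (i = 0; i < NR_TYPES; i++) {
		ret = fake_type_add(p, i);
		if (ret)
			goto out_err;
	}
	return 0;

out_err:
	while (--i >= 0)
		fake_type_remove(p, i);
	return ret;
}
```

Because the error reaches the caller, a later teardown path is never run against entries that were not set up.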

 general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
 CPU: 2 PID: 10215 Comm: rmmod Tainted: G        W        N 6.6.0-rc2+ #20
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
 RIP: 0010:__kobject_del+0x62/0x1c0
 Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 51 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6b 28 48 8d 7d 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 24 01 00 00 48 8b 75 10 48 89 df 48 8d 6b 3c e8
 RSP: 0018:ffff88810695fd30 EFLAGS: 00010202
 RAX: dffffc0000000000 RBX: ffffffffa0270268 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000010
 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed10233a4ef1
 R10: ffff888119d2778b R11: 0000000063666572 R12: 0000000000000000
 R13: fffffbfff404e2d4 R14: dffffc0000000000 R15: ffffffffa0271660
 FS:  00007fbc81981540(0000) GS:ffff888119d00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc14a142dc0 CR3: 0000000110a62003 CR4: 0000000000770ee0
 DR0: ffffffff8fb0bce8 DR1: ffffffff8fb0bce9 DR2: ffffffff8fb0bcea
 DR3: ffffffff8fb0bceb DR6: 00000000fffe0ff0 DR7: 0000000000000600
 PKRU: 55555554
 Call Trace:
  <TASK>
  ? die_addr+0x3d/0xa0
  ? exc_general_protection+0x144/0x220
  ? asm_exc_general_protection+0x22/0x30
  ? __kobject_del+0x62/0x1c0
  kobject_del+0x32/0x50
  parent_remove_sysfs_files+0xd6/0x170 [mdev]
  mdev_unregister_parent+0xfb/0x190 [mdev]
  ? mdev_register_parent+0x270/0x270 [mdev]
  ? find_module_all+0x9d/0xe0
  mdpy_dev_exit+0x17/0x63 [mdpy]
  __do_sys_delete_module.constprop.0+0x2fa/0x4b0
  ? module_flags+0x300/0x300
  ? __fput+0x4e7/0xa00
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
 RIP: 0033:0x7fbc813221b7
 Code: 73 01 c3 48 8b 0d d1 8c 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 8c 2c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe780e0648 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
 RAX: ffffffffffffffda RBX: 00007ffe780e06a8 RCX: 00007fbc813221b7
 RDX: 000000000000000a RSI: 0000000000000800 RDI: 000055e214df9b58
 RBP: 000055e214df9af0 R08: 00007ffe780df5c1 R09: 0000000000000000
 R10: 00007fbc8139ecc0 R11: 0000000000000206 R12: 00007ffe780e0870
 R13: 00007ffe780e0ed0 R14: 000055e214df9260 R15: 000055e214df9af0
  </TASK>
 Modules linked in: mdpy(-) mdev vfio_iommu_type1 vfio [last unloaded: mdpy]
 Dumping ftrace buffer:
    (ftrace buffer empty)
 ---[ end trace 0000000000000000 ]---
 RIP: 0010:__kobject_del+0x62/0x1c0
 Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 51 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 6b 28 48 8d 7d 10 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 24 01 00 00 48 8b 75 10 48 89 df 48 8d 6b 3c e8
 RSP: 0018:ffff88810695fd30 EFLAGS: 00010202
 RAX: dffffc0000000000 RBX: ffffffffa0270268 RCX: 0000000000000000
 RDX: 0000000000000002 RSI: 0000000000000004 RDI: 0000000000000010
 RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed10233a4ef1
 R10: ffff888119d2778b R11: 0000000063666572 R12: 0000000000000000
 R13: fffffbfff404e2d4 R14: dffffc0000000000 R15: ffffffffa0271660
 FS:  00007fbc81981540(0000) GS:ffff888119d00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc14a142dc0 CR3: 0000000110a62003 CR4: 0000000000770ee0
 DR0: ffffffff8fb0bce8 DR1: ffffffff8fb0bce9 DR2: ffffffff8fb0bcea
 DR3: ffffffff8fb0bceb DR6: 00000000fffe0ff0 DR7: 0000000000000600
 PKRU: 55555554
 Kernel panic - not syncing: Fatal exception
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Kernel Offset: disabled
 Rebooting in 1 seconds..

Fixes: da44c34 ("vfio/mdev: simplify mdev_type handling")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20230918115551.1423193-1-ruanjinjie@huawei.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Oct 6, 2023
The following call trace shows a deadlock caused by recursive locking of
the mutex "device_mutex". The first lock acquisition is in
target_for_each_device() and the second is in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364
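
The locking pattern in the trace can be modelled in userspace as a rough
sketch. Everything prefixed with toy_ is hypothetical and is not kernel code:
the toy lock reports failure at the point where a real non-recursive
mutex_lock() would block forever, and the callback stands in for the
config_item_put() path that ends up in target_free_device().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: it records ownership instead of blocking. */
struct toy_mutex {
	bool held;
};

/* Returns false where a real mutex_lock() would block forever. */
static bool toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void toy_unlock(struct toy_mutex *m)
{
	m->held = false;
}

static struct toy_mutex device_mutex;

/* Stand-in for target_free_device(): it needs device_mutex itself. */
static bool toy_free_device(void)
{
	if (!toy_lock(&device_mutex))
		return false;	/* the real code would hang here */
	toy_unlock(&device_mutex);
	return true;
}

/* Stand-in for target_for_each_device(): holds the lock across the callback. */
static bool toy_for_each_device(bool (*fn)(void))
{
	bool ok;

	if (!toy_lock(&device_mutex))
		return false;
	ok = fn();
	toy_unlock(&device_mutex);
	return ok;
}

/* Usage: the nested path fails, the standalone path succeeds. */
static void toy_demo(void)
{
	assert(!toy_for_each_device(toy_free_device));
	assert(toy_free_device());
}
```

The sketch only shows why the release path must not run while the iterator
still holds device_mutex; it does not describe how the patch resolves that.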

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Dec 19, 2023
When creating ceq_0 while probing irdma, cqp.sc_cqp will be sent as a
cqp_request to cqp->sc_cqp.sq_ring. If the request is still pending when
the irdma driver is removed or its aux device is unplugged, cqp.sc_cqp will
be dereferenced as the wrong struct in irdma_free_pending_cqp_request().

  PID: 3669   TASK: ffff88aef892c000  CPU: 28  COMMAND: "kworker/28:0"
   #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
   #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
   #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
   #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
   #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
      [exception RIP: native_queued_spin_lock_slowpath+1291]
      RIP: ffffffff8127e72b  RSP: ffff88aa841ef778  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: ffff88b01f849700  RCX: ffffffff8127e47e
      RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffffffff83857ec0
      RBP: ffff88afe3e4efc8   R8: ffffed15fc7c9dfa   R9: ffffed15fc7c9dfa
      R10: 0000000000000001  R11: ffffed15fc7c9df9  R12: 0000000000740000
      R13: ffff88b01f849708  R14: 0000000000000003  R15: ffffed1603f092e1
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  -- <NMI exception stack> --
   #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
   #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
   #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
   #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
   #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
   #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
   #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
   #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
   #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
   #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
   #15 [ffff88aa841efb88] device_del at ffffffff82179d23
   #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
   #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
   #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
   #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
   #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
   #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn
Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Mar 8, 2024
Commit a5a9230 (fbdev: fbcon: Properly revert changes when
vc_resize() failed) started restoring old font data upon failure (of
vc_resize()). But it does so only for user fonts. This means that the
"system"/internal fonts are not restored at all. As a result, the very
first call to fbcon_do_set_font() performs no restore at all when
vc_resize() fails.

This can be reproduced by Syzkaller to crash the system on the next
invocation of font_get(). It's rather hard to hit the allocation failure
in vc_resize() on the first font_set(), but not impossible, especially if
fault injection is used to force the failure. It was demonstrated by
Sirius:
  BUG: unable to handle page fault for address: fffffffffffffff8
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD cb7b067 P4D cb7b067 PUD cb7d067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 8007 Comm: poc Not tainted 6.7.0-g9d1694dc91ce #20
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  RIP: 0010:fbcon_get_font+0x229/0x800 drivers/video/fbdev/core/fbcon.c:2286
  Call Trace:
   <TASK>
   con_font_get drivers/tty/vt/vt.c:4558 [inline]
   con_font_op+0x1fc/0xf20 drivers/tty/vt/vt.c:4673
   vt_k_ioctl drivers/tty/vt/vt_ioctl.c:474 [inline]
   vt_ioctl+0x632/0x2ec0 drivers/tty/vt/vt_ioctl.c:752
   tty_ioctl+0x6f8/0x1570 drivers/tty/tty_io.c:2803
   vfs_ioctl fs/ioctl.c:51 [inline]
  ...

So restore the font data in any case, not only for user fonts. Note the
later 'if' is now protected by 'old_userfont' and not 'old_data' as the
latter is always set now. (And it is supposed to be non-NULL. Otherwise
we would see the bug above again.)

Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Fixes: a5a9230 ("fbdev: fbcon: Properly revert changes when vc_resize() failed")
Reported-and-tested-by: Ubisectech Sirius <bugreport@ubisectech.com>
Cc: Ubisectech Sirius <bugreport@ubisectech.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-fbdev@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20240208114411.14604-1-jirislaby@kernel.org
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request May 1, 2024
Hi Luiz, could you review this patch?

This patch prevents a div-by-zero error and potential int overflow by
adding a range check for MTU in hci_cc_le_read_buffer_size() and
hci_cc_le_read_buffer_size_v2().
Also, hci_connect_le() will refuse to allocate hcon if the MTU is not in
the valid range.

Bug description:

l2cap_le_flowctl_init() can cause both div-by-zero and an integer overflow.

l2cap_le_flowctl_init()
  chan->mps = min_t(u16, chan->imtu, chan->conn->mtu - L2CAP_HDR_SIZE);
  chan->rx_credits = (chan->imtu / chan->mps) + 1;  <- div-by-zero

Here, chan->conn->mtu could be less than or equal to L2CAP_HDR_SIZE (4).
If mtu is 4, it causes div-by-zero. If mtu is less than 4, it causes an
integer overflow.
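
The two cases above can be reproduced with a small userspace sketch. The
toy_ helpers are hypothetical, the imtu of 672 is only an illustrative value,
and treating conn->mtu as an unsigned int is an assumption; the u16
truncation mirrors the min_t(u16, ...) in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

#define L2CAP_HDR_SIZE 4

/* Mirrors min_t(u16, imtu, conn_mtu - L2CAP_HDR_SIZE): the difference is
 * truncated to 16 bits before the comparison. */
static uint16_t toy_mps(uint16_t imtu, unsigned int conn_mtu)
{
	uint16_t avail = (uint16_t)(conn_mtu - L2CAP_HDR_SIZE);

	return imtu < avail ? imtu : avail;
}

/* Returns -1 where the kernel expression would divide by zero. */
static int toy_rx_credits(uint16_t imtu, uint16_t mps)
{
	if (mps == 0)
		return -1;
	return imtu / mps + 1;
}

static void toy_mtu_demo(void)
{
	assert(toy_mps(672, 4) == 0);		/* mtu == header size */
	assert(toy_rx_credits(672, 0) == -1);	/* would be a divide error */
	assert(toy_mps(672, 3) == 672);		/* 3 - 4 wraps to 0xffff as u16 */
	assert(toy_mps(672, 27) == 23);		/* smallest spec-conformant mtu */
	assert(toy_rx_credits(672, 23) == 30);
}
```

With mtu below 4 the subtraction wraps, so mps silently becomes imtu, which
is larger than what the controller can actually carry.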

How mtu could have such a low value:

hci_cc_le_read_buffer_size()
  hdev->le_mtu = __le16_to_cpu(rp->le_mtu);

l2cap_conn_add()
  conn->mtu = hcon->hdev->le_mtu;

As shown, mtu is an input from an HCI device, so any HCI device can set
mtu to an arbitrary value, including one lower than 4.
According to the spec v5.4, 7.8.2 LE Read Buffer Size command, the value
should fall in [0x001b, 0xffff].
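
A range check of this kind could look like the sketch below. The macro and
function names are hypothetical and are not taken from the patch; only the
bounds come from the range quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the bounds quoted from the spec. */
#define TOY_LE_MTU_MIN 0x001b
#define TOY_LE_MTU_MAX 0xffff

/* True when a controller-reported LE MTU lies inside the valid range. */
static bool toy_le_mtu_valid(uint32_t mtu)
{
	return mtu >= TOY_LE_MTU_MIN && mtu <= TOY_LE_MTU_MAX;
}

static void toy_le_mtu_demo(void)
{
	assert(!toy_le_mtu_valid(0));
	assert(!toy_le_mtu_valid(4));
	assert(toy_le_mtu_valid(0x001b));
	assert(toy_le_mtu_valid(0xffff));
	assert(!toy_le_mtu_valid(0x10000));
}
```

Rejecting the value where it is read from the controller keeps every later
user of mtu, including l2cap_le_flowctl_init(), from seeing a value at or
below L2CAP_HDR_SIZE.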

Thank you,
Sungwoo.

divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 67 Comm: kworker/u5:0 Tainted: G        W          6.9.0-rc5+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: hci0 hci_rx_work
RIP: 0010:l2cap_le_flowctl_init+0x19e/0x3f0 net/bluetooth/l2cap_core.c:547
Code: e8 17 17 0c 00 66 41 89 9f 84 00 00 00 bf 01 00 00 00 41 b8 02 00 00 00 4c 89 fe 4c 89 e2 89 d9 e8 27 17 0c 00 44 89 f0 31 d2 <66> f7 f3 89 c3 ff c3 4d 8d b7 88 00 00 00 4c 89 f0 48 c1 e8 03 42
RSP: 0018:ffff88810bc0f858 EFLAGS: 00010246
RAX: 00000000000002a0 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffff88810bc0f7c0 RDI: ffffc90002dcb66f
RBP: ffff88810bc0f880 R08: aa69db2dda70ff01 R09: 0000ffaaaaaaaaaa
R10: 0084000000ffaaaa R11: 0000000000000000 R12: ffff88810d65a084
R13: dffffc0000000000 R14: 00000000000002a0 R15: ffff88810d65a000
FS:  0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000103268003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 l2cap_le_connect_req net/bluetooth/l2cap_core.c:4902 [inline]
 l2cap_le_sig_cmd net/bluetooth/l2cap_core.c:5420 [inline]
 l2cap_le_sig_channel net/bluetooth/l2cap_core.c:5486 [inline]
 l2cap_recv_frame+0xe59d/0x11710 net/bluetooth/l2cap_core.c:6809
 l2cap_recv_acldata+0x544/0x10a0 net/bluetooth/l2cap_core.c:7506
 hci_acldata_packet net/bluetooth/hci_core.c:3939 [inline]
 hci_rx_work+0x5e5/0xb20 net/bluetooth/hci_core.c:4176
 process_one_work kernel/workqueue.c:3254 [inline]
 process_scheduled_works+0x90f/0x1530 kernel/workqueue.c:3335
 worker_thread+0x926/0xe70 kernel/workqueue.c:3416
 kthread+0x2e3/0x380 kernel/kthread.c:388
 ret_from_fork+0x5c/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---

Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request May 14, 2024
l2cap_le_flowctl_init() can cause both div-by-zero and an integer
overflow since hdev->le_mtu may not fall in the valid range.

Move MTU from hci_dev to hci_conn to validate MTU and stop the connection
process earlier if MTU is invalid.
Also, add a missing validation in read_buffer_size() and make it return
an error value if the validation fails.
Now hci_conn_add() returns ERR_PTR() as it can fail due to either a
kzalloc failure or an invalid MTU value.

divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 67 Comm: kworker/u5:0 Tainted: G        W          6.9.0-rc5+ #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: hci0 hci_rx_work
RIP: 0010:l2cap_le_flowctl_init+0x19e/0x3f0 net/bluetooth/l2cap_core.c:547
Code: e8 17 17 0c 00 66 41 89 9f 84 00 00 00 bf 01 00 00 00 41 b8 02 00 00 00 4c
89 fe 4c 89 e2 89 d9 e8 27 17 0c 00 44 89 f0 31 d2 <66> f7 f3 89 c3 ff c3 4d 8d
b7 88 00 00 00 4c 89 f0 48 c1 e8 03 42
RSP: 0018:ffff88810bc0f858 EFLAGS: 00010246
RAX: 00000000000002a0 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffff88810bc0f7c0 RDI: ffffc90002dcb66f
RBP: ffff88810bc0f880 R08: aa69db2dda70ff01 R09: 0000ffaaaaaaaaaa
R10: 0084000000ffaaaa R11: 0000000000000000 R12: ffff88810d65a084
R13: dffffc0000000000 R14: 00000000000002a0 R15: ffff88810d65a000
FS:  0000000000000000(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000103268003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 l2cap_le_connect_req net/bluetooth/l2cap_core.c:4902 [inline]
 l2cap_le_sig_cmd net/bluetooth/l2cap_core.c:5420 [inline]
 l2cap_le_sig_channel net/bluetooth/l2cap_core.c:5486 [inline]
 l2cap_recv_frame+0xe59d/0x11710 net/bluetooth/l2cap_core.c:6809
 l2cap_recv_acldata+0x544/0x10a0 net/bluetooth/l2cap_core.c:7506
 hci_acldata_packet net/bluetooth/hci_core.c:3939 [inline]
 hci_rx_work+0x5e5/0xb20 net/bluetooth/hci_core.c:4176
 process_one_work kernel/workqueue.c:3254 [inline]
 process_scheduled_works+0x90f/0x1530 kernel/workqueue.c:3335
 worker_thread+0x926/0xe70 kernel/workqueue.c:3416
 kthread+0x2e3/0x380 kernel/kthread.c:388
 ret_from_fork+0x5c/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---

Fixes: 6ed58ec ("Bluetooth: Use LE buffers for LE traffic")
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request May 15, 2024
… non head_frag

The crashed kernel version is 5.16.20. I have not tested this patch
because I have not found a way to reproduce the crash; mainline may
have the same problem.

When using BPF-based NAT, the kernel hits a BUG_ON in skb_segment():
BUG_ON(skb_headlen(list_skb) > len). The BPF program calls
bpf_skb_adjust_room() to decrease the gso_size and then calls bpf_redirect()
to send the packet out.

call stack:
...
   [exception RIP: skb_segment+3016]
    RIP: ffffffffb97df2a8  RSP: ffffa3f2cce08728  RFLAGS: 00010293
    RAX: 000000000000007d  RBX: 00000000fffff7b3  RCX: 0000000000000011
    RDX: 0000000000000000  RSI: ffff895ea32c76c0  RDI: 00000000000008c1
    RBP: ffffa3f2cce087f8   R8: 000000000000088f   R9: 0000000000000011
    R10: 000000000000090c  R11: ffff895e47e68000  R12: ffff895eb2022f00
    R13: 000000000000004b  R14: ffff895ecdaf2000  R15: ffff895eb2023f00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
#10 [ffffa3f2cce08800] tcp_gso_segment at ffffffffb98d0320
#11 [ffffa3f2cce08860] tcp4_gso_segment at ffffffffb98d07a3
#12 [ffffa3f2cce08880] inet_gso_segment at ffffffffb98e6de0
#13 [ffffa3f2cce088e0] skb_mac_gso_segment at ffffffffb97f3741
#14 [ffffa3f2cce08918] skb_udp_tunnel_segment at ffffffffb98daa59
#15 [ffffa3f2cce08980] udp4_ufo_fragment at ffffffffb98db471
#16 [ffffa3f2cce089b0] inet_gso_segment at ffffffffb98e6de0
#17 [ffffa3f2cce08a10] skb_mac_gso_segment at ffffffffb97f3741
#18 [ffffa3f2cce08a48] __skb_gso_segment at ffffffffb97f388e
#19 [ffffa3f2cce08a78] validate_xmit_skb at ffffffffb97f3d6e
#20 [ffffa3f2cce08ab8] __dev_queue_xmit at ffffffffb97f4614
#21 [ffffa3f2cce08b50] dev_queue_xmit at ffffffffb97f5030
#22 [ffffa3f2cce08b60] __bpf_redirect at ffffffffb98199a8
#23 [ffffa3f2cce08b88] skb_do_redirect at ffffffffb98205cd
...

The skb has the following properties:
    doffset = 66
    list_skb = skb_shinfo(skb)->frag_list
    list_skb->head_frag = true
    skb->len = 2441 && skb->data_len = 2250
    skb_shinfo(skb)->nr_frags = 17
    skb_shinfo(skb)->gso_size = 75
    skb_shinfo(skb)->frags[0...16].bv_len = 125
    list_skb->len = 125
    list_skb->data_len = 0
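
The listed values can be cross-checked with a short userspace sketch. The
toy_ names are hypothetical, and the sketch assumes that len in the BUG_ON
has been capped to gso_size by the time the check runs; the numbers are taken
from the list above.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_skb {
	unsigned int len;
	unsigned int data_len;
};

/* Same arithmetic as skb_headlen(): bytes in the linear area. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* The condition inside BUG_ON(skb_headlen(list_skb) > len). */
static bool toy_bug_on_fires(const struct toy_skb *list_skb, unsigned int len)
{
	return toy_headlen(list_skb) > len;
}

static void toy_skb_demo(void)
{
	struct toy_skb head = { .len = 2441, .data_len = 2250 };
	struct toy_skb list = { .len = 125, .data_len = 0 };
	unsigned int gso_size = 75;	/* assumed value of len at the check */

	assert(toy_headlen(&head) == 191);
	/* 17 page frags of 125 bytes plus the frag_list skb */
	assert(17 * 125 + list.len == head.data_len);
	assert(toy_headlen(&list) == 125);
	assert(toy_bug_on_fires(&list, gso_size));
}
```

The frag_list skb is fully linear (125 bytes of head, no paged data), so its
head length exceeds the reduced gso_size of 75. The values 125 (0x7d) and 75
(0x4b) also appear in the register dump above.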

3962 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3963                             netdev_features_t features)
3964 {
3965         struct sk_buff *segs = NULL;
3966         struct sk_buff *tail = NULL;
...
4181                 while (pos < offset + len) {
4182                         if (i >= nfrags) {
4183                                 i = 0;
4184                                 nfrags = skb_shinfo(list_skb)->nr_frags;
4185                                 frag = skb_shinfo(list_skb)->frags;
4186                                 frag_skb = list_skb;

After the last frag of head_skb has been segmented, pos == offset + len, so
the while loop at line 4181 exits and the code hits this BUG_ON() instead of
segmenting the head_frag frag_list skb.

Since commit 13acc94 ("net: permit skb_segment on head_frag frag_list skb"),
segmenting a head_frag frag_list skb is allowed.

Commit 3dcbdb1 ("net: gso: Fix skb_segment splat when splitting gso_size
mangled skb having linear-headed frag_list") clears NETIF_F_SG if there is a
non head_frag skb. It does not clear NETIF_F_SG when there is only one
head_frag frag_list skb.

Signed-off-by: Fred Li <dracodingfly@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request May 16, 2024
… non head_frag

The crashed kernel version is 5.16.20. I have not tested this patch
because I have not found a way to reproduce the crash; mainline may
have the same problem.

When using BPF-based NAT, the kernel hits a BUG_ON in skb_segment():
BUG_ON(skb_headlen(list_skb) > len). The BPF program calls
bpf_skb_adjust_room() to decrease the gso_size and then calls bpf_redirect()
to send the packet out.

call stack:
...
   [exception RIP: skb_segment+3016]
    RIP: ffffffffb97df2a8  RSP: ffffa3f2cce08728  RFLAGS: 00010293
    RAX: 000000000000007d  RBX: 00000000fffff7b3  RCX: 0000000000000011
    RDX: 0000000000000000  RSI: ffff895ea32c76c0  RDI: 00000000000008c1
    RBP: ffffa3f2cce087f8   R8: 000000000000088f   R9: 0000000000000011
    R10: 000000000000090c  R11: ffff895e47e68000  R12: ffff895eb2022f00
    R13: 000000000000004b  R14: ffff895ecdaf2000  R15: ffff895eb2023f00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
#10 [ffffa3f2cce08800] tcp_gso_segment at ffffffffb98d0320
#11 [ffffa3f2cce08860] tcp4_gso_segment at ffffffffb98d07a3
#12 [ffffa3f2cce08880] inet_gso_segment at ffffffffb98e6de0
#13 [ffffa3f2cce088e0] skb_mac_gso_segment at ffffffffb97f3741
#14 [ffffa3f2cce08918] skb_udp_tunnel_segment at ffffffffb98daa59
#15 [ffffa3f2cce08980] udp4_ufo_fragment at ffffffffb98db471
#16 [ffffa3f2cce089b0] inet_gso_segment at ffffffffb98e6de0
#17 [ffffa3f2cce08a10] skb_mac_gso_segment at ffffffffb97f3741
#18 [ffffa3f2cce08a48] __skb_gso_segment at ffffffffb97f388e
#19 [ffffa3f2cce08a78] validate_xmit_skb at ffffffffb97f3d6e
#20 [ffffa3f2cce08ab8] __dev_queue_xmit at ffffffffb97f4614
#21 [ffffa3f2cce08b50] dev_queue_xmit at ffffffffb97f5030
#22 [ffffa3f2cce08b60] __bpf_redirect at ffffffffb98199a8
#23 [ffffa3f2cce08b88] skb_do_redirect at ffffffffb98205cd
...

The skb has the following properties:
    doffset = 66
    list_skb = skb_shinfo(skb)->frag_list
    list_skb->head_frag = true
    skb->len = 2441 && skb->data_len = 2250
    skb_shinfo(skb)->nr_frags = 17
    skb_shinfo(skb)->gso_size = 75
    skb_shinfo(skb)->frags[0...16].bv_len = 125
    list_skb->len = 125
    list_skb->data_len = 0

3962 struct sk_buff *skb_segment(struct sk_buff *head_skb,
3963                             netdev_features_t features)
3964 {
3965         struct sk_buff *segs = NULL;
3966         struct sk_buff *tail = NULL;
...
4181                 while (pos < offset + len) {
4182                         if (i >= nfrags) {
4183                                 i = 0;
4184                                 nfrags = skb_shinfo(list_skb)->nr_frags;
4185                                 frag = skb_shinfo(list_skb)->frags;
4186                                 frag_skb = list_skb;

After the last frag of head_skb has been segmented, pos == offset + len, so the
while loop at line 4181 exits and the code runs into this BUG_ON() without
segmenting the head_frag frag_list skb.

Since commit 13acc94 ("net: permit skb_segment on head_frag frag_list skb"),
segmenting a head_frag frag_list skb is allowed.

Commit 3dcbdb1 ("net: gso: Fix skb_segment splat when splitting gso_size
mangled skb having linear-headed frag_list") clears NETIF_F_SG if the frag_list
contains a non-head_frag skb. NETIF_F_SG is not cleared when the frag_list
holds only a single head_frag skb.

Signed-off-by: Fred Li <dracodingfly@gmail.com>
Signed-off-by: NipaLocal <nipa@local>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request May 28, 2024
ui_browser__show() captures the title it is given, which is stack-allocated
memory in hist_browser__run().

Avoid a use-after-return by strdup()-ing the string.

Committer notes:

Further explanation from Ian Rogers:

My command line using tui is:
$ sudo bash -c 'rm /tmp/asan.log*; export
ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a
sleep 1; /tmp/perf/perf mem report'
I then go to the perf annotate view and quit. This triggers the asan
error (from the log file):
```
==1254591==ERROR: AddressSanitizer: stack-use-after-return on address
0x7f2813331920 at pc 0x7f2818065991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10
READ of size 80 at 0x7f2813331920 thread T0
    #0 0x7f2818065990 in __interceptor_strlen
../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461
    #1 0x7f2817698251 in SLsmg_write_wrapped_string
(/lib/x86_64-linux-gnu/libslang.so.2+0x98251)
    #2 0x7f28176984b9 in SLsmg_write_nstring
(/lib/x86_64-linux-gnu/libslang.so.2+0x984b9)
    #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60
    #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266
    #5 0x55c94045c776 in ui_browser__show ui/browser.c:288
    #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206
    #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458
    #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412
    #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527
    #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613
    #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661
    #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671
    #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141
    #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805
    #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374
    #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516
    #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350
    #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403
    #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447
    #20 0x55c9400e53ad in main tools/perf/perf.c:561
    #21 0x7f28170456c9 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
    #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360
    #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId:
84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93)

Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame
    #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746

  This frame has 1 object(s):
    [32, 192) 'title' (line 747) <== Memory access at offset 32 is
inside this variable
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism, swapcontext or vfork
```
hist_browser__run isn't on the stack so the asan error looks legit.
There's no clean init/exit on struct ui_browser so I may be trading a
use-after-return for a memory leak, but that seems like a good trade
anyway.

Fixes: 05e8b08 ("perf ui browser: Stop using 'self'")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jun 25, 2024
Running the following BPF selftest on LoongArch:

./test_progs -t sockmap_listen

triggers a kernel panic:

'''
 Oops[#1]:
 CPU: 49 PID: 233429 Comm: new_name Tainted: G           OE      6.10.0-rc2+ #20
 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018-V4.0.11
 pc 0000000000000000 ra 90000000051ea4a0 tp 900030008549c000 sp 900030008549fe00
 a0 9000300152524a00 a1 0000000000000000 a2 900030008549fe38 a3 900030008549fe30
 a4 900030008549fe30 a5 90003000c58c8d80 a6 0000000000000000 a7 0000000000000039
 t0 0000000000000000 t1 90003000c58c8d80 t2 0000000000000001 t3 0000000000000000
 t4 0000000000000001 t5 900000011a1bf580 t6 900000011a3aff60 t7 000000000000006b
 t8 00000fffffffffff u0 0000000000000000 s9 00007fffbbe9e930 s0 9000300152524a00
 s1 90003000c58c8d00 s2 9000000006c81568 s3 0000000000000000 s4 90003000c58c8d80
 s5 00007ffff236a000 s6 00007ffffbc292b0 s7 00007ffffbc29998 s8 00007fffbbe9f180
    ra: 90000000051ea4a0 inet_release+0x60/0xc0
   ERA: 0000000000000000 0x0
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 0000000c (PPLV0 +PIE +PWE)
  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 00030000 [PIF] (IS= ECode=3 EsubCode=0)
  BADV: 0000000000000000
  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
 Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp
 Process new_name (pid: 233429, threadinfo=00000000b9196405, task=00000000c01df45b)
 Stack : 0000000000000000 90003000c58c8e20 90003000c58c8d00 900000000505960c
         0000000000000000 9000000101c6ad20 9000300086524540 00000000082e0003
         900030008bf57400 90000000050596bc 900030008bf57400 900000000434acac
         0000000000000016 00007ffff224e060 00007fffbbe9f180 900030008bf57400
         0000000000000000 9000000004341ce0 00007fffbbe9f180 00007ffff2369000
         900030008549fec0 90000000054476ec 000000000000006b 9000000003f71da4
         000000000000003a 00007ffff22b8a44 00007fffbbe9f8e0 00007fffbbe9e680
         ffffffffffffffda 0000000000000000 0000000000000000 0000000000000000
         00007fffbbe9f288 0000000000000000 0000000000000000 0000000000000039
         84c2431493ceab6e 84c23ceb2827425e 0000000000000007 00007ffff2271600
         ...
 Call Trace:
 [<900000000505960c>] __sock_release+0x4c/0xe0
 [<90000000050596bc>] sock_close+0x1c/0x40
 [<900000000434acac>] __fput+0xec/0x2e0
 [<9000000004341ce0>] sys_close+0x40/0xa0
 [<90000000054476ec>] do_syscall+0x8c/0xc0
 [<9000000003f71da4>] handle_syscall+0xc4/0x160

 Code: (Bad address in era)

 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Fatal exception
 Kernel relocated by 0x3d50000
  .text @ 0x9000000003f50000
  .data @ 0x90000000055b0000
  .bss  @ 0x9000000006ca9400
 ---[ end Kernel panic - not syncing: Fatal exception ]---
'''

This is because the "sk->sk_prot->close" pointer is NULL in that case. This
patch adds a NULL check for it in inet_release() to fix the error.

Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jun 26, 2024
Running the following BPF selftest on LoongArch:

./test_progs -t sockmap_listen

triggers a kernel panic:

'''
 Oops[#1]:
 CPU: 49 PID: 233429 Comm: new_name Tainted: G           OE      6.10.0-rc2+ #20
 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018-V4.0.11
 pc 0000000000000000 ra 90000000051ea4a0 tp 900030008549c000 sp 900030008549fe00
 a0 9000300152524a00 a1 0000000000000000 a2 900030008549fe38 a3 900030008549fe30
 a4 900030008549fe30 a5 90003000c58c8d80 a6 0000000000000000 a7 0000000000000039
 t0 0000000000000000 t1 90003000c58c8d80 t2 0000000000000001 t3 0000000000000000
 t4 0000000000000001 t5 900000011a1bf580 t6 900000011a3aff60 t7 000000000000006b
 t8 00000fffffffffff u0 0000000000000000 s9 00007fffbbe9e930 s0 9000300152524a00
 s1 90003000c58c8d00 s2 9000000006c81568 s3 0000000000000000 s4 90003000c58c8d80
 s5 00007ffff236a000 s6 00007ffffbc292b0 s7 00007ffffbc29998 s8 00007fffbbe9f180
    ra: 90000000051ea4a0 inet_release+0x60/0xc0
   ERA: 0000000000000000 0x0
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 0000000c (PPLV0 +PIE +PWE)
  EUEN: 00000000 (-FPE -SXE -ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 00030000 [PIF] (IS= ECode=3 EsubCode=0)
  BADV: 0000000000000000
  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
 Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp
 Process new_name (pid: 233429, threadinfo=00000000b9196405, task=00000000c01df45b)
 Stack : 0000000000000000 90003000c58c8e20 90003000c58c8d00 900000000505960c
         0000000000000000 9000000101c6ad20 9000300086524540 00000000082e0003
         900030008bf57400 90000000050596bc 900030008bf57400 900000000434acac
         0000000000000016 00007ffff224e060 00007fffbbe9f180 900030008bf57400
         0000000000000000 9000000004341ce0 00007fffbbe9f180 00007ffff2369000
         900030008549fec0 90000000054476ec 000000000000006b 9000000003f71da4
         000000000000003a 00007ffff22b8a44 00007fffbbe9f8e0 00007fffbbe9e680
         ffffffffffffffda 0000000000000000 0000000000000000 0000000000000000
         00007fffbbe9f288 0000000000000000 0000000000000000 0000000000000039
         84c2431493ceab6e 84c23ceb2827425e 0000000000000007 00007ffff2271600
         ...
 Call Trace:
 [<900000000505960c>] __sock_release+0x4c/0xe0
 [<90000000050596bc>] sock_close+0x1c/0x40
 [<900000000434acac>] __fput+0xec/0x2e0
 [<9000000004341ce0>] sys_close+0x40/0xa0
 [<90000000054476ec>] do_syscall+0x8c/0xc0
 [<9000000003f71da4>] handle_syscall+0xc4/0x160

 Code: (Bad address in era)

 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Fatal exception
 Kernel relocated by 0x3d50000
  .text @ 0x9000000003f50000
  .data @ 0x90000000055b0000
  .bss  @ 0x9000000006ca9400
 ---[ end Kernel panic - not syncing: Fatal exception ]---
'''

This is because the "sk->sk_prot->close" pointer is NULL in that case. This
patch adds a NULL check for it in inet_release() to fix the error.
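
The idea of guarding an optional callback before calling it can be sketched in user space. This is only an illustration, not the kernel change itself: `struct proto_model`, `struct sock_model`, `model_close()` and `release_model()` are hypothetical stand-ins for the kernel's structures and for inet_release().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's protocol and socket structures. */
struct sock_model;

struct proto_model {
    /* Optional callback; it may legitimately be NULL. */
    void (*close)(struct sock_model *sk, long timeout);
};

struct sock_model {
    const struct proto_model *prot;
    int closed;   /* set by the close callback, for illustration only */
};

/* Example callback that records that it ran. */
static void model_close(struct sock_model *sk, long timeout)
{
    (void)timeout;
    sk->closed = 1;
}

/*
 * Model of the guarded call: returns 1 if the callback ran and
 * 0 if it was skipped because the pointer was NULL.
 */
static int release_model(struct sock_model *sk, long timeout)
{
    assert(sk != NULL && sk->prot != NULL);
    if (!sk->prot->close)
        return 0;   /* calling through NULL would branch to address 0 */
    sk->prot->close(sk, timeout);
    return 1;
}
```

Calling `release_model()` on a socket whose `close` pointer is NULL simply skips the call, instead of branching to address 0 as the `pc 0000000000000000` line in the log shows.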

Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jun 26, 2024
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jun 27, 2024
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 27, 2024
A NULL dereference occurred in TRACE_EVENT(qdisc_reset) from

 qdisc->dev_queue->dev <NULL> ->name

This situation was simulated using a bunch of veths and Bluetooth
disconnection and reconnection.

During qdisc initialization, the qdisc was set to noop_queue.
In veth_init_queues, the initial tx_num was reduced back to one,
causing the qdisc reset to be called with noop, which led to the kernel
panic.
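
A minimal user-space sketch of the failure mode and of one way to guard against it. The types `dev_model`, `queue_model` and `qdisc_model`, the helper `qdisc_dev_name_model()`, and the "(null)" placeholder string are all assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the device, queue and qdisc structures. */
struct dev_model   { char name[16]; };
struct queue_model { struct dev_model *dev; };
struct qdisc_model { struct queue_model *dev_queue; };

/*
 * Return a printable device name, substituting a placeholder when the
 * queue has no device attached (the noop case described above).
 */
static const char *qdisc_dev_name_model(const struct qdisc_model *q)
{
    assert(q != NULL && q->dev_queue != NULL);
    if (!q->dev_queue->dev)
        return "(null)";   /* placeholder chosen for this sketch */
    return q->dev_queue->dev->name;
}
```

Without the check, reading `->name` through a NULL `dev` hands a small bogus address to the string-length routine, which matches the strnlen() fault in the traces below.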

I've attached a GitHub gist link with the syz-execprogram source code
converted to C and 3 logs of the reproduced vmcore-dmesg.

 https://gist.github.com/yskelg/cc64562873ce249cdd0d5a358b77d740

Yeoreum and I used two fuzzing tools simultaneously.

One process with syz-executor : https://github.com/google/syzkaller

 $ ./syz-execprog -executor=./syz-executor -repeat=1 -sandbox=setuid \
    -enable=none -collide=false log1

The other process with perf fuzzer:
 https://github.com/deater/perf_event_tests/tree/master/fuzzer

 $ perf_event_tests/fuzzer/perf_fuzzer

I think this can happen on the following kernel versions:

 Linux kernel v6.7.10+, v6.8+, v6.9+, and it could happen in v6.10.

This has occurred since 51270d5. I think this patch is absolutely
necessary. Previously, the tracepoint showed an unintended string value
for the name.

I've reproduced this 3 times on my Fedora 40 debug kernel, without any other
modules or patches.

 version: 6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug

[ 5287.164555] veth0_vlan: left promiscuous mode
[ 5287.164929] veth1_macvtap: left promiscuous mode
[ 5287.164950] veth0_macvtap: left promiscuous mode
[ 5287.164983] veth1_vlan: left promiscuous mode
[ 5287.165008] veth0_vlan: left promiscuous mode
[ 5287.165450] veth1_macvtap: left promiscuous mode
[ 5287.165472] veth0_macvtap: left promiscuous mode
[ 5287.165502] veth1_vlan: left promiscuous mode
…
[ 5297.598240] bridge0: port 2(bridge_slave_1) entered blocking state
[ 5297.598262] bridge0: port 2(bridge_slave_1) entered forwarding state
[ 5297.598296] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5297.598313] bridge0: port 1(bridge_slave_0) entered forwarding state
[ 5297.616090] 8021q: adding VLAN 0 to HW filter on device bond0
[ 5297.620405] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5297.620730] bridge0: port 2(bridge_slave_1) entered disabled state
[ 5297.627247] 8021q: adding VLAN 0 to HW filter on device team0
[ 5297.629636] bridge0: port 1(bridge_slave_0) entered blocking state
…
[ 5298.002798] bridge_slave_0: left promiscuous mode
[ 5298.002869] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5298.309444] bond0 (unregistering): (slave bond_slave_0): Releasing backup interface
[ 5298.315206] bond0 (unregistering): (slave bond_slave_1): Releasing backup interface
[ 5298.320207] bond0 (unregistering): Released all slaves
[ 5298.354296] hsr_slave_0: left promiscuous mode
[ 5298.360750] hsr_slave_1: left promiscuous mode
[ 5298.374889] veth1_macvtap: left promiscuous mode
[ 5298.374931] veth0_macvtap: left promiscuous mode
[ 5298.374988] veth1_vlan: left promiscuous mode
[ 5298.375024] veth0_vlan: left promiscuous mode
[ 5299.109741] team0 (unregistering): Port device team_slave_1 removed
[ 5299.185870] team0 (unregistering): Port device team_slave_0 removed
…
[ 5300.155443] Bluetooth: hci3: unexpected cc 0x0c03 length: 249 > 1
[ 5300.155724] Bluetooth: hci3: unexpected cc 0x1003 length: 249 > 9
[ 5300.155988] Bluetooth: hci3: unexpected cc 0x1001 length: 249 > 9
….
[ 5301.075531] team0: Port device team_slave_1 added
[ 5301.085515] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5301.085531] bridge0: port 1(bridge_slave_0) entered disabled state
[ 5301.085588] bridge_slave_0: entered allmulticast mode
[ 5301.085800] bridge_slave_0: entered promiscuous mode
[ 5301.095617] bridge0: port 1(bridge_slave_0) entered blocking state
[ 5301.095633] bridge0: port 1(bridge_slave_0) entered disabled state
…
[ 5301.149734] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.173234] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.180517] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 5301.193481] hsr_slave_0: entered promiscuous mode
[ 5301.204425] hsr_slave_1: entered promiscuous mode
[ 5301.210172] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.210185] Cannot create hsr debugfs directory
[ 5301.224061] bond0: (slave bond_slave_1): Enslaving as an active interface with an up link
[ 5301.246901] bond0: (slave bond_slave_0): Enslaving as an active interface with an up link
[ 5301.255934] team0: Port device team_slave_0 added
[ 5301.256480] team0: Port device team_slave_1 added
[ 5301.256948] team0: Port device team_slave_0 added
…
[ 5301.435928] hsr_slave_0: entered promiscuous mode
[ 5301.446029] hsr_slave_1: entered promiscuous mode
[ 5301.455872] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.455884] Cannot create hsr debugfs directory
[ 5301.502664] hsr_slave_0: entered promiscuous mode
[ 5301.513675] hsr_slave_1: entered promiscuous mode
[ 5301.526155] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.526164] Cannot create hsr debugfs directory
[ 5301.563662] hsr_slave_0: entered promiscuous mode
[ 5301.576129] hsr_slave_1: entered promiscuous mode
[ 5301.580259] debugfs: Directory 'hsr0' with parent 'hsr' already present!
[ 5301.580270] Cannot create hsr debugfs directory
[ 5301.590269] 8021q: adding VLAN 0 to HW filter on device bond0

[ 5301.595872] KASAN: null-ptr-deref in range [0x0000000000000130-0x0000000000000137]
[ 5301.595877] Mem abort info:
[ 5301.595881]   ESR = 0x0000000096000006
[ 5301.595885]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 5301.595889]   SET = 0, FnV = 0
[ 5301.595893]   EA = 0, S1PTW = 0
[ 5301.595896]   FSC = 0x06: level 2 translation fault
[ 5301.595900] Data abort info:
[ 5301.595903]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ 5301.595907]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 5301.595911]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 5301.595915] [dfff800000000026] address between user and kernel address ranges
[ 5301.595971] Internal error: Oops: 0000000096000006 [#1] SMP
…
[ 5301.596076] CPU: 2 PID: 102769 Comm: syz-executor.3 Kdump: loaded Tainted: G        W         -------  ---  6.10.0-0.rc2.20240608gitdc772f8237f9.29.fc41.aarch64+debug #1
[ 5301.596080] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.21805430.BA64.2305221830 05/22/2023
[ 5301.596082] pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 5301.596085] pc : strnlen+0x40/0x88
[ 5301.596114] lr : trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
[ 5301.596124] sp : ffff8000beef6b40
[ 5301.596126] x29: ffff8000beef6b40 x28: dfff800000000000 x27: 0000000000000001
[ 5301.596131] x26: 6de1800082c62bd0 x25: 1ffff000110aa9e0 x24: ffff800088554f00
[ 5301.596136] x23: ffff800088554ec0 x22: 0000000000000130 x21: 0000000000000140
[ 5301.596140] x20: dfff800000000000 x19: ffff8000beef6c60 x18: ffff7000115106d8
[ 5301.596143] x17: ffff800121bad000 x16: ffff800080020000 x15: 0000000000000006
[ 5301.596147] x14: 0000000000000002 x13: ffff0001f3ed8d14 x12: ffff700017ddeda5
[ 5301.596151] x11: 1ffff00017ddeda4 x10: ffff700017ddeda4 x9 : ffff800082cc5eec
[ 5301.596155] x8 : 0000000000000004 x7 : 00000000f1f1f1f1 x6 : 00000000f2f2f200
[ 5301.596158] x5 : 00000000f3f3f3f3 x4 : ffff700017dded80 x3 : 00000000f204f1f1
[ 5301.596162] x2 : 0000000000000026 x1 : 0000000000000000 x0 : 0000000000000130
[ 5301.596166] Call trace:
[ 5301.596175]  strnlen+0x40/0x88
[ 5301.596179]  trace_event_get_offsets_qdisc_reset+0x6c/0x2b0
[ 5301.596182]  perf_trace_qdisc_reset+0xb0/0x538
[ 5301.596184]  __traceiter_qdisc_reset+0x68/0xc0
[ 5301.596188]  qdisc_reset+0x43c/0x5e8
[ 5301.596190]  netif_set_real_num_tx_queues+0x288/0x770
[ 5301.596194]  veth_init_queues+0xfc/0x130 [veth]
[ 5301.596198]  veth_newlink+0x45c/0x850 [veth]
[ 5301.596202]  rtnl_newlink_create+0x2c8/0x798
[ 5301.596205]  __rtnl_newlink+0x92c/0xb60
[ 5301.596208]  rtnl_newlink+0xd8/0x130
[ 5301.596211]  rtnetlink_rcv_msg+0x2e0/0x890
[ 5301.596214]  netlink_rcv_skb+0x1c4/0x380
[ 5301.596225]  rtnetlink_rcv+0x20/0x38
[ 5301.596227]  netlink_unicast+0x3c8/0x640
[ 5301.596231]  netlink_sendmsg+0x658/0xa60
[ 5301.596234]  __sock_sendmsg+0xd0/0x180
[ 5301.596243]  __sys_sendto+0x1c0/0x280
[ 5301.596246]  __arm64_sys_sendto+0xc8/0x150
[ 5301.596249]  invoke_syscall+0xdc/0x268
[ 5301.596256]  el0_svc_common.constprop.0+0x16c/0x240
[ 5301.596259]  do_el0_svc+0x48/0x68
[ 5301.596261]  el0_svc+0x50/0x188
[ 5301.596265]  el0t_64_sync_handler+0x120/0x130
[ 5301.596268]  el0t_64_sync+0x194/0x198
[ 5301.596272] Code: eb15001f 54000120 d343fc02 12000801 (38f46842)
[ 5301.596285] SMP: stopping secondary CPUs
[ 5301.597053] Starting crashdump kernel...
[ 5301.597057] Bye!

After applying our patch, I didn't find any kernel panic errors.

We've found a simple reproducer:

 # echo 1 > /sys/kernel/debug/tracing/events/qdisc/qdisc_reset/enable

 # ip link add veth0 type veth peer name veth1

 Error: Unknown device type.

Without our patch applied, I tested the upstream 6.10.0-rc3 kernel
using the qdisc_reset event and the ip command on my QEMU virtual machine.

These 2 commands always cause a kernel panic.

Linux version: 6.10.0-rc3

[    0.000000] Linux version 6.10.0-rc3-00164-g44ef20baed8e-dirty (paran@fedora) (gcc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4), GNU ld version 2.41-34.fc40) #20 SMP PREEMPT Sat Jun 15 16:51:25 KST 2024

Kernel panic message:

[  615.236484] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[  615.237250] Dumping ftrace buffer:
[  615.237679]    (ftrace buffer empty)
[  615.238097] Modules linked in: veth crct10dif_ce virtio_gpu
virtio_dma_buf drm_shmem_helper drm_kms_helper zynqmp_fpga xilinx_can
xilinx_spi xilinx_selectmap xilinx_core xilinx_pr_decoupler versal_fpga
uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev
videobuf2_common mc usbnet deflate zstd ubifs ubi rcar_canfd rcar_can
omap_mailbox ntb_msi_test ntb_hw_epf lattice_sysconfig_spi
lattice_sysconfig ice40_spi gpio_xilinx dwmac_altr_socfpga mdio_regmap
stmmac_platform stmmac pcs_xpcs dfl_fme_region dfl_fme_mgr dfl_fme_br
dfl_afu dfl fpga_region fpga_bridge can can_dev br_netfilter bridge stp
llc atl1c ath11k_pci mhi ath11k_ahb ath11k qmi_helpers ath10k_sdio
ath10k_pci ath10k_core ath mac80211 libarc4 cfg80211 drm fuse backlight ipv6
Jun 22 02:36:53 … kernel: Unable to handle kernel paging request at virtual address dfff8…
[  615.…] CPU: 1 PID: 468 Comm: ip Not tainted 6.10.0-rc3-00164-g44ef20baed8e-dirty #20
[  615.252376] Hardware name: linux,dummy-virt (DT)
[  615.253220] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  615.254433] pc : strnlen+0x6c/0xe0
[  615.255096] lr : trace_event_get_offsets_qdisc_reset+0x94/0x3d0
[  615.256088] sp : ffff800080b269a0
[  615.256615] x29: ffff800080b269a0 x28: ffffc070f3f98500 x27: 0000000000000001
[  615.257831] x26: 0000000000000010 x25: ffffc070f3f98540 x24: ffffc070f619cf60
[  615.259020] x23: 0000000000000128 x22: 0000000000000138 x21: dfff800000000000
[  615.260241] x20: ffffc070f631ad00 x19: 0000000000000128 x18: ffffc070f448b800
[  615.261454] x17: 0000000000000000 x16: 0000000000000001 x15: ffffc070f4ba2a90
[  615.262635] x14: ffff700010164d73 x13: 1ffff80e1e8d5eb3 x12: 1ffff00010164d72
[  615.263877] x11: ffff700010164d72 x10: dfff800000000000 x9 : ffffc070e85d6184
[  615.265047] x8 : ffffc070e4402070 x7 : 000000000000f1f1 x6 : 000000001504a6d3
[  615.266336] x5 : ffff28ca21122140 x4 : ffffc070f5043ea8 x3 : 0000000000000000
[  615.267528] x2 : 0000000000000025 x1 : 0000000000000000 x0 : 0000000000000000
[  615.268747] Call trace:
[  615.269180]  strnlen+0x6c/0xe0
[  615.269767]  trace_event_get_offsets_qdisc_reset+0x94/0x3d0
[  615.270716]  trace_event_raw_event_qdisc_reset+0xe8/0x4e8
[  615.271667]  __traceiter_qdisc_reset+0xa0/0x140
[  615.272499]  qdisc_reset+0x554/0x848
[  615.273134]  netif_set_real_num_tx_queues+0x360/0x9a8
[  615.274050]  veth_init_queues+0x110/0x220 [veth]
[  615.275110]  veth_newlink+0x538/0xa50 [veth]
[  615.276172]  __rtnl_newlink+0x11e4/0x1bc8
[  615.276944]  rtnl_newlink+0xac/0x120
[  615.277657]  rtnetlink_rcv_msg+0x4e4/0x1370
[  615.278409]  netlink_rcv_skb+0x25c/0x4f0
[  615.279122]  rtnetlink_rcv+0x48/0x70
[  615.279769]  netlink_unicast+0x5a8/0x7b8
[  615.280462]  netlink_sendmsg+0xa70/0x1190

Yeoreum and I don't know if the patch we wrote fixes the underlying
cause, but we think the priority is to prevent the kernel panic from
happening. So, we're sending this patch.

Fixes: 51270d5 ("tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string")
Link: https://lore.kernel.org/lkml/20240229143432.273b4871@gandalf.local.home/t/
Cc: netdev@vger.kernel.org
Tested-by: Yunseong Kim <yskelg@gmail.com>
Signed-off-by: Yunseong Kim <yskelg@gmail.com>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Link: https://lore.kernel.org/r/20240624173320.24945-4-yskelg@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 27, 2024
The code in ocfs2_dio_end_io_write() estimates the number of necessary
transaction credits using ocfs2_calc_extend_credits().  This however does
not take into account that the IO could be arbitrarily large and can
contain an arbitrary number of extents.

Extent tree manipulations often extend the current transaction, but not
in all cases.  For example, if we have only single block extents in
the tree, ocfs2_mark_extent_written() will end up calling
ocfs2_replace_extent_rec() all the time and we will never extend the
current transaction and eventually exhaust all the transaction credits if
the IO contains many single block extents.  Once that happens, a
WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
this error.  This was actually triggered by one of our customers on a
heavily fragmented OCFS2 filesystem.

To fix the issue, make sure the transaction always has enough credits for
one extent insert before each call of ocfs2_mark_extent_written().
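
The difference between a one-off estimate and a per-extent check can be shown with a small user-space model. Everything here is an assumption for illustration: `struct handle_model`, `assure_credits_model()` and `mark_extents_model()` are hypothetical names, and the credit counts are arbitrary numbers rather than values taken from OCFS2 or jbd2.

```c
#include <assert.h>

/* Hypothetical model of a journal handle: it only tracks credits. */
struct handle_model {
    int credits;      /* credits left in the running transaction */
    int extensions;   /* how many times the handle was topped up */
};

/* Top the handle up so that at least `need` credits are available. */
static void assure_credits_model(struct handle_model *h, int need)
{
    assert(need > 0);
    if (h->credits < need) {
        h->credits = need;
        h->extensions++;
    }
}

/*
 * Process `nr_extents` extents, each consuming `per_extent` credits.
 * With `assure` set, credits are checked before every extent; without
 * it, the handle relies on its initial estimate alone.
 * Returns the number of extents processed before credits ran out.
 */
static int mark_extents_model(struct handle_model *h, int nr_extents,
                              int per_extent, int assure)
{
    int done = 0;

    for (int i = 0; i < nr_extents; i++) {
        if (assure)
            assure_credits_model(h, per_extent);
        if (h->credits < per_extent)
            break;   /* corresponds to running out of credits mid-IO */
        h->credits -= per_extent;
        done++;
    }
    return done;
}
```

In this model a handle that starts with a fixed estimate stops after a few extents, while the variant that tops up before every extent gets through an IO of any length.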

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx  TASK: xxxx  CPU: 5  COMMAND: "SubmitThread-CA"
  #0 machine_kexec at ffffffff8c069932
  #1 __crash_kexec at ffffffff8c1338fa
  #2 panic at ffffffff8c1d69b9
  #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
  #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
  #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
  #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
  #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
  #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
  #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
 #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
 #11 dio_complete at ffffffff8c2b9fa7
 #12 do_blockdev_direct_IO at ffffffff8c2bc09f
 #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
 #14 generic_file_direct_write at ffffffff8c1dcf14
 #15 __generic_file_write_iter at ffffffff8c1dd07b
 #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
 #17 aio_write at ffffffff8c2cc72e
 #18 kmem_cache_alloc at ffffffff8c248dde
 #19 do_io_submit at ffffffff8c2ccada
 #20 do_syscall_64 at ffffffff8c004984
 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 17, 2024
There is a race condition in the CMT interrupt handler. In the interrupt
handler the driver sets a driver private flag, FLAG_IRQCONTEXT. This
flag is used to indicate any call to set_next_event() should not be
directly propagated to the device, but instead cached. This is done as
the interrupt handler itself reprograms the device when needed before it
completes, which avoids performing this operation twice.

It is unclear why this design was chosen. My suspicion is that it allows
the struct clock_event_device.event_handler callback, which is called
while FLAG_IRQCONTEXT is set, to update the next event without having to
write to the device twice.

Unfortunately there is a window between when the FLAG_IRQCONTEXT flag is
set and later cleared in which the interrupt handler has already started
to write the next event to the device. If set_next_event() is called in
this window, the value is only cached in the driver but not written. This
causes the board to misbehave or, worse, lock up and produce a splat.

   rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
   rcu:     0-...!: (0 ticks this GP) idle=f5e0/0/0x0 softirq=519/519 fqs=0 (false positive?)
   rcu:     (detected by 1, t=6502 jiffies, g=-595, q=77 ncpus=2)
   Sending NMI from CPU 1 to CPUs 0:
   NMI backtrace for cpu 0
   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.10.0-rc5-arm64-renesas-00019-g74a6f86eaf1c-dirty #20
   Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
   pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : tick_check_broadcast_expired+0xc/0x40
   lr : cpu_idle_poll.isra.0+0x8c/0x168
   sp : ffff800081c63d70
   x29: ffff800081c63d70 x28: 00000000580000c8 x27: 00000000bfee5610
   x26: 0000000000000027 x25: 0000000000000000 x24: 0000000000000000
   x23: ffff00007fbb9100 x22: ffff8000818f1008 x21: ffff8000800ef07c
   x20: ffff800081c79ec0 x19: ffff800081c70c28 x18: 0000000000000000
   x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffc2c717d8
   x14: 0000000000000000 x13: ffff000009c18080 x12: ffff8000825f7fc0
   x11: 0000000000000000 x10: ffff8000818f3cd4 x9 : 0000000000000028
   x8 : ffff800081c79ec0 x7 : ffff800081c73000 x6 : 0000000000000000
   x5 : 0000000000000000 x4 : ffff7ffffe286000 x3 : 0000000000000000
   x2 : ffff7ffffe286000 x1 : ffff800082972900 x0 : ffff8000818f1008
   Call trace:
    tick_check_broadcast_expired+0xc/0x40
    do_idle+0x9c/0x280
    cpu_startup_entry+0x34/0x40
    kernel_init+0x0/0x11c
    do_one_initcall+0x0/0x260
    __primary_switched+0x80/0x88
   rcu: rcu_preempt kthread timer wakeup didn't happen for 6501 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
   rcu:     Possible timer handling issue on cpu=0 timer-softirq=262
   rcu: rcu_preempt kthread starved for 6502 jiffies! g-595 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
   rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
   rcu: RCU grace-period kthread stack dump:
   task:rcu_preempt     state:I stack:0     pid:15    tgid:15    ppid:2      flags:0x00000008
   Call trace:
    __switch_to+0xbc/0x100
    __schedule+0x358/0xbe0
    schedule+0x48/0x148
    schedule_timeout+0xc4/0x138
    rcu_gp_fqs_loop+0x12c/0x764
    rcu_gp_kthread+0x208/0x298
    kthread+0x10c/0x110
    ret_from_fork+0x10/0x20

The design has been part of the driver since it was first merged in
early 2009. The issue becomes increasingly harder to trigger the older
the kernel version. It only takes a few boots on v6.10-rc5, while
hundreds of boots are needed to trigger it on v5.10.

Close the race condition by using the CMT channel lock for the two
competing sections. The channel lock was added to the driver after its
initial design.
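
A user-space sketch of the locking idea, under stated assumptions: `struct channel_model` and all functions below are hypothetical and are not the driver's code, and a C11 `atomic_flag` spin lock stands in for the channel lock. Only the name FLAG_IRQCONTEXT comes from the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FLAG_IRQCONTEXT 0x1u   /* flag name taken from the description */

/* Hypothetical model of one timer channel. */
struct channel_model {
    atomic_flag lock;           /* stands in for the channel lock */
    unsigned int flags;
    unsigned long next_match;   /* value cached by set_next_event */
    unsigned long hw_match;     /* value last written to the "device" */
    int hw_writes;
};

static void channel_init_model(struct channel_model *ch)
{
    assert(ch != NULL);
    atomic_flag_clear(&ch->lock);
    ch->flags = 0;
    ch->next_match = 0;
    ch->hw_match = 0;
    ch->hw_writes = 0;
}

static void ch_lock(struct channel_model *ch)
{
    while (atomic_flag_test_and_set_explicit(&ch->lock, memory_order_acquire))
        ;   /* spin until the lock is free */
}

static void ch_unlock(struct channel_model *ch)
{
    atomic_flag_clear_explicit(&ch->lock, memory_order_release);
}

/* Write the cached value to the modelled device. Call with the lock held. */
static void hw_program(struct channel_model *ch)
{
    ch->hw_match = ch->next_match;
    ch->hw_writes++;
}

/* Caller side: cache the value, and write it unless the handler will. */
static void set_next_event_model(struct channel_model *ch, unsigned long delta)
{
    ch_lock(ch);
    ch->next_match = delta;
    if (!(ch->flags & FLAG_IRQCONTEXT))
        hw_program(ch);
    ch_unlock(ch);
}

/* Example event handler that re-arms the timer from interrupt context. */
static void rearm_handler_model(struct channel_model *ch)
{
    set_next_event_model(ch, 42);
}

/*
 * Interrupt side: run the event handler with the flag set, then clear
 * the flag and reprogram the device inside one locked section, so a
 * concurrent set_next_event_model() sees either "flag set" (value is
 * cached and written here) or "flag clear" (value is written directly).
 */
static void interrupt_model(struct channel_model *ch,
                            void (*event_handler)(struct channel_model *))
{
    ch_lock(ch);
    ch->flags |= FLAG_IRQCONTEXT;
    ch_unlock(ch);

    event_handler(ch);

    ch_lock(ch);
    ch->flags &= ~FLAG_IRQCONTEXT;
    hw_program(ch);
    ch_unlock(ch);
}
```

The point of the sketch is that the flag test in `set_next_event_model()` and the "clear flag, then reprogram" step in `interrupt_model()` can no longer interleave, so a cached value is never left unwritten.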

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://lore.kernel.org/r/20240702190230.3825292-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Aug 9, 2024
The RISC-V kernel already has checks to ensure that memory which would
lie outside of the linear mapping is not used. However those checks
use memory_limit, which is used to implement the mem= kernel command
line option (to limit the total amount of memory, not its address
range). When memory is made up of two or more non-contiguous memory
banks this check is incorrect.

Two changes are made here:
 - add a call in setup_bootmem() to memblock_cap_memory_range() which
   will cause any memory which falls outside the linear mapping to be
   removed from the memory regions.
 - remove the check in create_linear_mapping_page_table() which was
   intended to remove memory which is outside the linear mapping based
   on memory_limit, as it is no longer needed. Note a check for
   mapping more memory than memory_limit (to implement mem=) is
   unnecessary because of the existing call to
   memblock_enforce_memory_limit().

This issue was seen when booting on a SV39 platform with two memory
banks:
  0x00,80000000 1GiB
  0x20,00000000 32GiB
This memory range is 158GiB from top to bottom, but the linear mapping
is limited to 128GiB, so the lower block of RAM will be mapped at
PAGE_OFFSET, and the upper block straddles the top of the linear
mapping.
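
The arithmetic above can be checked with a small user-space sketch that trims each bank to a window, in the spirit of capping the memory range. `struct bank_model` and `cap_bank_model()` are hypothetical names, and the sketch assumes the linear mapping window starts at the base of the lowest bank (0x80000000) and covers 128GiB.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical model of one physical memory bank. */
struct bank_model {
    uint64_t base;
    uint64_t size;
};

/*
 * Trim `bank` so it lies inside [cap_base, cap_base + cap_size).
 * A bank entirely outside the window ends up with size 0.
 */
static void cap_bank_model(struct bank_model *bank,
                           uint64_t cap_base, uint64_t cap_size)
{
    uint64_t cap_end = cap_base + cap_size;
    uint64_t start = bank->base;
    uint64_t end = bank->base + bank->size;

    assert(cap_size > 0);
    if (start < cap_base)
        start = cap_base;
    if (end > cap_end)
        end = cap_end;
    if (end <= start) {
        bank->size = 0;
        return;
    }
    bank->base = start;
    bank->size = end - start;
}
```

With the two banks above, the span from 0x80000000 to the end of the upper bank is 158GiB. Under the stated assumption the window ends at physical address 0x2080000000, so the lower bank is kept whole and only the first 2GiB of the upper bank remain usable.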

This causes the following Oops:
[    0.000000] Linux version 6.10.0-rc2-gd3b8dd5b51dd-dirty (stuart.menefy@codasip.com) (riscv64-codasip-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41.0.20231213) #20 SMP Sat Jun 22 11:34:22 BST 2024
[    0.000000] memblock_add: [0x0000000080000000-0x00000000bfffffff] early_init_dt_add_memory_arch+0x4a/0x52
[    0.000000] memblock_add: [0x0000002000000000-0x00000027ffffffff] early_init_dt_add_memory_arch+0x4a/0x52
...
[    0.000000] memblock_alloc_try_nid: 23724 bytes align=0x8 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 early_init_dt_alloc_memory_arch+0x1e/0x48
[    0.000000] memblock_reserve: [0x00000027ffff5350-0x00000027ffffaffb] memblock_alloc_range_nid+0xb8/0x132
[    0.000000] Unable to handle kernel paging request at virtual address fffffffe7fff5350
[    0.000000] Oops [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc2-gd3b8dd5b51dd-dirty #20
[    0.000000] Hardware name: codasip,a70x (DT)
[    0.000000] epc : __memset+0x8c/0x104
[    0.000000]  ra : memblock_alloc_try_nid+0x74/0x84
[    0.000000] epc : ffffffff805e88c8 ra : ffffffff806148f6 sp : ffffffff80e03d50
[    0.000000]  gp : ffffffff80ec4158 tp : ffffffff80e0bec0 t0 : fffffffe7fff52f8
[    0.000000]  t1 : 00000027ffffb000 t2 : 5f6b636f6c626d65 s0 : ffffffff80e03d90
[    0.000000]  s1 : 0000000000005cac a0 : fffffffe7fff5350 a1 : 0000000000000000
[    0.000000]  a2 : 0000000000005cac a3 : fffffffe7fffaff8 a4 : 000000000000002c
[    0.000000]  a5 : ffffffff805e88c8 a6 : 0000000000005cac a7 : 0000000000000030
[    0.000000]  s2 : fffffffe7fff5350 s3 : ffffffffffffffff s4 : 0000000000000000
[    0.000000]  s5 : ffffffff8062347e s6 : 0000000000000000 s7 : 0000000000000001
[    0.000000]  s8 : 0000000000002000 s9 : 00000000800226d0 s10: 0000000000000000
[    0.000000]  s11: 0000000000000000 t3 : ffffffff8080a928 t4 : ffffffff8080a928
[    0.000000]  t5 : ffffffff8080a928 t6 : ffffffff8080a940
[    0.000000] status: 0000000200000100 badaddr: fffffffe7fff5350 cause: 000000000000000f
[    0.000000] [<ffffffff805e88c8>] __memset+0x8c/0x104
[    0.000000] [<ffffffff8062349c>] early_init_dt_alloc_memory_arch+0x1e/0x48
[    0.000000] [<ffffffff8043e892>] __unflatten_device_tree+0x52/0x114
[    0.000000] [<ffffffff8062441e>] unflatten_device_tree+0x9e/0xb8
[    0.000000] [<ffffffff806046fe>] setup_arch+0xd4/0x5bc
[    0.000000] [<ffffffff806007aa>] start_kernel+0x76/0x81a
[    0.000000] Code: b823 02b2 bc23 02b2 b023 04b2 b423 04b2 b823 04b2 (bc23) 04b2
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

The problem is that memblock, unaware that some physical memory cannot
be used, has allocated memory from the top of RAM, which lies outside
the linear mapping region.

Signed-off-by: Stuart Menefy <stuart.menefy@codasip.com>
Fixes: c99127c ("riscv: Make sure the linear mapping does not use the kernel mapping")
Reviewed-by: David McKay <david.mckay@codasip.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240622114217.2158495-1-stuart.menefy@codasip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Sep 23, 2024
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements, which leaks memory. This can be seen with:

```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
    #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x564d3400d10b in map__get util/map.h:186
    #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
    #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
    #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
    #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
    #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
    #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
    #8 0x564d3403ba52 in do_flush util/ordered-events.c:245
    #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
    #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
    #11 0x564d34032480 in perf_session__process_event util/session.c:1877
    #12 0x564d340336ad in reader__read_event util/session.c:2399
    #13 0x564d34033fdc in reader__process_events util/session.c:2448
    #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
    #15 0x564d34033fdc in perf_session__process_events util/session.c:2661
    #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
    #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
    #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
    #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
    #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
    #21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```

Cleaning up the map_symbols properly creates map reference count
issues, so resolve those too. Resolving this issue doesn't improve peak
heap consumption for the test above.
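
The get/put pattern behind this kind of leak can be sketched in C as
follows. This is a toy model under stated assumptions, not perf's code:
struct toy_map, struct toy_branch and all toy_* functions are
hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a map. */
struct toy_map {
	int refcnt;
};

/* Number of toy_map objects currently allocated; non-zero at exit means a leak. */
int toy_live_maps;

struct toy_map *toy_map_new(void)
{
	struct toy_map *m = calloc(1, sizeof(*m));

	if (m) {
		m->refcnt = 1;
		toy_live_maps++;
	}
	return m;
}

/* Take an extra reference. */
struct toy_map *toy_map_get(struct toy_map *m)
{
	if (m)
		m->refcnt++;
	return m;
}

/* Drop a reference and free the object when the last one goes away. */
void toy_map_put(struct toy_map *m)
{
	if (!m)
		return;
	assert(m->refcnt > 0);
	if (--m->refcnt == 0) {
		free(m);
		toy_live_maps--;
	}
}

/* Hypothetical stand-in for a branch entry holding from/to references. */
struct toy_branch {
	struct toy_map *from;
	struct toy_map *to;
};

/* "prepare" step: take one reference on each endpoint. */
void toy_branch_prepare(struct toy_branch *b, struct toy_map *from,
			struct toy_map *to)
{
	b->from = toy_map_get(from);
	b->to = toy_map_get(to);
}

/* "finish" step: drop both references; skipping this is the leak pattern. */
void toy_branch_finish(struct toy_branch *b)
{
	toy_map_put(b->from);
	toy_map_put(b->to);
	b->from = NULL;
	b->to = NULL;
}
```

If toy_branch_finish() is never called, each prepared entry keeps its two
references, so the counts never return to zero and the objects are never
freed.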

Committer testing:

  $ sudo dnf install libasan
  $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240807065136.1039977-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Dec 15, 2024
…s_lock

When storing a value to a queue attribute, the queue_attr_store function
first freezes the queue (->q_usage_counter(io)) and then acquires
->sysfs_lock. This does not seem correct, as the usual ordering is to
acquire ->sysfs_lock before freezing the queue. The incorrect ordering
causes the following lockdep splat, which we can always reproduce
simply by listing /sys/kernel/debug with the ls command:

[   57.597146] WARNING: possible circular locking dependency detected
[   57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G        W
[   57.597162] ------------------------------------------------------
[   57.597168] ls/4605 is trying to acquire lock:
[   57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0
[   57.597200]
               but task is already holding lock:
[   57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
[   57.597226]
               which lock already depends on the new lock.

[   57.597233]
               the existing dependency chain (in reverse order) is:
[   57.597241]
               -> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[   57.597255]        down_write+0x6c/0x18c
[   57.597264]        start_creating+0xb4/0x24c
[   57.597274]        debugfs_create_dir+0x2c/0x1e8
[   57.597283]        blk_register_queue+0xec/0x294
[   57.597292]        add_disk_fwnode+0x2e4/0x548
[   57.597302]        brd_alloc+0x2c8/0x338
[   57.597309]        brd_init+0x100/0x178
[   57.597317]        do_one_initcall+0x88/0x3e4
[   57.597326]        kernel_init_freeable+0x3cc/0x6e0
[   57.597334]        kernel_init+0x34/0x1cc
[   57.597342]        ret_from_kernel_user_thread+0x14/0x1c
[   57.597350]
               -> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
[   57.597362]        __mutex_lock+0xfc/0x12a0
[   57.597370]        blk_register_queue+0xd4/0x294
[   57.597379]        add_disk_fwnode+0x2e4/0x548
[   57.597388]        brd_alloc+0x2c8/0x338
[   57.597395]        brd_init+0x100/0x178
[   57.597402]        do_one_initcall+0x88/0x3e4
[   57.597410]        kernel_init_freeable+0x3cc/0x6e0
[   57.597418]        kernel_init+0x34/0x1cc
[   57.597426]        ret_from_kernel_user_thread+0x14/0x1c
[   57.597434]
               -> #3 (&q->sysfs_lock){+.+.}-{4:4}:
[   57.597446]        __mutex_lock+0xfc/0x12a0
[   57.597454]        queue_attr_store+0x9c/0x110
[   57.597462]        sysfs_kf_write+0x70/0xb0
[   57.597471]        kernfs_fop_write_iter+0x1b0/0x2ac
[   57.597480]        vfs_write+0x3dc/0x6e8
[   57.597488]        ksys_write+0x84/0x140
[   57.597495]        system_call_exception+0x130/0x360
[   57.597504]        system_call_common+0x160/0x2c4
[   57.597516]
               -> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}:
[   57.597530]        __submit_bio+0x5ec/0x828
[   57.597538]        submit_bio_noacct_nocheck+0x1e4/0x4f0
[   57.597547]        iomap_readahead+0x2a0/0x448
[   57.597556]        xfs_vm_readahead+0x28/0x3c
[   57.597564]        read_pages+0x88/0x41c
[   57.597571]        page_cache_ra_unbounded+0x1ac/0x2d8
[   57.597580]        filemap_get_pages+0x188/0x984
[   57.597588]        filemap_read+0x13c/0x4bc
[   57.597596]        xfs_file_buffered_read+0x88/0x17c
[   57.597605]        xfs_file_read_iter+0xac/0x158
[   57.597614]        vfs_read+0x2d4/0x3b4
[   57.597622]        ksys_read+0x84/0x144
[   57.597629]        system_call_exception+0x130/0x360
[   57.597637]        system_call_common+0x160/0x2c4
[   57.597647]
               -> #1 (mapping.invalidate_lock#2){++++}-{4:4}:
[   57.597661]        down_read+0x6c/0x220
[   57.597669]        filemap_fault+0x870/0x100c
[   57.597677]        xfs_filemap_fault+0xc4/0x18c
[   57.597684]        __do_fault+0x64/0x164
[   57.597693]        __handle_mm_fault+0x1274/0x1dac
[   57.597702]        handle_mm_fault+0x248/0x484
[   57.597711]        ___do_page_fault+0x428/0xc0c
[   57.597719]        hash__do_page_fault+0x30/0x68
[   57.597727]        do_hash_fault+0x90/0x35c
[   57.597736]        data_access_common_virt+0x210/0x220
[   57.597745]        _copy_from_user+0xf8/0x19c
[   57.597754]        sel_write_load+0x178/0xd54
[   57.597762]        vfs_write+0x108/0x6e8
[   57.597769]        ksys_write+0x84/0x140
[   57.597777]        system_call_exception+0x130/0x360
[   57.597785]        system_call_common+0x160/0x2c4
[   57.597794]
               -> #0 (&mm->mmap_lock){++++}-{4:4}:
[   57.597806]        __lock_acquire+0x17cc/0x2330
[   57.597814]        lock_acquire+0x138/0x400
[   57.597822]        __might_fault+0x7c/0xc0
[   57.597830]        filldir64+0xe8/0x390
[   57.597839]        dcache_readdir+0x80/0x2d4
[   57.597846]        iterate_dir+0xd8/0x1d4
[   57.597855]        sys_getdents64+0x88/0x2d4
[   57.597864]        system_call_exception+0x130/0x360
[   57.597872]        system_call_common+0x160/0x2c4
[   57.597881]
               other info that might help us debug this:

[   57.597888] Chain exists of:
                 &mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3

[   57.597905]  Possible unsafe locking scenario:

[   57.597911]        CPU0                    CPU1
[   57.597917]        ----                    ----
[   57.597922]   rlock(&sb->s_type->i_mutex_key#3);
[   57.597932]                                lock(&q->debugfs_mutex);
[   57.597940]                                lock(&sb->s_type->i_mutex_key#3);
[   57.597950]   rlock(&mm->mmap_lock);
[   57.597958]
                *** DEADLOCK ***

[   57.597965] 2 locks held by ls/4605:
[   57.597971]  #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154
[   57.597989]  #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4

Prevent the above lockdep warning by acquiring ->sysfs_lock before
freezing the queue while storing a queue attribute in the
queue_attr_store function. Later, we also found[1] another function,
__blk_mq_update_nr_hw_queues, where we first freeze the queue and then
acquire ->sysfs_lock. So we've also updated the lock ordering in
__blk_mq_update_nr_hw_queues and ensured that all code paths follow the
correct lock ordering, i.e. acquire ->sysfs_lock before freezing the
queue.

[1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/
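
To see why swapping the order in queue_attr_store removes the circular
dependency, the chain from the splat can be modelled as a directed graph
and checked for a cycle. The C sketch below is only an illustration and
is not how lockdep is implemented; the enum, struct lock_graph and the
add_dep(), has_cycle() and build_graph() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Lock classes named in the splat above. */
enum lock_class {
	MMAP_LOCK,
	INVALIDATE_LOCK,
	Q_USAGE_COUNTER,	/* held while the queue is frozen */
	SYSFS_LOCK,
	DEBUGFS_MUTEX,
	I_MUTEX_KEY,
	NR_LOCKS
};

/* dep[a][b] is true when b was acquired while a was held (a -> b). */
struct lock_graph {
	bool dep[NR_LOCKS][NR_LOCKS];
};

void add_dep(struct lock_graph *g, enum lock_class held, enum lock_class taken)
{
	assert(held != taken);
	g->dep[held][taken] = true;
}

/* Depth-first search; state 1 = on the current path, 2 = fully explored. */
static bool visit(const struct lock_graph *g, int n, int *state)
{
	state[n] = 1;
	for (int m = 0; m < NR_LOCKS; m++) {
		if (!g->dep[n][m])
			continue;
		if (state[m] == 1)
			return true;	/* back edge: circular dependency */
		if (state[m] == 0 && visit(g, m, state))
			return true;
	}
	state[n] = 2;
	return false;
}

bool has_cycle(const struct lock_graph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}

/* Build the chain from the splat; the flag selects the queue_attr_store order. */
struct lock_graph build_graph(bool sysfs_lock_before_freeze)
{
	struct lock_graph g;

	memset(&g, 0, sizeof(g));
	add_dep(&g, MMAP_LOCK, INVALIDATE_LOCK);		/* #1 */
	add_dep(&g, INVALIDATE_LOCK, Q_USAGE_COUNTER);		/* #2 */
	if (sysfs_lock_before_freeze)
		add_dep(&g, SYSFS_LOCK, Q_USAGE_COUNTER);	/* fixed order */
	else
		add_dep(&g, Q_USAGE_COUNTER, SYSFS_LOCK);	/* #3, old order */
	add_dep(&g, SYSFS_LOCK, DEBUGFS_MUTEX);			/* #4 */
	add_dep(&g, DEBUGFS_MUTEX, I_MUTEX_KEY);		/* #5 */
	add_dep(&g, I_MUTEX_KEY, MMAP_LOCK);			/* #0, taken by ls */
	return g;
}
```

With the old order, the edges form a closed loop through all six lock
classes, so has_cycle() reports a cycle. With ->sysfs_lock taken before
the freeze, the edge from the queue freeze to ->sysfs_lock disappears
and the loop is broken.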

Reported-by: kjain@linux.ibm.com
Fixes: af28141 ("block: freeze the queue in queue_attr_store")
Tested-by: kjain@linux.ibm.com
Cc: hch@lst.de
Cc: axboe@kernel.dk
Cc: ritesh.list@gmail.com
Cc: ming.lei@redhat.com
Cc: gjoyce@linux.ibm.com
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>