Fix building several modules as built-in #9
Comments
To expand a bit on this: a quick workaround would be hiding symbols, but ideally we want to share the common code, even for loadable modules. The reason is that we want to be able to build kernels with Rust support built-in so that Rust loadable modules can be as small as possible. In fact, that will likely be the most common use case as soon as distro kernels start shipping with Rust modules. Until then, third-party modules would still need a way to embed the Rust support.
The recent commit 01eb018 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") changed some of the handling of floating point/vector restore. In particular it caused current->thread.fpexc_mode to be copied into the current MSR (via msr_check_and_set()), rather than just into regs->msr (which is moved into MSR on return to userspace). This can lead to a crash in the kernel if we take a floating point exception when restoring FPSCR: Oops: Exception in kernel mode, sig: 8 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf405cb-dirty #9 NIP: c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570 REGS: c0000016b7cfb3b0 TRAP: 0700 Not tainted (5.9.0-rc1-00098-g18445bf405cb-dirty) MSR: 900000000290b933 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002444 XER: 00000000 CFAR: c00000000001a7a8 IRQMASK: 1 GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740 GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00 GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000 GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58 GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638 GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0 GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28 GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900 NIP load_fp_state+0x4/0x214 LR restore_math+0x17c/0x1f0 Call Trace: 0xc0000016b7cfb680 (unreliable) __switch_to+0x330/0x460 __schedule+0x318/0x920 schedule+0x74/0x140 schedule_timeout+0x318/0x3f0 wait_for_completion+0xc8/0x210 call_usermodehelper_exec+0x234/0x280 do_coredump+0xedc/0x13c0 get_signal+0x1d4/0xbe0 do_notify_resume+0x1a0/0x490 interrupt_exit_user_prepare+0x1c4/0x230 interrupt_return+0x14/0x1c0 Instruction dump: ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 
4e800020 790605c4 782905c4 7c0008a8 7c0008a8 c8030200 <fffe058e> 48000088 c8030000 c8230010 Fix it by only loading the fpexc_mode value into regs->msr. Also add a comment to explain that although VSX is subject to the value of fpexc_mode, we don't have to handle that separately because we only allow VSX to be enabled if FP is also enabled. Fixes: 01eb018 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") Reported-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20200825093424.3967813-1-mpe@ellerman.id.au
…s metrics" test Linux 5.9 introduced perf test case "Parse and process metrics" and on s390 this test case always dumps core: [root@t35lp67 perf]# ./perf test -vvvv -F 67 67: Parse and process metrics : --- start --- metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC parsing metric: inst_retired.any / cpu_clk_unhalted.thread Segmentation fault (core dumped) [root@t35lp67 perf]# I debugged this core dump and gdb shows this call chain: (gdb) where #0 0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6 #1 0x000003ffabc293de in strcasestr () from /lib64/libc.so.6 #2 0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any", n=<optimized out>) at util/metricgroup.c:368 #3 find_metric (map=<optimized out>, map=<optimized out>, metric=0x1e6ea20 "inst_retired.any") at util/metricgroup.c:765 #4 __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0, metric_no_group=<optimized out>, m=<optimized out>) at util/metricgroup.c:844 #5 resolve_metric (ids=0x0, map=0x0, metric_list=0x0, metric_no_group=<optimized out>) at util/metricgroup.c:881 #6 metricgroup__add_metric (metric=<optimized out>, metric_no_group=metric_no_group@entry=false, events=<optimized out>, events@entry=0x3ffd84fb878, metric_list=0x0, metric_list@entry=0x3ffd84fb868, map=0x0) at util/metricgroup.c:943 #7 0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>, metric_list=0x3ffd84fb868, events=0x3ffd84fb878, metric_no_group=<optimized out>, list=<optimized out>) at util/metricgroup.c:988 #8 parse_groups (perf_evlist=perf_evlist@entry=0x1e70260, str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>, metric_no_merge=<optimized out>, fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>, metric_events=0x3ffd84fba58, map=0x1) at util/metricgroup.c:1040 #9 0x0000000001103eb2 in metricgroup__parse_groups_test( evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>, str=str@entry=0x12f34b2 "IPC", metric_no_group=metric_no_group@entry=false, 
metric_no_merge=metric_no_merge@entry=false, metric_events=0x3ffd84fba58) at util/metricgroup.c:1082
#10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0, ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC", vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:159
#11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:189
#12 test_ipc () at tests/parse-metric.c:208
..... ..... omitted many more lines

This test case was added with commit 218ca91 ("perf tests: Add parse metric test for frontend metric"). When I compile with make DEBUG=y it works fine and I do not get a core dump.

It turned out that the function call chain listed above worked on a struct pmu_event array which requires a trailing element filled with zeroes, and that element was missing. The macro map_for_each_event() loops over that array and tests for members metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes the issue.

Output after:
[root@t35lp46 perf]# ./perf test 67
67: Parse and process metrics : Ok
[root@t35lp46 perf]#

Committer notes: As Ian remarks, this is not s390 specific: <quote Ian> This also shows up with address sanitizer on all architectures (perhaps change the patch title) and perhaps add a "Fixes: <commit>" tag.
================================================================= ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp 0x7ffd24327c58 READ of size 8 at 0x55c93b4d59e8 thread T0 #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2 #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9 #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9 #3 0x55c93a1528db in metricgroup__add_metric tools/perf/util/metricgroup.c:943:9 #4 0x55c93a151996 in metricgroup__add_metric_list tools/perf/util/metricgroup.c:988:9 #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8 #6 0x55c93a1513e1 in metricgroup__parse_groups_test tools/perf/util/metricgroup.c:1082:9 #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8 #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9 #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2 #10 0x55c93a00f1e8 in test__parse_metric tools/perf/tests/parse-metric.c:345:2 #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9 #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9 #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4 #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9 #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11 #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8 #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2 #18 0x55c939e45f7e in main tools/perf/perf.c:539:3 0x55c93b4d59e8 is located 0 bytes to the right of global variable 'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25' (0x55c93b4d54a0) of size 1352 SUMMARY: AddressSanitizer: global-buffer-overflow tools/perf/util/metricgroup.c:764:2 in find_metric Shadow bytes around the buggy address: 0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692af0: 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc </quote> I'm also adding the missing "Fixes" tag and setting just .name to NULL, as doing it that way is more compact (the compiler will zero out everything else) and the table iterators look for .name being NULL as the sentinel marking the end of the table. Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
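The sentinel convention described in this commit message can be modelled in a few lines. The following is only a sketch in Rust of the C idiom, not the perf code: `PmuEvent`, `PME_TEST`, `SENTINEL` and `find_metric` are simplified stand-ins, `None` plays the role of a NULL pointer, and the match is an exact case-insensitive comparison rather than the substring search the backtrace shows.

```rust
/// Simplified stand-in for perf's `struct pmu_event`; only two members are modelled.
#[derive(Clone, Copy)]
struct PmuEvent {
    metric_name: Option<&'static str>, // `None` models a NULL pointer
    metric_expr: Option<&'static str>,
}

impl PmuEvent {
    /// All-empty entry that marks the end of a table.
    const SENTINEL: PmuEvent = PmuEvent { metric_name: None, metric_expr: None };

    fn is_sentinel(&self) -> bool {
        self.metric_name.is_none() && self.metric_expr.is_none()
    }
}

/// Test table. The last entry is the terminator that was missing in the bug.
static PME_TEST: [PmuEvent; 2] = [
    PmuEvent {
        metric_name: Some("IPC"),
        metric_expr: Some("inst_retired.any / cpu_clk_unhalted.thread"),
    },
    PmuEvent::SENTINEL,
];

/// Walks the table until the sentinel and returns the first matching entry.
/// In C the sentinel is the only thing that stops the loop; Rust slices also
/// carry a length, so a missing sentinel here could not read out of bounds.
fn find_metric(table: &[PmuEvent], metric: &str) -> Option<PmuEvent> {
    table
        .iter()
        .take_while(|e| !e.is_sentinel())
        .find(|e| e.metric_name.map_or(false, |n| n.eq_ignore_ascii_case(metric)))
        .copied()
}
```

In the C original there is no length to fall back on, which is why a table without the terminating entry made the iterator read past the end of `pme_test`.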
The aliases were never released causing the following leaks: Indirect leak of 1224 byte(s) in 9 object(s) allocated from: #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628) #1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322 #2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778 #3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295 #4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367 #5 0x56332c76a09b in run_test tests/builtin-test.c:410 #6 0x56332c76a09b in test_and_print tests/builtin-test.c:440 #7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695 #8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807 #9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 956a783 ("perf test: Test pmu-events aliases") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
evsel->unit borrows a pointer from a pmu event or alias instead of owning a string. But the tool event (duration_time) passes the result of strdup(), which caused a leak. It was found by ASAN during the metric test:

Direct leak of 210 byte(s) in 70 object(s) allocated from:
#0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
#1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
#2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
#3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
#4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
#5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
#6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
#7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
#8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
#9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
#10 0x559fbbc0109b in run_test tests/builtin-test.c:410
#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: 
http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
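The borrowed-versus-owned distinction behind this leak can be made explicit in a type. The sketch below is a Rust illustration of the idea only, not the fix applied in the commit: `Evsel`, `tool_event` and `custom_event` are hypothetical names, and the `"ns"` unit for the tool event is an assumption.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the perf event selector; only `unit` is modelled.
struct Evsel {
    /// Either borrows a string that lives for the whole program (nothing to
    /// free) or owns a heap copy that is released when the `Evsel` is dropped.
    unit: Cow<'static, str>,
}

/// Tool event: the unit is a string literal, so borrowing it is enough.
/// (The "ns" value is assumed for illustration.)
fn tool_event() -> Evsel {
    Evsel { unit: Cow::Borrowed("ns") }
}

/// Event whose unit comes from a temporary string: it has to be copied, and
/// the copy is owned by the `Evsel` instead of being silently leaked.
fn custom_event(unit: &str) -> Evsel {
    Evsel { unit: Cow::Owned(unit.to_string()) }
}
```

In C, a plain `const char *` member cannot record which of the two cases applies, so a `strdup()` result stored in a member that every other caller treats as borrowed is never freed.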
The test_generic_metric() missed to release entries in the pctx. Asan reported following leak (and more): Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14) #2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497) #3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111 #4 0x55f7e7341667 in expr__add_ref util/expr.c:120 #5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783 #6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858 #7 0x55f7e712390b in compute_single tests/parse-metric.c:128 #8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180 #9 0x55f7e712446d in compute_metric tests/parse-metric.c:196 #10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295 #11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355 #12 0x55f7e70be09b in run_test tests/builtin-test.c:410 #13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440 #14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661 #15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807 #16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 6d432c4 ("perf tools: Add test_generic_metric function") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra 
<peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The metricgroup__add_metric() can find multiple match for a metric group and it's possible to fail. Also it can fail in the middle like in resolve_metric() even for single metric. In those cases, the intermediate list and ids will be leaked like: Direct leak of 3 byte(s) in 1 object(s) allocated from: #0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) #1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683 #2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906 #3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940 #4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993 #5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045 #6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087 #7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164 #8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196 #9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318 #10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356 #11 0x55f7e70be09b in run_test tests/builtin-test.c:410 #12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440 #13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661 #14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807 #15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland 
<mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 #4 0x560578e07045 in test__pmu tests/pmu.c:155 #5 0x560578de109b in run_test tests/builtin-test.c:410 #6 0x560578de109b in test_and_print tests/builtin-test.c:440 #7 0x560578de401a in __cmd_test tests/builtin-test.c:661 #8 0x560578de401a in cmd_test tests/builtin-test.c:807 #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a big PR, but most of it is interdependent with the rest.

- Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
  + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:
    `vmlinux` on release w/ Rust is 23 MiB (compressed: 2.1 MiB)
    `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)
    i.e. the bulk is now shared.
  + Multiple builtin modules are now supported since their symbols do not collide against each other (fixes #9).
  + Faster compilation (fewer crates to compile & less repetition).
  + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.
- Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
  + All the `rust/*` and `driver/char/rust_example/*` files moved to fit in the new structure. Far fewer files around!
- Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
  + Also helps with the faster compilation.
- Offline builds, always; i.e. there is no "online compilation" anymore (fixes #17).
- No more interleaved Cargo output (fixes #29).
- One less nightly dependency (Cargo's `build-std`), since now we manage the cross-compilation ourselves (should fix #27).
- Since now a kernel can be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).
- Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), following what the user selected for C (no Cargo profiles).
- Added Kconfig menu for tweaking `rustc` options, like overflow checks.
- This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general when handling paths (e.g. no more `shell readlink` for `O=`).
- Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.
- Updated the quick start guide.
- Updated CI `.config`s:
  + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable.
  + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) to mimic more closely what a developer would have. Running the CI will be slightly slower, but should be OK.
  + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration.
  + Also updated both configs to v5.9 while I was at it. (I could have split a few of these off into another PR, but anyway it is for the CI only and I had already done it.)
- Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:

- There is one more nightly feature used (the new Rust mangling scheme), but we know that one will be stable (and the default one, later on).
- The hack at `exports.c` to export symbols to loadable modules.
- The hack at `allocator.rs` to get the `__rust_*()` functions.

There are a few TODOs that we can improve later if we agree on this:

- Kbuild:
  + Actually use the `*.d` files.
  + Complete `make clean`.
  + Support single-object compilation.
  + Pass `objtool` to make the ORC unwinder work.
  + Echo the building of the rust/* libraries and the bindgen call.
- Figure out how to pick symbols to export automatically from Rust code instead of managing the list by hand. Perhaps we could use a no-op macro on the Rust code, which is then parsed by a script to pick up the symbols:

      pub fn foo() {}
      export_symbol!(foo);

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
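The no-op macro idea from the last TODO could look roughly like the following. This is only a sketch under assumptions: the PR proposes the `export_symbol!` marker but no implementation, so the macro body and the `exported_symbols` scanner (written here in Rust rather than as a build script) are hypothetical.

```rust
/// No-op marker: it expands to nothing, so it does not change the compiled
/// code. Its only purpose is to be found in the source text by a script.
macro_rules! export_symbol {
    ($name:ident) => {};
}

pub fn foo() {}
export_symbol!(foo);

/// Hypothetical scanner: returns every name marked with `export_symbol!(...)`.
/// It is a plain text scan, so it would also pick up markers that appear
/// inside comments or string literals.
pub fn exported_symbols(source: &str) -> Vec<String> {
    const MARKER: &str = "export_symbol!(";
    let mut names = Vec::new();
    let mut rest = source;
    while let Some(start) = rest.find(MARKER) {
        let after = &rest[start + MARKER.len()..];
        match after.find(')') {
            Some(end) => {
                let name = after[..end].trim();
                if !name.is_empty() {
                    names.push(name.to_string());
                }
                // Continue scanning after the closing parenthesis.
                rest = &after[end + 1..];
            }
            // Unterminated marker: stop rather than guess.
            None => break,
        }
    }
    names
}
```

The resulting list could then replace the hand-maintained one mentioned for `exports.c`. Note that the scanner yields Rust-level names, whereas the linker sees mangled symbols, so a real script would still need to map one to the other.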
This is a big PR, but most of it is interdependent to the rest. - Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`. + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference: `vmlinux` on release w/ Rust is 23 MiB (compressed: 2.1 MiB) `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB) i.e. the bulk is now shared. + Multiple builtin modules are now supported since their symbols do not collide against each other (fixes #9). + Faster compilation (less crates to compile & less repetition). + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s. - Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C. + All the `rust/*` and `driver/char/rust_example/*` files moved to fit in the new structure. Way less files around! - Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies. + Also helps with the faster compilation. - Offline builds, always; i.e. there is no "online compilation" anymore (fixes #17). - No more interleaved Cargo output (fixes #29). - One less nightly dependency (Cargo's `build-std`); since now we manage the cross-compilation ourselves (should fix #27). - Since now a kernel can be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`). - Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), following what the user selected for C (no Cargo profiles). - Added Kconfig menu for tweaking `rustc` options, like overflow checks. - This rewrite of the Kbuild support is cleaner, i.e. 
less hacks in general handling paths (e.g. no more `shell readlink` for `O=`). - Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time. - Updated the quick start guide. - Updated CI `.config`s: + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable. + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) that mimic more what a developer would have. Running the CI will be slightly slower, but should be OK. + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration. + Also update both configs to v5.9 while I was at it. (I could have split a few of these ones off into another PR, but anyway it is for the CI only and I had already done it). - Less `extern crate`s needed since we pass it via `rustc` (closer to idiomatic 2018 edition Rust code). Things to note: - There is one more nightly feature used (the new Rust mangling scheme), but we know that one will be stable (and the default one, later on). - The hack at `exports.c` to export symbols to loadable modules. - The hack at `allocator.rs` to get the `__rust_*()` functions. There are a few TODOs that we can improve later if we agree on this: - Kbuild: + Actually use the `*.d` files. + Complete `make clean`. + Support single-object compilation. + Pass `objtool` to make the ORC unwinder work. + Echo the building of the rust/* libraries and the bindgen call. - Figure out how to pick symbols to export automatically from Rust code instead of managing the list by hand. Perhaps we could use a no-op macro on the Rust code, which is then parse by a script to pick up the symbols: pub fn foo() {} export_symbol!(foo); Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
This is a big PR, but most of it is interdependent to the rest. - Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`. + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference: `vmlinux` on release w/ Rust is 23 MiB (compressed: 2.1 MiB) `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB) i.e. the bulk is now shared. + Multiple builtin modules are now supported since their symbols do not collide against each other (fixes #9). + Faster compilation (less crates to compile & less repetition). + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s. - Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C. + All the `rust/*` and `driver/char/rust_example/*` files moved to fit in the new structure. Way less files around! - Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies. + Also helps with the faster compilation. - Offline builds, always; i.e. there is no "online compilation" anymore (fixes #17). - No more interleaved Cargo output (fixes #29). - One less nightly dependency (Cargo's `build-std`); since now we manage the cross-compilation ourselves (should fix #27). - Since now a kernel can be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`). - Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), following what the user selected for C (no Cargo profiles). - Added Kconfig menu for tweaking `rustc` options, like overflow checks. - This rewrite of the Kbuild support is cleaner, i.e. 
less hacks in general handling paths (e.g. no more `shell readlink` for `O=`). - Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time. - Updated the quick start guide. - Updated CI `.config`s: + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable. + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) that mimic more what a developer would have. Running the CI will be slightly slower, but should be OK. + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration. + Also update both configs to v5.9 while I was at it. (I could have split a few of these ones off into another PR, but anyway it is for the CI only and I had already done it). - Less `extern crate`s needed since we pass it via `rustc` (closer to idiomatic 2018 edition Rust code). Things to note: - There is one more nightly feature used (the new Rust mangling scheme), but we know that one will be stable (and the default one, later on). - The hack at `exports.c` to export symbols to loadable modules. - The hack at `allocator.rs` to get the `__rust_*()` functions. There are a few TODOs that we can improve later if we agree on this: - Kbuild: + Actually use the `*.d` files. + Complete `make clean`. + Support single-object compilation. + Pass `objtool` to make the ORC unwinder work. + Echo the building of the rust/* libraries and the bindgen call. - Figure out how to pick symbols to export automatically from Rust code instead of managing the list by hand. Perhaps we could use a no-op macro on the Rust code, which is then parse by a script to pick up the symbols: pub fn foo() {} export_symbol!(foo); Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
This is a big PR, but most of it is interdependent with the rest.
- Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
  + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:

        `vmlinux` on release w/  Rust is 23 MiB (compressed: 2.1 MiB)
        `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)

    i.e. the bulk is now shared.
  + Multiple builtin modules are now supported since their symbols do not collide with each other (fixes #9).
  + Faster compilation (fewer crates to compile & less repetition).
  + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.
- Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
  + All the `rust/*` and `drivers/char/rust_example/*` files moved to fit in the new structure. Far fewer files around!
- Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
  + Also helps with the faster compilation.
- Offline builds, always; i.e. there is no "online compilation" anymore (fixes #17).
- No more interleaved Cargo output (fixes #29).
- One fewer nightly dependency (Cargo's `build-std`), since we now manage the cross-compilation ourselves (should fix #27).
- Since a kernel can now be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).
- Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), following what the user selected for C (no Cargo profiles).
- Added Kconfig menu for tweaking `rustc` options, like overflow checks.
- This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general handling paths (e.g. no more `shell readlink` for `O=`).
- Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.
- Updated the quick start guide.
- Updated CI `.config`s:
  + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable.
  + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) that more closely mimic what a developer would have. Running the CI will be slightly slower, but should be OK.
  + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration.
  + Also update both configs to v5.9 while I was at it.
  (I could have split a few of these off into another PR, but anyway it is for the CI only and I had already done it.)
- Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:
- There is one more nightly feature used (the new Rust mangling scheme), but we know that one will be stable (and the default one, later on).
- The hack at `exports.c` to export symbols to loadable modules.
- The hack at `allocator.rs` to get the `__rust_*()` functions.

There are a few TODOs that we can improve later if we agree on this:
- Kbuild:
  + Actually use the `*.d` files.
  + Complete `make clean`.
  + Support single-object compilation.
  + Pass `objtool` to make the ORC unwinder work.
  + Echo the building of the rust/* libraries and the bindgen call.
- Figure out how to pick symbols to export automatically from Rust code instead of managing the list by hand. Perhaps we could use a no-op macro on the Rust code, which is then parsed by a script to pick up the symbols:

      pub fn foo() {}
      export_symbol!(foo);

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
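The no-op macro idea from the last TODO can be sketched as follows. This is only an illustration under assumptions: `export_symbol!` is the name suggested in the message, but its definition here and the `collect_exports` helper (a stand-in for the external script) are hypothetical and are not part of the PR.

```rust
// Hypothetical no-op marker: expands to nothing at compile time. Its only
// purpose is to leave a greppable `export_symbol!(name);` line in the source.
macro_rules! export_symbol {
    ($name:ident) => {};
}

pub fn foo() {}
export_symbol!(foo);

/// Toy stand-in for the script that would scan Rust sources: returns the
/// identifiers found on lines of the exact form `export_symbol!(name);`.
pub fn collect_exports(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("export_symbol!(")?;
            let name = rest.strip_suffix(");")?;
            Some(name.trim().to_string())
        })
        .collect()
}
```

Feeding the helper the two-line snippet from the TODO yields a single entry, `foo`. A real script would also need to recover the mangled symbol name, which this sketch does not attempt.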
This is a big PR, but most of it is interdependent with the rest.
- Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
  + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:

        `vmlinux` on release w/  Rust is 23 MiB (compressed: 2.1 MiB)
        `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)

    i.e. the bulk is now shared.
  + Multiple builtin modules are now supported since their symbols do not collide with each other (fixes #9).
  + Faster compilation (fewer crates to compile & less repetition).
  + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.
- Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
  + All the `rust/*` and `drivers/char/rust_example/*` files moved to fit in the new structure: fewer files around.
- Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
  + Also helps with the faster compilation.
- Dependency handling integration with `Kbuild`/`fixdep`.
  + Changes to the Rust standard library, kernel headers (bindings), `rust/` source files, `.rs` changes, etc. all trigger recompilation of the proper things.
  + Works as expected with parallel support (`-j`).
- Proper `make clean` support.
- Offline builds by default (there is no "online compilation" anymore; fixes #17).
- No more interleaved Cargo output (fixes #29).
- One fewer nightly dependency (Cargo's `build-std`), since we now manage the cross-compilation ourselves (should fix #27).
- Since a kernel can now be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).
- Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), following what the user selected for C (no Cargo profiles).
- Added Kconfig menu for tweaking `rustc` options, like overflow checks.
- This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general handling paths (e.g. no more `shell readlink` for `O=`).
- Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.
- Updated the quick start guide.
- Updated CI `.config`s:
  + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable.
  + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) that more closely mimic what a developer would have. Running the CI will be slightly slower, but should be OK.
  + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration.
  + Also update both configs to v5.9 while I was at it.
  (I could have split a few of these off into another PR, but anyway it is for the CI only and I had already done it.)
- Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:
- There is one more nightly feature used (the new Rust mangling scheme), but we know that one will be stable (and the default one, later on).
- The hack at `exports.c` to export symbols to loadable modules.
- The hack at `allocator.rs` to get the `__rust_*()` functions.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
This is a big PR, but most of it is interdependent with the rest.
- Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
  + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:

        `vmlinux` on release w/  Rust is 23 MiB (compressed: 2.1 MiB)
        `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)

    i.e. the bulk is now shared.
  + Multiple builtin modules are now supported since their symbols do not collide with each other (fixes #9).
  + Faster compilation (fewer crates to compile & less repetition).
  + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.
- Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
  + All the `rust/*` and `drivers/char/rust_example/*` files moved to fit in the new structure: fewer files around.
- Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
  + Also helps with the faster compilation.
- Dependency handling integration with `Kbuild`/`fixdep`.
  + Changes to the Rust standard library, kernel headers (bindings), `rust/` source files, `.rs` changes, command-line changes, flag changes, etc. all trigger recompilation as needed.
  + Works as expected with parallel support (`-j`).
- Proper `make clean` support.
- Offline builds by default (there is no "online compilation" anymore; fixes #17).
- No interleaved Cargo output (fixes #29).
- No nightly dependency on Cargo's `build-std`, since we now manage the cross-compilation ourselves (should fix #27).
- Since a kernel can now be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).
- Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), following what the user selected for C (no Cargo profiles).
- Added Kconfig menu for tweaking relevant `rustc` options, like overflow checks, debug assertions, optimization level, etc.
- This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general handling paths (e.g. no more `shell readlink` for `O=`).
- Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.
- Updated the quick start guide.
- Updated CI `.config`s:
  + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable.
  + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) that more closely mimic what a developer would have. Running the CI will be slightly slower, but should be OK.
  + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration.
  + Also update both configs to v5.9 while I was at it.
  (I could have split a few of these off into another PR, but anyway it is for the CI only and I had already done it.)
- Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:
- There are two more nightly features used:
  + The new Rust mangling scheme: we know it will be stable (and the default one, later on).
  + The binary dep-info output: if we remove all other nightly features, this one can easily go too.
- The hack at `exports.c` to export symbols to loadable modules.
- The hack at `allocator.rs` to get the `__rust_*()` functions.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
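To make "following what the user selected for C" concrete, here is a minimal sketch of deriving `rustc` codegen flags from a few C-side choices. The `CConfig` struct and its field names are hypothetical stand-ins and not real Kconfig symbols; the actual mapping lives in the Kbuild makefiles and may differ. Only the `rustc` options themselves (`-C opt-level`, `-C debuginfo`, `-C overflow-checks`) are real.

```rust
/// Hypothetical summary of the user's choices; field names are illustrative
/// and do not correspond one-to-one to real Kconfig options.
#[derive(Clone, Copy)]
pub struct CConfig {
    pub optimize_for_size: bool,
    pub debug_info: bool,
    pub overflow_checks: bool,
}

/// Derive `rustc` codegen flags from the configuration, instead of relying
/// on Cargo profiles.
pub fn rustc_flags(cfg: CConfig) -> Vec<String> {
    let mut flags = Vec::new();
    // Assumption: a size-optimized C build maps to `s`, anything else to `2`.
    let opt = if cfg.optimize_for_size { "s" } else { "2" };
    flags.push(format!("-Copt-level={}", opt));
    if cfg.debug_info {
        flags.push("-Cdebuginfo=2".to_string());
    }
    let checks = if cfg.overflow_checks { "on" } else { "off" };
    flags.push(format!("-Coverflow-checks={}", checks));
    flags
}
```

For a size-optimized build with debug info and without overflow checks, the function returns `-Copt-level=s`, `-Cdebuginfo=2` and `-Coverflow-checks=off`.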
This is a big PR, but most of it is interdependent with the rest.
- Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
  + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:

        `vmlinux` on release w/  Rust is 23 MiB (compressed: 2.1 MiB)
        `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)

    i.e. the bulk is now shared.
  + Multiple builtin modules are now supported since their symbols do not collide with each other (fixes #9).
  + Faster compilation (fewer crates to compile & less repetition).
  + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.
- Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
  + All the `rust/*` and `drivers/char/rust_example/*` files moved to fit in the new structure: fewer files around.
- Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
  + Also helps with the faster compilation.
- Dependency handling integration with `Kbuild`/`fixdep`.
  + Changes to the Rust standard library, kernel headers (bindings), `rust/` source files, `.rs` changes, command-line changes, flag changes, etc. all trigger recompilation as needed.
  + Works as expected with parallel support (`-j`).
- Automatic generation of the `exports.c` list:
  + Instead of manually handling the list, all non-local functions available in `core`, `alloc` and `kernel` are exported, so all modules should work, regardless of what they need, and without failing linking due to symbols in the manual list not existing (e.g. due to differences in config options).
  + There are a lot, though:
    * ~6k Rust symbols vs. ~4k C symbols in release.
    * However, 4k of those are `bindings_raw` (i.e. duplicated C ones), which shouldn't be exported. Thus we should look into making `bindings_raw` private to the crate (at the moment, the (first) Rust example requires `<kernel::bindings...::miscdevice as Default>::default`).
  + Licensing:
    * `kernel`'s symbols are exported as GPL.
    * `core`'s and `alloc`'s symbols are exported as non-GPL so that third parties can build Rust modules as long as they write their own kernel support infrastructure, i.e. without taking advantage of `kernel`. This seemed to make the most sense compared to other exports from the kernel, plus it follows more closely the original licence of the crates.
- Support for GCC-compiled kernels.
  + The generated bindings do not have meaningful differences in our release config, between GCC 10.1 and Clang 11.
  + Other configs (e.g. our debug one) may add/remove types and functions. That is fine unless we use them from our bindings.
  + However, there are config options that may not work (e.g. the randstruct GCC plugin if we use one of those structs).
- Proper `make clean` support.
- Offline builds by default (there is no "online compilation" anymore; fixes #17).
- No interleaved Cargo output (fixes #29).
- No nightly dependency on Cargo's `build-std`, since we now manage the cross-compilation ourselves (should fix #27).
- "Big" kallsyms symbol support:
  + I already raised ksym names from 128 to 256 back when I wrote the first integration. However, Rust symbols can be huge in debug/non-optimized, so I increased it again to 512; plus the module name from 56 to 248.
  + In turn, this required tuning the table format to support 2-byte lengths for ksyms. Compression at generation and kernel decompression are covered, although it may be the case that some script/tool also requires changes to understand the new table format.
- Since a kernel can now be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).
- Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), by default following what the user selected for C, but customizable through a Kconfig menu, which also has options for tweaking overflow checks, debug assertions, etc.
- This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general handling paths (e.g. no more `shell readlink` for `O=`).
- Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.
- Updated the quick start guide.
- Updated CI `.config`s:
  + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable.
  + Updated with `toolchain` matrix support: now we test building with GCC, Clang or a full LLVM toolchain.
  + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) that more closely mimic what a developer would have. Running the CI will be slightly slower, but should be OK. Also enable `-C opt-level=0` to test that such an extreme works and also to see how bloated everything becomes.
  + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration.
  + Also update both configs to v5.10 and `LLVM=1` while I was at it.
  (I could have split a few of these off into another PR, but anyway it is for the CI only and I had already done it.)
- Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:
- There are two more nightly features used:
  + The new Rust mangling scheme: we know it will be stable (and the default one, later on).
  + The binary dep-info output: if we remove all other nightly features, this one can easily go too.
- The hack at `exports.c` to export symbols to loadable modules.
- The hack at `allocator.rs` to get the `__rust_*()` functions.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
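The message says the kallsyms table now supports 2-byte lengths but does not spell out the encoding. As a rough illustration only, one possible scheme is sketched below: lengths under 128 take one byte, longer ones set the high bit of the first byte and continue in a second byte. The function names and the exact bit layout are assumptions and may not match what the patch does.

```rust
/// Encode a symbol length in 1 byte if it fits in 7 bits, otherwise in
/// 2 bytes (low 7 bits with the high bit set, then the remaining bits).
/// Returns `None` if the length does not fit in 14 bits.
pub fn encode_len(len: usize) -> Option<Vec<u8>> {
    if len < 0x80 {
        Some(vec![len as u8])
    } else if len < 0x4000 {
        Some(vec![(len & 0x7f) as u8 | 0x80, (len >> 7) as u8])
    } else {
        None
    }
}

/// Decode a length from the start of `bytes`, returning the length and the
/// number of bytes consumed, or `None` if the input is too short.
pub fn decode_len(bytes: &[u8]) -> Option<(usize, usize)> {
    let first = *bytes.first()? as usize;
    if first & 0x80 == 0 {
        Some((first, 1))
    } else {
        let second = *bytes.get(1)? as usize;
        Some(((first & 0x7f) | (second << 7), 2))
    }
}
```

With this layout, short symbols keep their single-byte length prefix, while a 511-character symbol needs the two-byte form. Any tool that walks the table has to apply the same decoding, which is the compatibility concern the message raises.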
This is a big PR, but most of it is interdependent with the rest.

- Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
  + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:
      `vmlinux` on release w/ Rust is 23 MiB (compressed: 2.1 MiB)
      `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)
    i.e. the bulk is now shared.
  + Multiple builtin modules are now supported since their symbols do not collide against each other (fixes #9).
  + Faster compilation (fewer crates to compile & less repetition).
  + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.
- Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
  + All the `rust/*` and `driver/char/rust_example/*` files moved to fit in the new structure: fewer files around.
- Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
  + Also helps with the faster compilation.
- Dependency handling integration with `Kbuild`/`fixdep`.
  + Changes to the Rust standard library, kernel headers (bindings), `rust/` source files, `.rs` changes, command-line changes, flag changes, etc. all trigger recompilation as needed.
  + Works as expected with parallel support (`-j`).
- Automatic generation of the `exports.c` list:
  + Instead of manually handling the list, all non-local functions available in `core`, `alloc` and `kernel` are exported, so all modules should work, regardless of what they need, and without failing linking due to symbols in the manual list not existing (e.g. due to differences in config options).
  + There are a lot of them, though:
    * ~6k Rust symbols vs. ~4k C symbols in release.
    * However, 4k of those are `bindings_raw` (i.e. duplicated C ones), which shouldn't be exported. Thus we should look into making `bindings_raw` private to the crate (at the moment, the (first) Rust example requires `<kernel::bindings...::miscdevice as Default>::default`).
  + Licensing:
    * `kernel`'s symbols are exported as GPL.
    * `core`'s and `alloc`'s symbols are exported as non-GPL so that third parties can build Rust modules as long as they write their own kernel support infrastructure, i.e. without taking advantage of `kernel`. This seemed to make the most sense compared to other exports from the kernel, plus it follows more closely the original licence of the crates.
- Support for GCC-compiled kernels.
  + The generated bindings do not have meaningful differences in our release config, between GCC 10.1 and Clang 11.
  + Other configs (e.g. our debug one) may add/remove types and functions. That is fine unless we use them from our bindings.
  + However, there are config options that may not work (e.g. the randstruct GCC plugin if we use one of those structs).
- Proper `make clean` support.
- Offline builds by default (there is no "online compilation" anymore; fixes #17).
- No interleaved Cargo output (fixes #29).
- No nightly dependency on Cargo's `build-std`, since now we manage the cross-compilation ourselves (should fix #27).
- "Big" kallsyms symbol support:
  + I already raised ksym names from 128 to 256 back when I wrote the first integration. However, Rust symbols can be huge in debug/non-optimized, so I increased it again to 512; plus the module name from 56 to 248.
  + In turn, this required tuning the table format to support 2-byte lengths for ksyms. Compression at generation and kernel decompression is covered, although it may be the case that some script/tool also requires changes to understand the new table format.
- Since now a kernel can be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).
- Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), by default following what the user selected for C, but customizable through a Kconfig menu, which also has options for tweaking overflow checks, debug assertions, etc.
- This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general handling paths (e.g. no more `shell readlink` for `O=`).
- Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.
- Updated the quick start guide.
- Updated CI `.config`s:
  + Add the new options and test with 2 builtins and 2 loadables. At the same time, remove the matrix test for builtin/loadable.
  + Updated with `toolchain` matrix support: now we test building with GCC, Clang or a full LLVM toolchain.
  + Debug: more things enabled (debuginfo, kgdb, unit testing, etc.) that mimic more what a developer would have. Running the CI will be slightly slower, but should be OK. Also enable `-C opt-level=0` to test that such an extreme works and also to see how bloated everything becomes.
  + Release: disabled `EXPERT` and changed a few things to make it look more like a normal configuration.
  + Also update both configs to v5.10 and `LLVM=1` while I was at it. (I could have split a few of these off into another PR, but anyway it is for the CI only and I had already done it).
- Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:

- There are two more nightly features used:
  + The new Rust mangling scheme: we know it will be stable (and the default, later on).
  + The binary dep-info output: if we remove all other nightly features, this one can easily go too.
- The hack at `exports.c` to export symbols to loadable modules.
- The hack at `allocator.rs` to get the `__rust_*()` functions.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
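The automatic `exports.c` generation and the licensing split described above can be illustrated with a minimal sketch. It is not the code from this PR: the input is assumed to be `nm`-style lines (`<address> <type> <name>`), the helper names `export_macro` and `generate_exports` are hypothetical, and the symbol names in `main` are made up. Only the kernel's `EXPORT_SYMBOL` and `EXPORT_SYMBOL_GPL` macros are taken as given.

```rust
/// Picks the export macro for a crate (hypothetical helper).
/// Per the text: `kernel` is GPL-only, `core` and `alloc` are non-GPL.
fn export_macro(krate: &str) -> &'static str {
    match krate {
        "kernel" => "EXPORT_SYMBOL_GPL",
        _ => "EXPORT_SYMBOL",
    }
}

/// Turns `nm`-style lines ("<address> <type> <name>") into export lines.
/// Assumption: only global text symbols (type `T`) count as the
/// "non-local functions" that should be exported.
fn generate_exports(krate: &str, nm_output: &str) -> Vec<String> {
    let mac = export_macro(krate);
    nm_output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let _address = fields.next()?;
            let kind = fields.next()?;
            // Lines with fewer than three fields (e.g. undefined symbols,
            // which have no address) are skipped here.
            let name = fields.next()?;
            (kind == "T").then(|| format!("{mac}({name});"))
        })
        .collect()
}

fn main() {
    // Made-up, v0-looking symbol names, for illustration only.
    let nm_output = "0000000000000000 T _RNvCs1234_4core3foo\n\
                     0000000000000010 t _RNvCs1234_4core3bar\n";
    for line in generate_exports("core", nm_output) {
        println!("{line}");
    }
}
```

Generating the list from the compiled objects, rather than maintaining it by hand, is what avoids link failures when a config option removes a symbol: a symbol that is not present is simply never listed.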
This is a big PR, but most of it is interdependent with the rest.

- Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
  + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:
      `vmlinux` on release w/ Rust is 23 MiB (compressed: 2.1 MiB)
      `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)
    i.e. the bulk is now shared.
  + Multiple builtin modules are now supported since their symbols do not collide against each other (fixes #9).
  + Faster compilation (fewer crates to compile & less repetition).
  + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.
- Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
  + All the `rust/*` and `driver/char/rust_example/*` files moved to fit in the new structure: fewer files around.
- Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
  + Also helps with the faster compilation.
- Dependency handling integration with `Kbuild`/`fixdep`.
  + Changes to the Rust standard library, kernel headers (bindings), `rust/` source files, `.rs` changes, command-line changes, flag changes, etc. all trigger recompilation as needed.
  + Works as expected with parallel support (`-j`).
- Automatic generation of the `exports.c` list:
  + Instead of manually handling the list, all non-local functions available in `core`, `alloc` and `kernel` are exported, so all modules should work, regardless of what they need, and without failing linking due to symbols in the manual list not existing (e.g. due to differences in config options).
  + There are a lot of them, though:
    * ~6k Rust symbols vs. ~4k C symbols in release.
    * However, 4k of those are `bindings_raw` (i.e. duplicated C ones), which shouldn't be exported. Thus we should look into making `bindings_raw` private to the crate (at the moment, the (first) Rust example requires `<kernel::bindings...::miscdevice as Default>::default`).
  + Licensing:
    * `kernel`'s symbols are exported as GPL.
    * `core`'s and `alloc`'s symbols are exported as non-GPL so that third parties can build Rust modules as long as they write their own kernel support infrastructure, i.e. without taking advantage of `kernel`. This seemed to make the most sense compared to other exports from the kernel, plus it follows more closely the original licence of the crates.
- Support for GCC-compiled kernels:
  + The generated bindings do not have meaningful differences in our release config, between GCC 10.1 and Clang 11.
  + Other configs (e.g. our debug one) may add/remove types and functions. That is fine unless we use them from our bindings.
  + However, there are config options that may not work (e.g. the randstruct GCC plugin if we use one of those structs).
- Support for `arm64` architecture:
  + Added to the CI: BusyBox is cross-compiled on the fly (increased timeout a bit to match).
  + Requires weakening of a few compiler builtins and adding `copy_{from,to}_user` helpers.
- Proper `make clean` support.
- Offline builds by default (there is no "online compilation" anymore; fixes #17).
- No interleaved Cargo output (fixes #29).
- No nightly dependency on Cargo's `build-std`, since now we manage the cross-compilation ourselves (should fix #27).
- "Big" kallsyms symbol support:
  + I already raised ksym names from 128 to 256 back when I wrote the first integration. However, Rust symbols can be huge in debug/non-optimized, so I increased it again to 512; plus the module name from 56 to 248.
  + In turn, this required tuning the table format to support 2-byte lengths for ksyms. Compression at generation and kernel decompression is covered, although it may be the case that some script/tool also requires changes to understand the new table format.
- Since now a kernel can be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).
- Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), by default following what the user selected for C, but customizable through a Kconfig menu, which also has options for tweaking overflow checks and debug assertions.
- This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general handling paths (e.g. no more `shell readlink` for `O=`).
- Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.
- Do not export any helpers' symbols.
- Updated the quick start guide.
- Updated CI:
  + Now we always test with 2 builtin and 2 loadable Rust example drivers, removing the matrix test for builtin/loadable.
  + Added `toolchain` to matrix: now we test building with GCC, Clang or a full LLVM toolchain.
  + Added `arch` to matrix: now we test both arm64 and x86_64.
  + Added `rustc` to matrix: now we test with a very recent nightly as well.
  + Only build `output == build` once to reduce the number of combinations.
  + Debug x86_64 config: more things enabled (debuginfo, kgdb, unit testing, etc.) that mimic more what a developer would have. Running the CI will be slightly slower, but should be OK. Also enable `-C opt-level=0` to test that such an extreme works and also to see how bloated everything becomes.
  + Release x86_64 config: disabled `EXPERT` and changed a few things to make it look more like a normal desktop configuration, although it is still pretty minimal.
  + The configs for arm64 are `EXPERT` and `EMBEDDED` ones, very minimal, for the particular CPU we are simulating.
  + Update configs to v5.10.
  + Use `$GITHUB_ENV` to simplify.
- Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:

- There are two more nightly features used:
  + The new Rust mangling scheme: we know it will be stable (and the default, later on).
  + The binary dep-info output: if we remove all other nightly features, this one can easily go too.
- The hack at `exports.c` to export symbols to loadable modules.
- The hack at `allocator.rs` to get the `__rust_*()` functions.
- The hack to get the proper flags for bindgen on GCC builds.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
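To see why raising the ksym name limit to 512 forces a change in the table format, consider that a single length byte can describe at most 255 bytes. The sketch below shows one possible 2-byte scheme; it is purely illustrative and assumes a fixed little-endian 16-bit length prefix, which may well differ from the layout this PR actually uses. The helpers `push_name` and `read_name` are hypothetical, and the constant merely borrows the kernel's `KSYM_NAME_LEN` name with the 512 value quoted above.

```rust
/// Longest symbol name accepted, using the limit quoted in the text.
const KSYM_NAME_LEN: usize = 512;

/// Appends one entry: a 2-byte little-endian length, then the name bytes.
/// Returns `false` (leaving the table untouched) if the name is too long.
/// Assumption: this fixed-width prefix is an illustrative layout only.
fn push_name(table: &mut Vec<u8>, name: &[u8]) -> bool {
    if name.len() > KSYM_NAME_LEN {
        return false;
    }
    let len = name.len() as u16; // cannot truncate: 512 fits in a u16
    table.extend_from_slice(&len.to_le_bytes());
    table.extend_from_slice(name);
    true
}

/// Reads the entry starting at `offset`; returns the name and the offset
/// of the next entry, or `None` if the table is truncated.
fn read_name(table: &[u8], offset: usize) -> Option<(&[u8], usize)> {
    let len_bytes = table.get(offset..offset.checked_add(2)?)?;
    let len = u16::from_le_bytes([len_bytes[0], len_bytes[1]]) as usize;
    let start = offset + 2;
    let end = start.checked_add(len)?;
    let name = table.get(start..end)?;
    Some((name, end))
}

fn main() {
    let mut table = Vec::new();
    let long_name = vec![b'a'; 300]; // too long for a 1-byte length
    assert!(push_name(&mut table, b"short"));
    assert!(push_name(&mut table, &long_name));

    let (first, next) = read_name(&table, 0).unwrap();
    let (second, _) = read_name(&table, next).unwrap();
    println!("{} {}", first.len(), second.len());
}
```

Whatever the real layout is, any script or tool that walks the table with the old 1-byte assumption would misread every entry after the first long name, which is the compatibility concern raised above.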
This is a big PR, but most of it is interdependent with the rest.

  - Shared Rust infrastructure: `libkernel`, `libmodule`, `libcore`, `liballoc`, `libcompiler_builtins`.
    + The Rust modules are now much smaller since they do not contain several copies of those libraries. Our example `.ko` on release is just 12 KiB, down from 1.3 MiB. For reference:
        `vmlinux` on release w/ Rust is 23 MiB (compressed: 2.1 MiB)
        `vmlinux` on release w/o Rust is 22 MiB (compressed: 1.9 MiB)
      i.e. the bulk is now shared.
    + Multiple builtin modules are now supported since their symbols do not collide against each other (fixes #9).
    + Faster compilation (fewer crates to compile & less repetition).
    + We achieve this by compiling all the shared code to `.rlib`s (and the `.so` for the proc macro). For loadable modules, we need to rely on the upcoming v0 Rust mangling scheme, plus we need to export the Rust symbols needed by the `.ko`s.

  - Simpler, flat file structure: now a small driver may only need a single file like `drivers/char/rust_example.rs`, like in C.
    + All the `rust/*` and `drivers/char/rust_example/*` files moved to fit in the new structure: fewer files around.

  - Only `rust-lang/{rust,rust-bindgen,compiler-builtins}` as dependencies.
    + Also helps with the faster compilation.

  - Dependency handling integration with `Kbuild`/`fixdep`.
    + Changes to the Rust standard library, kernel headers (bindings), `rust/` source files, `.rs` changes, command-line changes, flag changes, etc. all trigger recompilation as needed.
    + Works as expected with parallel support (`-j`).

  - Automatic generation of the `exports.c` list:
    + Instead of manually handling the list, all non-local functions available in `core`, `alloc` and `kernel` are exported, so all modules should work, regardless of what they need, and without failing linking due to symbols in the manual list not existing (e.g. due to differences in config options).
    + There are a lot, though:
      * ~6k Rust symbols vs. ~4k C symbols in release.
      * However, 4k of those are `bindings_raw` (i.e. duplicated C ones), which shouldn't be exported. Thus we should look into making `bindings_raw` private to the crate (at the moment, the (first) Rust example requires `<kernel::bindings...::miscdevice as Default>::default`).
    + Licensing:
      * `kernel`'s symbols are exported as GPL.
      * `core`'s and `alloc`'s symbols are exported as non-GPL so that third parties can build Rust modules as long as they write their own kernel support infrastructure, i.e. without taking advantage of `kernel`. This seemed to make the most sense compared to other exports from the kernel, plus it follows more closely the original licence of the crates.

  - Support for GCC-compiled kernels:
    + The generated bindings do not have meaningful differences in our release config, between GCC 10.1 and Clang 11.
    + Other configs (e.g. our debug one) may add/remove types and functions. That is fine unless we use them from our bindings.
    + However, there are config options that may not work (e.g. the randstruct GCC plugin if we use one of those structs).

  - Support for `arm64` architecture:
    + Added to the CI: BusyBox is cross-compiled on the fly (increased timeout a bit to match).
    + Requires weakening of a few compiler builtins and adding `copy_{from,to}_user` helpers.

  - Proper `make clean` support.

  - Offline builds by default (there is no "online compilation" anymore; fixes #17).

  - No interleaved Cargo output (fixes #29).

  - No nightly dependency on Cargo's `build-std`, since we now manage the cross-compilation ourselves (should fix #27).

  - "Big" kallsyms symbol support:
    + I already raised ksym names from 128 to 256 back when I wrote the first integration. However, Rust symbols can be huge in debug/non-optimized, so I increased it again to 512; plus the module name from 56 to 248.
    + In turn, this required tuning the table format to support 2-byte lengths for ksyms. Compression at generation and kernel decompression is covered, although it may be the case that some script/tool also requires changes to understand the new table format.

  - Since now a kernel can be "Rust-enabled", a new `CONFIG_RUST` option is added to enable/disable it manually, regardless of whether one has `rustc` available or not (`CONFIG_HAS_RUST`).

  - Improved handling of `rustc` flags (`opt-level`, `debuginfo`, etc.), by default following what the user selected for C, but customizable through a Kconfig menu, along with options for tweaking overflow checks and debug assertions.

  - This rewrite of the Kbuild support is cleaner, i.e. fewer hacks in general handling paths (e.g. no more `shell readlink` for `O=`).

  - Duplicated the example driver 3 times so that we can test in the CI that 2 builtins and 2 loadables work, all at the same time.

  - Do not export any helpers' symbols.

  - Updated the quick start guide.

  - Updated CI:
    + Now we always test with 2 builtin and 2 loadable Rust example drivers, removing the matrix test for builtin/loadable.
    + Added `toolchain` to matrix: now we test building with GCC, Clang or a full LLVM toolchain.
    + Added `arch` to matrix: now we test both arm64 and x86_64.
    + Added `rustc` to matrix: now we test with a very recent nightly as well.
    + Only build `output == build` once to reduce the number of combinations.
    + Debug x86_64 config: more things enabled (debuginfo, kgdb, unit testing, etc.) that mimic more closely what a developer would have. Running the CI will be slightly slower, but should be OK. Also enable `-C opt-level=0` to test that such an extreme works and also to see how bloated everything becomes.
    + Release x86_64 config: disabled `EXPERT` and changed a few things to make it look more like a normal desktop configuration, although it is still pretty minimal.
    + The configs for arm64 are `EXPERT` and `EMBEDDED` ones, very minimal, for the particular CPU we are simulating.
    + Update configs to v5.10.
    + Use `$GITHUB_ENV` to simplify.

  - Fewer `extern crate`s needed since we pass them via `rustc` (closer to idiomatic 2018 edition Rust code).

Things to note:

  - There are two more nightly features used:
    + The new Rust mangling scheme: we know it will be stable (and the default, later on).
    + The binary dep-info output: if we remove all other nightly features, this one can easily go too.

  - The hack at `exports.c` to export symbols to loadable modules.

  - The hack at `allocator.rs` to get the `__rust_*()` functions.

  - The hack to get the proper flags for bindgen on GCC builds.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
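The "2-byte lengths for ksyms" point above can be illustrated with a small model. This is only a sketch under stated assumptions: the commit message does not describe the on-disk layout, so the flag-bit scheme and the names `encode_len`, `decode_len` and `MAX_SYM_LEN` below are hypothetical, chosen to show how a byte-oriented table can hold lengths up to 512 while keeping short names at one byte.

```python
from typing import Tuple

# Hypothetical model, not taken from the PR: lengths below 128 use one byte,
# larger lengths set the high bit of the first byte and spill into a second.
MAX_SYM_LEN = 512  # the limit named in the commit message


def encode_len(n: int) -> bytes:
    """Encode a symbol-name length as 1 byte (< 128) or 2 bytes (otherwise)."""
    if not 0 <= n <= MAX_SYM_LEN:
        raise ValueError("length out of range: %d" % n)
    if n < 0x80:
        return bytes([n])
    # Low 7 bits plus a continuation flag, then the remaining high bits.
    return bytes([0x80 | (n & 0x7F), n >> 7])


def decode_len(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a length at buf[pos]; return (length, position after it)."""
    first = buf[pos]
    if first & 0x80 == 0:
        return first, pos + 1
    return (first & 0x7F) | (buf[pos + 1] << 7), pos + 2
```

Any tool that walks the table has to apply the same decoding rule, which is why the message warns that some scripts may need changes to understand the new format.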
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as the wrong struct in irdma_free_pending_cqp_request().

    PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0"
     #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
     #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
     #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
     #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
     #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
        [exception RIP: native_queued_spin_lock_slowpath+1291]
        RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046
        RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e
        RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0
        RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa
        R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000
        R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1
        ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
    -- <NMI exception stack> --
     #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
     #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
     #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
     #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
     #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
    #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
    #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
    #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
    #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
    #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
    #15 [ffff88aa841efb88] device_del at ffffffff82179d23
    #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
    #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
    #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
    #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
    #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
    #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn
Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Andrii Nakryiko says:

====================
BPF register bounds logic and testing improvements

This patch set adds a big set of manual and auto-generated test cases validating BPF verifier's register bounds tracking and deduction logic. See details in the last patch.

We start with building a tester that validates existing <range> vs <scalar> verifier logic for range bounds.

To make all this work, BPF verifier's logic needed a bunch of improvements to handle some cases that previously were not covered. This had no implications as to correctness of verifier logic, but it was incomplete enough to cause significant disagreements with the alternative implementation of register bounds logic that tests in this patch set implement. So we need BPF verifier logic improvements to make all the tests pass. This is what we do in patches #3 through #9.

The end goal of this work, though, is to extend BPF verifier range state tracking so that new range bounds can be derived when comparing non-const registers. More investigative work is required to understand and fix existing potential issues with range tracking as part of ALU/ALU64 operations, so the <range> x <range> part of the v5 patch set ([0]) is dropped until these issues are sorted out.

For now, we include preparatory refactorings and clean ups that set up the BPF verifier code base to extend the logic to <range> vs <range> in a subsequent patch set. Patches #10-#16 perform preliminary refactorings without functionally changing anything. But they do clean up check_cond_jmp_op() logic and generalize a bunch of other pieces in is_branch_taken() logic.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=797178&state=*

v5->v6:
  - dropped <range> vs <range> patches (original patches #18 through #23) to add more register range sanity checks and fix preexisting issues;
  - comments improvements, addressing other feedback on first 17 patches (Eduard, Alexei);
v4->v5:
  - added entirety of verifier reg bounds tracking changes, now handling <range> vs <range> cases (Alexei);
  - added way more comments trying to explain why deductions added are correct, hopefully they are useful and clarify things a bit (Daniel, Shung-Hsi);
  - added two preliminary selftests fixes necessary for RELEASE=1 build to work again, it keeps breaking.
v3->v4:
  - improvements to reg_bounds tester (progress report, split 32-bit and 64-bit ranges, fix various verbosity output issues, etc);
v2->v3:
  - fix a subtle little-endianness assumption inside parge_reg_state() (CI);
v1->v2:
  - fix compilation when building selftests with llvm-16 toolchain (CI).
====================

Link: https://lore.kernel.org/r/20231102033759.2541186-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
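To make the idea of a <range> vs <scalar> tester concrete, here is a minimal sketch. It is not the selftest from the patch set: it only models unsigned `x < c` on a single numeric domain, and the names `refine_ult` and `brute_ult` are hypothetical. The first function deduces the bounds on each side of the branch; the second is the independent "alternative implementation" that enumerates every value and is used to cross-check the first.

```python
def refine_ult(lo, hi, c):
    """Split the unsigned range [lo, hi] on the condition `x < c`.

    Returns (taken, not_taken); each is a (lo, hi) pair, or None when
    that side of the branch cannot happen.
    """
    taken = (lo, min(hi, c - 1)) if c > 0 and lo <= c - 1 else None
    not_taken = (max(lo, c), hi) if hi >= c else None
    return taken, not_taken


def brute_ult(lo, hi, c):
    """Reference answer: enumerate every value in [lo, hi]."""
    yes = [x for x in range(lo, hi + 1) if x < c]
    no = [x for x in range(lo, hi + 1) if not x < c]

    def span(xs):
        return (xs[0], xs[-1]) if xs else None

    return span(yes), span(no)
```

Comparing the two over every small range and constant is the kind of disagreement check the cover letter describes, scaled down to something that runs instantly.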
Andrii Nakryiko says:

====================
BPF register bounds range vs range support

This patch set is a continuation of work started in [0]. It adds a big set of manual, auto-generated, and now also random test cases validating BPF verifier's register bounds tracking and deduction logic.

First few patches generalize verifier's logic to handle conditional jumps and corresponding range adjustments in the case when two non-const registers are compared to each other. Patch #1 generalizes the reg_set_min_max() portion, while patch #2 does the same for the is_branch_taken() part of the overall solution.

Patch #3 improves equality and inequality for cases when BPF program code mixes 64-bit and 32-bit uses of the same register. Depending on the specific sequence, it's possible to get to the point where u64/s64 bounds will be very generic (e.g., after signed 32-bit comparison), while we still keep pretty tight u32/s32 bounds. If in such a state we proceed with a 32-bit equality or inequality comparison, reg_set_min_max() might have to deal with adjusting s32 bounds for two registers that don't overlap, which breaks reg_set_min_max(). This doesn't manifest in <range> vs <const> cases, because if that happens reg_set_min_max() in effect will force s32 bounds to be a new "impossible" constant (from the original smin32/smax32 bounds point of view). Things get tricky when we have <range> vs <range> adjustments, so instead of trying to somehow make sense out of such situations, it's best to detect such impossible situations and prune the branch that can't be taken in is_branch_taken() logic. This equality/inequality was the only such category of situations with auto-generated tests added later in the patch set.

But when we start mixing arithmetic operations in different numeric domains and conditionals, things get even hairier. So, patch #4 adds sanity checking logic after all ALU/ALU64, JMP/JMP32, and LDX operations. By default, instead of failing verification, we conservatively reset range bounds to unknown values, reporting the violation in the verifier log (if verbose logs are requested). But to aid development, detection, and debugging, we also introduce a new test flag, BPF_F_TEST_SANITY_STRICT, which triggers verification failure on range sanity violation.

Patch #11 sets BPF_F_TEST_SANITY_STRICT by default for test_progs and test_verifier. Patch #12 adds support for controlling this in veristat for testing with production BPF object files.

Getting back to BPF verifier, patches #5 and #6 complete verifier's range tracking logic clean up. See respective patches for details.

With kernel-side taken care of, we move to testing. We start with building a tester that validates existing <range> vs <scalar> verifier logic for range bounds. Patch #7 implements an initial version of such a tester. We guard millions of generated tests behind the SLOW_TESTS=1 envvar requirement, but also have a relatively small number of tricky cases that came up during development and debugging of this work. Those will be executed as part of a normal test_progs run.

Patch #8 simulates more nuanced JEQ/JNE logic we added to verifier in patch #3.

Patch #9 adds <range> vs <range> "slow tests".

Patch #10 is a completely new one, it adds a bunch of randomly generated cases to be run normally, without the SLOW_TESTS=1 guard. This should help to get a bunch of coverage, and hopefully find some remaining latent problems in verifier proactively as part of normal BPF CI runs.

Finally, a tiny test which was, amazingly, an initial motivation for this whole work, is added in lucky patch #13, demonstrating how verifier is now smart enough to track the actual number of elements in the array and won't require additional checks on the loop iteration variable inside the bpf_for() open-coded iterator loop.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=798308&state=*

v1->v2:
  - use x < y => y > x property to minimize reg_set_min_max (Eduard);
  - fix for JEQ/JNE logic in reg_bounds.c (Eduard);
  - split BPF_JSET and !BPF_JSET cases handling (Shung-Hsi);
  - adjustments to reg_bounds.c to make it easier to follow (Alexei);
  - added acks (Eduard, Shung-Hsi).
====================

Link: https://lore.kernel.org/r/20231112010609.848406-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
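The pruning of impossible branches described above for <range> vs <range> comparisons can be sketched as follows. This is a simplified model, not the kernel's is_branch_taken(): it handles a single unsigned domain, the operator names are made up for the example, and it borrows only the convention of answering 1 (always taken), 0 (never taken) or -1 (unknown). It also uses the "x < y => y > x" property from the changelog so that only one ordering comparison needs real logic.

```python
def is_branch_taken(op, a, b):
    """Decide a conditional jump between two unsigned ranges.

    a and b are (lo, hi) pairs. Returns 1 if the branch is always taken,
    0 if it can never be taken, and -1 if both outcomes are possible.
    """
    (alo, ahi), (blo, bhi) = a, b
    if op == "gt":
        # a > b is the same question as b < a, so reuse the "lt" logic.
        return is_branch_taken("lt", b, a)
    if op == "lt":
        if ahi < blo:
            return 1   # every value of a is below every value of b
        if alo >= bhi:
            return 0   # no value of a is below any value of b
        return -1
    if op == "eq":
        if alo == ahi == blo == bhi:
            return 1   # both are the same constant
        if ahi < blo or bhi < alo:
            return 0   # ranges do not overlap, equality is impossible
        return -1
    if op == "ne":
        r = is_branch_taken("eq", a, b)
        return -1 if r == -1 else 1 - r
    raise ValueError("unsupported op: %s" % op)
```

When the answer is 0 or 1, only one side of the jump needs to be explored, so no bounds adjustment is ever attempted on two ranges that cannot be equal.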
Petr Machata says:

====================
mlxsw: Add support for new reset flow

Ido Schimmel writes:

This patchset changes mlxsw to issue a PCI reset during probe and devlink reload so that the PCI firmware could be upgraded without a reboot.

Unlike the old version of this patchset [1], in this version the driver no longer tries to issue a PCI reset by triggering a PCI link toggle on its own, but instead calls the PCI core to issue the reset.

The PCI APIs require the device lock to be held which is why patches

Patch #7 adds a reset method quirk for NVIDIA Spectrum devices.

Patch #8 adds a debug level print in PCI core so that the device ready delay will be printed even if it is shorter than one second.

Patches #9-#11 are straightforward preparations in mlxsw.

Patch #12 finally implements the new reset flow in mlxsw.

Patch #13 adds PCI reset handlers in mlxsw to prevent user space from resetting the device from underneath an unaware driver. Instead, the driver is gracefully de-initialized before the PCI reset and then initialized again after it.

Patch #14 adds a PCI reset selftest to make sure this code path does not regress.

[1] https://lore.kernel.org/netdev/cover.1679502371.git.petrm@nvidia.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:

====================
mlxsw: Preparations for support of CFF flood mode

PGT is an in-HW table that maps addresses to sets of ports. Then when some HW process needs a set of ports as an argument, instead of embedding the actual set in the dynamic configuration, what gets configured is the address referencing the set. The HW then works with the appropriate PGT entry.

Among other allocations, the PGT currently contains two large blocks for bridge flooding: one for 802.1q and one for 802.1d. Within each of these blocks are three tables, for unknown-unicast, multicast and broadcast flooding:

    . . . |         802.1q        |         802.1d        | . . .
          | UC    | MC    | BC    | UC    | MC    | BC    |
               \______ _____/          \_____ ______/
                      v                      v
                          FID flood vectors

Thus each FID (which corresponds to an 802.1d bridge or one VLAN in an 802.1q bridge) uses three flood vectors spread across a fairly large region of PGT.

This way of organizing the flood table (called "controlled") is not very flexible. E.g. to decrease a bridge scale and store more IP MC vectors, one would need to completely rewrite the bridge PGT blocks, or resort to hacks such as storing individual MC flood vectors into unused part of the bridge table.

In order to address these shortcomings, Spectrum-2 and above support what is called CFF flood mode, for Compressed FID Flooding. In CFF flood mode, each FID has a little table of its own, with three entries adjacent to each other, one for unknown-UC, one for MC, one for BC. This allows for a much more fine-grained approach to PGT management, where bits of it are allocated on demand.

    . . . | FID | FID | FID | FID | FID | . . .
          |U|M|B|U|M|B|U|M|B|U|M|B|U|M|B|
           \_____________ _____________/
                         v
                 FID flood vectors

Besides the FID table organization, the CFF flood mode also impacts the Router Subport (RSP) table. This table contains flood vectors for rFIDs, which are FIDs that reference front panel ports or LAGs. The RSP table contains two entries per front panel port and LAG, one for unknown-UC traffic, and one for everything else. Currently, the FW allocates and manages the table in its own part of PGT. rFIDs are marked with the flood_rsp bit and managed specially. In CFF mode, rFIDs are managed as all other FIDs. The driver therefore has to allocate and maintain the flood vectors. Like with bridge FIDs, this is more work, but increases flexibility of the system.

The FW currently supports both the controlled and CFF flood modes. To shed complexity, in the future it should only support CFF flood mode. Hence this patchset, which is the first in a series of two to add CFF flood mode support to mlxsw.

There are FW versions out there that do not support CFF flood mode, and on Spectrum-1 in particular, there is no plan to support it at all. mlxsw will therefore have to support both controlled flood mode as well as CFF.

Another aspect is that at least on Spectrum-1, there are FW versions out there that claim to support CFF flood mode, but then reject or ignore configurations enabling the same. The driver thus has to have a say in whether an attempt to configure CFF flood mode should even be made.

Much like with the LAG mode, the feature is therefore expressed in terms of "does the driver prefer CFF flood mode?", and "what flood mode the PCI module managed to configure the FW with". This gives the driver a chance to determine whether CFF flood mode configuration should be attempted.

In this patchset, we lay the ground with new definitions, registers and their fields, and some minor code shaping. The next patchset will be more focused on introducing necessary abstractions and implementation.

- Patches #1 and #2 add CFF-related items to the command interface.

- Patch #3 adds a new resource, for maximum number of flood profiles supported. (A flood profile is a mapping between traffic type and offset in the per-FID flood vector table.)

- Patches #4 to #8 adjust reg.h. The SFFP register is added, which is used for configuring the abovementioned traffic-type-to-offset mapping. The SFMR register, which serves for FID configuration, is extended with fields specific to CFF mode. And other minor adjustments.

- Patches #9 and #10 add the plumbing for CFF mode: a way to request that CFF flood mode be configured, and a way to query the flood mode that was actually configured.

- Patch #11 removes dead code.

- Patches #12 and #13 add helpers that the next patchset will make use of. Patch #14 moves RIF setup ahead so that FID code can make use of it.
====================

Link: https://lore.kernel.org/r/cover.1700503643.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
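The difference between the two table organizations described in this cover letter comes down to which index varies fastest. The sketch below is an illustration under stated assumptions, not the driver's or the hardware's actual addressing: `block_base`, `n_fids`, `fid_offset` and both function names are hypothetical, and the traffic-type order is fixed here whereas a flood profile would make it configurable.

```python
# Order of the three bridge flood tables named in the text.
TRAFFIC_TYPES = ("UC", "MC", "BC")


def controlled_index(block_base, n_fids, fid_offset, traffic):
    """Controlled mode: one table per traffic type, one entry per FID.

    A FID's three vectors end up n_fids entries apart.
    """
    table = TRAFFIC_TYPES.index(traffic)
    return block_base + table * n_fids + fid_offset


def cff_index(block_base, fid_offset, traffic):
    """CFF mode: each FID owns three adjacent entries (U, M, B)."""
    return (block_base
            + fid_offset * len(TRAFFIC_TYPES)
            + TRAFFIC_TYPES.index(traffic))
```

With 100 FIDs and FID number 5, the controlled layout places the vectors at offsets 5, 105 and 205, while the CFF layout places them at 15, 16 and 17. That locality is what allows PGT space to be allocated per FID on demand.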
Petr Machata says:

====================
mlxsw: Support CFF flood mode

The registers to configure to initialize a flood table differ between the controlled and CFF flood modes. It therefore needs to be an op. Add it, hook up the current init to the existing families, and invoke the op.

PGT is an in-HW table that maps addresses to sets of ports. Then when some HW process needs a set of ports as an argument, instead of embedding the actual set in the dynamic configuration, what gets configured is the address referencing the set. The HW then works with the appropriate PGT entry.

Among other allocations, the PGT currently contains two large blocks for bridge flooding: one for 802.1q and one for 802.1d. Within each of these blocks are three tables, for unknown-unicast, multicast and broadcast flooding:

    . . . |         802.1q        |         802.1d        | . . .
          | UC    | MC    | BC    | UC    | MC    | BC    |
               \______ _____/          \_____ ______/
                      v                      v
                          FID flood vectors

Thus each FID (which corresponds to an 802.1d bridge or one VLAN in an 802.1q bridge) uses three flood vectors spread across a fairly large region of PGT.

This way of organizing the flood table (called "controlled") is not very flexible. E.g. to decrease a bridge scale and store more IP MC vectors, one would need to completely rewrite the bridge PGT blocks, or resort to hacks such as storing individual MC flood vectors into unused part of the bridge table.

In order to address these shortcomings, Spectrum-2 and above support what is called CFF flood mode, for Compressed FID Flooding. In CFF flood mode, each FID has a little table of its own, with three entries adjacent to each other, one for unknown-UC, one for MC, one for BC. This allows for a much more fine-grained approach to PGT management, where bits of it are allocated on demand.

    . . . | FID | FID | FID | FID | FID | . . .
          |U|M|B|U|M|B|U|M|B|U|M|B|U|M|B|
           \_____________ _____________/
                         v
                 FID flood vectors

Besides the FID table organization, the CFF flood mode also impacts the Router Subport (RSP) table. This table contains flood vectors for rFIDs, which are FIDs that reference front panel ports or LAGs. The RSP table contains two entries per front panel port and LAG, one for unknown-UC traffic, and one for everything else. Currently, the FW allocates and manages the table in its own part of PGT. rFIDs are marked with the flood_rsp bit and managed specially. In CFF mode, rFIDs are managed as all other FIDs. The driver therefore has to allocate and maintain the flood vectors. Like with bridge FIDs, this is more work, but increases flexibility of the system.

The FW currently supports both the controlled and CFF flood modes. To shed complexity, in the future it should only support CFF flood mode. Hence this patchset, which adds CFF flood mode support to mlxsw.

Since mlxsw needs to maintain both the controlled mode as well as CFF mode support, we will keep the layout as compatible as possible. The bridge tables will stay in the same overall shape, just their inner organization will change from flood mode -> FID to FID -> flood mode. Likewise, RSP will be kept as a contiguous block of PGT memory, as was the case when the FW maintained it.

- The way FIDs get configured under the CFF flood mode differs from the currently used controlled mode. The simple approach of having several globally visible arrays for spectrum.c to statically choose from no longer works.

  Patch #1 thus privatizes all FID initialization and finalization logic, and exposes it as ops instead.

- Patch #2 renames the ops that are specific to the controlled mode, to make room in the namespace for the CFF variants.

  Patch #3 extracts a helper to compute flood table base out of mlxsw_sp_fid_flood_table_mid().

- The op fid_setup configured fid_offset, i.e. the number of this FID within its family. For rFIDs in CFF mode, to determine this number, the driver will need to do fallible queries.

  Thus in patch #4, make the FID setup operation fallible as well.

- Flood mode initialization routine differs between the controlled and CFF flood modes. The controlled mode needs to configure flood table layout, which the CFF mode does not need to do.

  In patch #5, move mlxsw_sp_fid_flood_table_init() up so that the following patch can make use of it.

  In patch #6, add an op to be invoked per table (if defined).

- The current way of determining PGT allocation size depends on the number of FIDs and number of flood tables. rFIDs, however, have a PGT footprint depending not on the number of FIDs, but on the number of ports and LAGs, because which ports an rFID should flood to does not depend on the FID itself, but on the port or LAG that it references.

  Therefore in patch #7, add FID family ops for determining PGT allocation size.

- As elaborated above, layout of PGT will differ between controlled and CFF flood modes. In CFF mode, it will further differ between rFIDs and other FIDs (as described for the previous patch). The way to pack the SFMR register to configure a FID will likewise differ from controlled to CFF.

  Thus in patches #8 and #9 add FID family ops to determine PGT base address for a FID and to pack SFMR.

- Patches #10 and #11 add more bits for RSP support. In patch #10, add a new traffic type enumerator, for non-UC traffic. This is a combination of BC and MC traffic, but the way that mlxsw maps these mnemonic names to actual traffic type configurations requires that we have a new name to describe this class of traffic.

  Patch #11 then adds hooks necessary for RSP table maintenance. As ports come and go, and join and leave LAGs, it is necessary to update flood vectors that the rFIDs use. These new hooks will make that possible.

- Patches #12, #13 and #14 introduce flood profiles. These have been implicit so far, but the way that CFF flood mode works with profile IDs requires that we make them explicit.

  Thus in patch #12, introduce flood profile objects as a set of flood tables that FID families then refer to.
The FID code currently only uses a single flood profile. In patch #13, add a flood profile ID to flood profile objects. In patch #14, when in CFF mode, configure SFFP according to the existing flood profiles (or the one that exists as of that point). - Patches #15 and #16 add code to implement, respectively, bridge FIDs and RSP FIDs in CFF mode. - In patch #17, toggle flood_mode_prefer_cff on Spectrum-2 and above, which makes the newly-added code live. ==================== Link: https://lore.kernel.org/r/cover.1701183891.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
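The difference between the two PGT layouts described in this cover letter can be sketched as plain index arithmetic. The user-space C below is only an illustration under stated assumptions: a hypothetical block base address, three traffic types per FID, and a fixed number of FIDs per block. The names pgt_addr_controlled() and pgt_addr_cff() are invented for this sketch and are not mlxsw functions.

```c
#include <assert.h>
#include <stdint.h>

/* Traffic types in the order the cover letter lists them. */
enum flood_type {
	FLOOD_UC = 0,
	FLOOD_MC = 1,
	FLOOD_BC = 2,
	FLOOD_TYPE_COUNT = 3
};

/*
 * Controlled mode (hypothetical helper): the block holds one table per
 * traffic type, each with nfids entries, so the three vectors of a single
 * FID sit nfids entries apart from each other.
 */
static uint32_t pgt_addr_controlled(uint32_t block_base, uint32_t nfids,
				    uint32_t fid_offset, enum flood_type type)
{
	assert(fid_offset < nfids);
	return block_base + (uint32_t)type * nfids + fid_offset;
}

/*
 * CFF mode (hypothetical helper): every FID owns three adjacent entries,
 * so the traffic type selects an entry inside the FID's own small table.
 */
static uint32_t pgt_addr_cff(uint32_t block_base, uint32_t fid_offset,
			     enum flood_type type)
{
	return block_base + fid_offset * FLOOD_TYPE_COUNT + (uint32_t)type;
}
```

With a block at address 1000 and 4096 FIDs, the UC and BC vectors of FID 5 are 8192 entries apart in the controlled layout but only 2 entries apart in the CFF layout, which is what makes on-demand allocation per FID practical.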
…gister-spills' Andrii Nakryiko says: ==================== Complete BPF verifier precision tracking support for register spills Add support to BPF verifier to track and support register spill/fill to/from stack regardless if it was done through read-only R10 register (which is the only form supported today), or through a general register after copying R10 into it, while also potentially modifying offset. Once we add this generic register spill/fill support to precision backtracking, we can take advantage of it to stop doing eager STACK_ZERO conversion on register spill. Instead we can rely on (im)precision of spilled const zero register to improve verifier state pruning efficiency. This situation of using const zero register to initialize stack slots is very common with __builtin_memset() usage or just zero-initializing variables on the stack, and it causes unnecessary state duplication, as that STACK_ZERO knowledge is often not necessary for correctness, as those zero values are never used in precise context. Thus, relying on register imprecision helps tremendously, especially in real-world BPF programs. To make spilled const zero register behave completely equivalently to STACK_ZERO, we need to improve a few other small pieces, which is done in the second part of the patch set. See individual patches for details. There are also two small bug fixes spotted during STACK_ZERO debugging. The patch set consists of logically three changes: - patch #1 (and corresponding tests in patch #2) is fixing/improving precision propagation for stack spills/fills. This can be landed as a stand-alone improvement; - patches #3 through #9 are improving verification scalability by utilizing register (im)precision instead of eager STACK_ZERO. These changes depend on patch #1. - patch #10 is a memory efficiency improvement to how instruction/jump history is tracked and maintained. 
It depends on patch #1, but is not strictly speaking required, even though I believe it's a good long-term solution to have path-dependent per-instruction information. Kind of like a path-dependent counterpart to path-agnostic insn_aux array. v3->v4: - fixed up Fixes tag (Alexei); - fixed a few more selftests to not use BPF_ST instruction in inline asm directly, checked with CI, it was happy (CI); v2->v3: - BPF_ST instruction workaround (Eduard); - force dereference in added tests to catch problems (Eduard); - some commit message massaging (Alexei); v1->v2: - clean ups, WARN_ONCE(), insn_flags helpers added (Eduard); - added more selftests for STACK_ZERO/STACK_MISC cases (Eduard); - a bit more detailed explanation of effect of avoiding STACK_ZERO in favor of register spill in patch #8 commit (Alexei); - global shared instruction history refactoring moved to be the last patch in the series to make it easier to revert it, if applied (Alexei). ==================== Link: https://lore.kernel.org/r/20231205184248.1502704-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
…ux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-12-04 This series provides bug fixes to mlx5 driver. V1->V2: - Drop commit #9 ("net/mlx5e: Forbid devlink reload if IPSec rules are offloaded"), we are working on a better fix Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Before the change on `i686-linux` `systemd` build failed as: $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22) After the change it fails as: $ bpftool gen object src/core/bpf/socket_bind/socket-bind.bpf.o src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o libbpf: ELF section #9 has inconsistent alignment addr=8 != d=4 in src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o Error: failed to link 'src/core/bpf/socket_bind/socket-bind.bpf.unstripped.o': Invalid argument (22) Now it's slightly easier to figure out what is wrong with an ELF file. Signed-off-by: Sergei Trofimovich <slyich@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20231208215100.435876-1-slyich@gmail.com
Andrii Nakryiko says: ==================== BPF token support in libbpf's BPF object Add fuller support for BPF token in high-level BPF object APIs. This is the most frequently used way to work with BPF using libbpf, so supporting BPF token there is critical. Patch #1 is improving kernel-side BPF_TOKEN_CREATE behavior by refusing to create an "empty" BPF token with no delegation. This seems like saner behavior which also makes libbpf's caching better overall. If we ever want to create BPF token with no delegate_xxx options set on BPF FS, we can use a new flag to enable that. Patches #2-#5 refactor libbpf internals, mostly feature detection code, to prepare it for BPF token FD. Patch #6 adds options to pass BPF token into BPF object open options. It also adds implicit BPF token creation logic to BPF object load step, even without any explicit involvement of the user. If the environment is set up properly, BPF token will be created transparently and used implicitly. This allows all existing applications to gain BPF token support by just linking with the latest version of the libbpf library. No source code modifications are required. All that under the assumption that a privileged container management agent properly set up the default BPF FS instance at /sys/fs/bpf to allow BPF token creation. Patches #7-#8 add more selftests, validating BPF object APIs work as expected under unprivileged user namespaced conditions in the presence of BPF token. Patch #9 extends libbpf with LIBBPF_BPF_TOKEN_PATH envvar knowledge, which can be used to override custom BPF FS location used for implicit BPF token creation logic without needing to adjust application code. This allows admins or container managers to mount BPF token-enabled BPF FS at non-standard location without the need to coordinate with applications. LIBBPF_BPF_TOKEN_PATH can also be used to disable BPF token implicit creation by setting it to an empty value. Patch #10 tests this new envvar functionality. 
v2->v3: - move some stray feature cache refactorings into patch #4 (Alexei); - add LIBBPF_BPF_TOKEN_PATH envvar support (Alexei); v1->v2: - remove minor code redundancies (Eduard, John); - add acks and rebase. ==================== Link: https://lore.kernel.org/r/20231213190842.3844987-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
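The three envvar cases the cover letter describes (unset, set to a path, set to an empty value) can be sketched as a small decision function. This is a hedged illustration, not libbpf's code: token_path_from_env() is a hypothetical helper, the default is assumed to be the conventional bpffs mount point /sys/fs/bpf, and the interaction with an explicitly passed open option is deliberately left out.

```c
#include <stddef.h>

/* Assumed default location of the BPF FS instance. */
#define DEFAULT_BPFFS_PATH "/sys/fs/bpf"

/*
 * Hypothetical helper: choose the BPF FS path for implicit token creation.
 * env_val is the value of LIBBPF_BPF_TOKEN_PATH, or NULL when it is unset.
 * Returns NULL when implicit token creation is disabled.
 */
static const char *token_path_from_env(const char *env_val)
{
	if (env_val == NULL)
		return DEFAULT_BPFFS_PATH;	/* unset: fall back to the default */
	if (env_val[0] == '\0')
		return NULL;			/* empty: implicit creation disabled */
	return env_val;				/* custom BPF FS location */
}
```

A caller would typically pass getenv("LIBBPF_BPF_TOKEN_PATH") from <stdlib.h>; taking the value as a parameter keeps the decision logic easy to test.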
Validate @smb->WordCount to avoid reading off the end of @smb and thus causing the following KASAN splat: BUG: KASAN: slab-out-of-bounds in smbCalcSize+0x32/0x40 [cifs] Read of size 2 at addr ffff88801c024ec5 by task cifsd/1328 CPU: 1 PID: 1328 Comm: cifsd Not tainted 6.7.0-rc5 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __phys_addr+0x46/0x90 kasan_report+0xd8/0x110 ? smbCalcSize+0x32/0x40 [cifs] ? smbCalcSize+0x32/0x40 [cifs] kasan_check_range+0x105/0x1b0 smbCalcSize+0x32/0x40 [cifs] checkSMB+0x162/0x370 [cifs] ? __pfx_checkSMB+0x10/0x10 [cifs] cifs_handle_standard+0xbc/0x2f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_demultiplex_thread+0xed1/0x1360 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? __pfx_lock_release+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? mark_held_locks+0x1a/0x90 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kthread_parkme+0xce/0xf0 ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0x18d/0x1d0 ? kthread+0xdb/0x1d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This fixes CVE-2023-6606. Reported-by: j51569436@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218218 Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
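The kind of validation this fix calls for can be sketched as a length check done before the size calculation touches the byte-count field. The helper below is a simplified assumption, not the cifs code: it models a message as a fixed header of hdr_len bytes, then word_count 16-bit parameter words, then a 16-bit byte count, and smb_bcc_in_bounds() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: return true only if the 16-bit byte-count field,
 * which follows word_count 16-bit words after the header, lies entirely
 * within the buf_len bytes that were actually received.
 */
static bool smb_bcc_in_bounds(size_t buf_len, size_t hdr_len,
			      uint8_t word_count)
{
	/* header + parameter words + the byte-count field itself */
	size_t needed = hdr_len + 2 * (size_t)word_count + 2;

	return needed <= buf_len;
}
```

Since word_count comes from the peer, a value such as 255 pushes the byte-count field more than 500 bytes past the header; rejecting the frame when the check fails avoids the out-of-bounds read that KASAN reported.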
Jiri Pirko says: ==================== devlink: introduce notifications filtering From: Jiri Pirko <jiri@nvidia.com> Currently the user listening on a socket for devlink notifications always gets all messages for all existing devlink instances and objects, even if they are interested in only one of those. That may cause unnecessary overhead on setups with thousands of instances present. The user is currently able to narrow down the devlink objects replies to dump commands by specifying select attributes. Allow a similar approach for notifications, providing the user a new notify-filter-set command to select attributes with values the notification message has to match. In that case, it is delivered to the socket. Note that the filtering is done per-socket, so multiple users may specify different selection of attributes with values. This patchset initially introduces support for the following attributes: DEVLINK_ATTR_BUS_NAME DEVLINK_ATTR_DEV_NAME DEVLINK_ATTR_PORT_INDEX Patches #1 - #4 are preparations in devlink code, patch #3 is an optimization done on the way. Patches #5 - #7 are preparations in netlink and generic netlink code. Patch #8 is the main one in this set, implementing the notify-filter-set command and the actual per-socket filtering. Patch #9 extends the infrastructure allowing to filter according to a port index. 
Example:
$ devlink mon port pci/0000:08:00.0/32768
[port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth netdev eth3 flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type eth flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,new] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
[port,del] pci/0000:08:00.0/32768: type notset flavour pcisf controller 0 pfnum 0 sfnum 107 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
====================

Link: https://lore.kernel.org/r/20231216123001.1293639-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
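The per-socket matching described in this cover letter can be sketched as a predicate evaluated for each notification before delivery. The structures and the function notify_filter_match() below are hypothetical stand-ins rather than the devlink implementation; the three fields mirror the attributes the patchset lists (bus name, device name, port index), and an unset field is assumed to match anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-socket filter: NULL string or cleared flag = wildcard. */
struct notify_filter {
	const char *bus_name;
	const char *dev_name;
	bool port_index_valid;
	uint32_t port_index;
};

/* Hypothetical description of one outgoing notification. */
struct notify_desc {
	const char *bus_name;
	const char *dev_name;
	bool port_index_valid;	/* false for non-port objects */
	uint32_t port_index;
};

/* Return true if the notification should be delivered to the socket. */
static bool notify_filter_match(const struct notify_filter *f,
				const struct notify_desc *d)
{
	if (f == NULL)
		return true;	/* no filter set on this socket */
	if (f->bus_name && strcmp(f->bus_name, d->bus_name) != 0)
		return false;
	if (f->dev_name && strcmp(f->dev_name, d->dev_name) != 0)
		return false;
	if (f->port_index_valid &&
	    (!d->port_index_valid || f->port_index != d->port_index))
		return false;
	return true;
}
```

Keeping the filter on the socket rather than on the device is what lets several listeners select different objects at the same time.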
Ido Schimmel says: ==================== Add MDB bulk deletion support This patchset adds MDB bulk deletion support, allowing user space to request the deletion of matching entries instead of dumping the entire MDB and issuing a separate deletion request for each matching entry. Support is added in both the bridge and VXLAN drivers in a similar fashion to the existing FDB bulk deletion support. The parameters according to which bulk deletion can be performed are similar to the FDB ones, namely: Destination port, VLAN ID, state (e.g., "permanent"), routing protocol, source / destination VNI, destination IP and UDP port. Flushing based on flags (e.g., "offload", "fast_leave", "added_by_star_ex", "blocked") is not currently supported, but can be added in the future, if a use case arises. Patch #1 adds a new uAPI attribute to allow specifying the state mask according to which bulk deletion will be performed, if any. Patch #2 adds a new policy according to which bulk deletion requests (with 'NLM_F_BULK' flag set) will be parsed. Patches #3-#4 add a new NDO for MDB bulk deletion and invoke it from the rtnetlink code when a bulk deletion request is made. Patches #5-#6 implement the MDB bulk deletion NDO in the bridge and VXLAN drivers, respectively. Patch #7 allows user space to issue MDB bulk deletion requests by no longer rejecting the 'NLM_F_BULK' flag when it is set in 'RTM_DELMDB' requests. Patches #8-#9 add selftests for both drivers, for both good and bad flows. iproute2 changes can be found here [1]. https://github.com/idosch/iproute2/tree/submit/mdb_flush_v1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
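The idea of deleting every matching entry in one request, instead of dumping the table and deleting entries one by one, can be sketched on a plain array. Everything below is an assumption made for illustration: the entry and descriptor layouts, the convention that a zero field means "do not filter on this field", and the names mdb_entry_matches() and mdb_flush() are hypothetical and do not come from the bridge or VXLAN drivers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified MDB entry. */
struct mdb_entry {
	int ifindex;		/* destination port */
	uint16_t vid;		/* VLAN ID */
	uint8_t state;		/* state bits, e.g. a "permanent" bit */
};

/* Hypothetical flush descriptor; zero fields act as wildcards. */
struct mdb_flush_desc {
	int ifindex;
	uint16_t vid;
	uint8_t state;
	uint8_t state_mask;	/* which state bits must match */
};

static bool mdb_entry_matches(const struct mdb_entry *e,
			      const struct mdb_flush_desc *d)
{
	if (d->ifindex && e->ifindex != d->ifindex)
		return false;
	if (d->vid && e->vid != d->vid)
		return false;
	if ((e->state & d->state_mask) != (d->state & d->state_mask))
		return false;
	return true;
}

/* Remove all matching entries in place; return how many were removed. */
static size_t mdb_flush(struct mdb_entry *tbl, size_t *count,
			const struct mdb_flush_desc *d)
{
	size_t kept = 0, removed = 0;

	for (size_t i = 0; i < *count; i++) {
		if (mdb_entry_matches(&tbl[i], d))
			removed++;
		else
			tbl[kept++] = tbl[i];	/* compact survivors */
	}
	*count = kept;
	return removed;
}
```

The state mask plays the role the cover letter gives to the new uAPI attribute: only the masked bits take part in the comparison, so a zero mask matches entries in any state.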
Wen Gu says: ==================== net/smc: implement SMCv2.1 virtual ISM device support The fourth edition of SMCv2 adds the SMC version 2.1 feature updates for SMC-Dv2 with virtual ISM. Virtual ISM are created and supported mainly by OS or hypervisor software, comparable to IBM ISM which is based on platform firmware or hardware. With the introduction of virtual ISM, SMCv2.1 makes some updates: - Introduce feature bitmask to indicate supplemental features. - Reserve a range of CHIDs for virtual ISM. - Support extended GIDs (128 bits) in CLC handshake. So this patch set aims to implement these updates in Linux kernel. And it acts as the first part of SMC-D virtual ISM extension & loopback-ism [1]. [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/ v8->v7: - Patch #7: v7 mistakenly changed the type of gid_ext in smc_clc_msg_accept_confirm to u64 instead of __be64 as previous versions when fixing the rebase conflicts. So fix this mistake. v7->v6: Link: https://lore.kernel.org/netdev/20231219084536.8158-1-guwen@linux.alibaba.com/ - Collect the Reviewed-by tag in v6; - Patch #3: redefine the struct smc_clc_msg_accept_confirm; - Patch #7: Because that the Patch #3 already adds '__packed' to smc_clc_msg_accept_confirm, so Patch #7 doesn't need to do the same thing. But this is a minor change, so I kept the 'Reviewed-by' tag. Other changes in previous versions but not yet acked: - Patch #1: Some minor changes in subject and fix the format issue (length exceeds 80 columns) compared to v3. - Patch #5: removes useless ini->feature_mask assignment in __smc_connect() and smc_listen_v2_check() compared to v4. - Patch #8: new added, compared to v3. 
v6->v5: Link: https://lore.kernel.org/netdev/1702371151-125258-1-git-send-email-guwen@linux.alibaba.com/ - Add 'Reviewed-by' label given in the previous versions: * Patch #4, #6, #9, #10 have nothing changed since v3; - Patch #2: * fix the format issue (Alignment should match open parenthesis) compared to v5; * remove useless clc->hdr.length assignment in smcr_clc_prep_confirm_accept() compared to v5; - Patch #3: new added compared to v5. - Patch #7: some minor changes like aclc_v2->aclc or clc_v2->clc compared to v5 due to the introduction of Patch #3. Since there were no major changes, I kept the 'Reviewed-by' label. Other changes in previous versions but not yet acked: - Patch #1: Some minor changes in subject and fix the format issue (length exceeds 80 columns) compared to v3. - Patch #5: removes useless ini->feature_mask assignment in __smc_connect() and smc_listen_v2_check() compared to v4. - Patch #8: new added, compared to v3. v5->v4: Link: https://lore.kernel.org/netdev/1702021259-41504-1-git-send-email-guwen@linux.alibaba.com/ - Patch #6: improve the comment of SMCD_CLC_MAX_V2_GID_ENTRIES; - Patch #4: remove useless ini->feature_mask assignment; v4->v3: https://lore.kernel.org/netdev/1701920994-73705-1-git-send-email-guwen@linux.alibaba.com/ - Patch #6: use SMCD_CLC_MAX_V2_GID_ENTRIES to indicate the max gid entries in CLC proposal and using SMC_MAX_V2_ISM_DEVS to indicate the max devices to propose; - Patch #6: use i and i+1 in smc_find_ism_v2_device_serv(); - Patch #2: replace the large if-else block in smc_clc_send_confirm_accept() with 2 subfunctions; - Fix missing byte order conversion of GID and token in CLC handshake, which is in a separate patch sending to net: https://lore.kernel.org/netdev/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com/ - Patch #7: add extended GID in SMC-D lgr netlink attribute; v3->v2: https://lore.kernel.org/netdev/1701343695-122657-1-git-send-email-guwen@linux.alibaba.com/ - Rename smc_clc_fill_fce as 
smc_clc_fill_fce_v2x; - Remove ISM_IDENT_MASK from drivers/s390/net/ism.h; - Add explicitly assigning 'false' to ism_v2_capable in ism_dev_init(); - Remove smc_ism_set_v2_capable() helper for now, and introduce it in later loopback-ism implementation; v2->v1: - Fix sparse complaint; - Rebase to the latest net-next; ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
QXL driver doesn't use any device for DMA mappings or allocations so dev_to_node() will panic inside ttm_device_init() on NUMA systems: general protection fault, probably for non-canonical address 0xdffffc000000007a: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x00000000000003d0-0x00000000000003d7] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.7.0+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:ttm_device_init+0x10e/0x340 Call Trace: <TASK> qxl_ttm_init+0xaa/0x310 qxl_device_init+0x1071/0x2000 qxl_pci_probe+0x167/0x3f0 local_pci_probe+0xe1/0x1b0 pci_device_probe+0x29d/0x790 really_probe+0x251/0x910 __driver_probe_device+0x1ea/0x390 driver_probe_device+0x4e/0x2e0 __driver_attach+0x1e3/0x600 bus_for_each_dev+0x12d/0x1c0 bus_add_driver+0x25a/0x590 driver_register+0x15c/0x4b0 qxl_pci_driver_init+0x67/0x80 do_one_initcall+0xf5/0x5d0 kernel_init_freeable+0x637/0xb10 kernel_init+0x1c/0x2e0 ret_from_fork+0x48/0x80 ret_from_fork_asm+0x1b/0x30 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:ttm_device_init+0x10e/0x340 Fall back to NUMA_NO_NODE if there is no device for DMA. Found by Linux Verification Center (linuxtesting.org). Fixes: b0a7ce5 ("drm/ttm: Schedule delayed_delete worker closer") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Link: https://patchwork.freedesktop.org/patch/msgid/20240113213347.9562-1-pchelkin@ispras.ru Signed-off-by: Christian König <christian.koenig@amd.com>
The QXL driver doesn't use any device for DMA mappings or allocations so dev_to_node() will panic inside ttm_device_init() on NUMA systems: general protection fault, probably for non-canonical address 0xdffffc000000007a: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x00000000000003d0-0x00000000000003d7] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.7.0+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:ttm_device_init+0x10e/0x340 Call Trace: qxl_ttm_init+0xaa/0x310 qxl_device_init+0x1071/0x2000 qxl_pci_probe+0x167/0x3f0 local_pci_probe+0xe1/0x1b0 pci_device_probe+0x29d/0x790 really_probe+0x251/0x910 __driver_probe_device+0x1ea/0x390 driver_probe_device+0x4e/0x2e0 __driver_attach+0x1e3/0x600 bus_for_each_dev+0x12d/0x1c0 bus_add_driver+0x25a/0x590 driver_register+0x15c/0x4b0 qxl_pci_driver_init+0x67/0x80 do_one_initcall+0xf5/0x5d0 kernel_init_freeable+0x637/0xb10 kernel_init+0x1c/0x2e0 ret_from_fork+0x48/0x80 ret_from_fork_asm+0x1b/0x30 RIP: 0010:ttm_device_init+0x10e/0x340 Fall back to NUMA_NO_NODE if there is no device for DMA. Found by Linux Verification Center (linuxtesting.org). Fixes: b0a7ce5 ("drm/ttm: Schedule delayed_delete worker closer") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Christian König <christian.koenig@amd.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Cc: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
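The fallback described above amounts to a NULL check before asking a device for its NUMA node. The sketch below models it in user space under stated assumptions: struct fake_device and fake_dev_to_node() are stand-ins invented here for the kernel's struct device and dev_to_node(), and pick_numa_node() is a hypothetical name; only the value of NUMA_NO_NODE (-1) is taken from the kernel.

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* kernel value meaning "no node preference" */

/* Stand-in for struct device; only the NUMA node is modelled. */
struct fake_device {
	int numa_node;
};

/* Stand-in for dev_to_node(): dereferences dev, so dev must not be NULL. */
static int fake_dev_to_node(const struct fake_device *dev)
{
	return dev->numa_node;
}

/* Fall back to NUMA_NO_NODE when there is no device for DMA. */
static int pick_numa_node(const struct fake_device *dev)
{
	return dev ? fake_dev_to_node(dev) : NUMA_NO_NODE;
}
```

Without the check, a driver that passes no DMA device would reach the dereference with a NULL pointer, which matches the null-ptr-deref in the trace.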
A recent change in acp_irq_thread() was meant to address a potential race condition while trying to acquire the hardware semaphore responsible for the synchronization between firmware and host IPC interrupts. This resulted in an improper use of the IPC spinlock, causing normal kernel memory allocations (which may sleep) inside atomic contexts: 1707255557.133976 kernel: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315 ... 1707255557.134757 kernel: sof_ipc3_rx_msg+0x70/0x130 [snd_sof] 1707255557.134793 kernel: acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp] 1707255557.134855 kernel: acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp] 1707255557.134904 kernel: ? irq_thread+0xb5/0x1e0 1707255557.134947 kernel: ? __pfx_irq_thread_fn+0x10/0x10 1707255557.134985 kernel: irq_thread_fn+0x23/0x60 Moreover, there are attempts to lock a mutex from the same atomic context: 1707255557.136357 kernel: ============================= 1707255557.136393 kernel: [ BUG: Invalid wait context ] 1707255557.136413 kernel: 6.8.0-rc3-next-20240206-audio-next #9 Tainted: G W 1707255557.136432 kernel: ----------------------------- 1707255557.136451 kernel: irq/66-AudioDSP/502 is trying to lock: 1707255557.136470 kernel: ffff965152f26af8 (&sb->s_type->i_mutex_key#2){+.+.}-{3:3}, at: start_creating.part.0+0x5f/0x180 ... 1707255557.137429 kernel: start_creating.part.0+0x5f/0x180 1707255557.137457 kernel: __debugfs_create_file+0x61/0x210 1707255557.137475 kernel: snd_sof_debugfs_io_item+0x75/0xc0 [snd_sof] 1707255557.137494 kernel: sof_ipc3_do_rx_work+0x7cf/0x9f0 [snd_sof] 1707255557.137513 kernel: sof_ipc3_rx_msg+0xb3/0x130 [snd_sof] 1707255557.137532 kernel: acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp] 1707255557.137551 kernel: acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp] Fix the issues by reducing the lock scope in acp_irq_thread(), so that it guards only the hardware semaphore acquiring attempt. 
Additionally, restore the initial locking in acp_sof_ipc_irq_thread() to synchronize the handling of immediate replies from DSP core. Fixes: 802134c ("ASoC: SOF: amd: Refactor spinlock_irq(&sdev->ipc_lock) sequence in irq_handler") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://lore.kernel.org/r/20240208234315.2182048-1-cristian.ciocaltea@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
…lblished(). syzkaller reported a warning [0] in inet_csk_destroy_sock() with no repro. WARN_ON(inet_sk(sk)->inet_num && !inet_csk(sk)->icsk_bind_hash); However, syzkaller's log hinted that connect() failed just before the warning due to FAULT_INJECTION. [1] When connect() is called for an unbound socket, we search for an available ephemeral port. If a bhash bucket exists for the port, we call __inet_check_established() or __inet6_check_established() to check if the bucket is reusable. If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num. Later, we look up the corresponding bhash2 bucket and try to allocate it if it does not exist. Although it rarely occurs in real use, if the allocation fails, we must revert the changes made by check_established(). Otherwise, an unconnected socket could illegally occupy an ehash entry. Note that we do not put tw back into ehash because sk might have already responded to a packet for tw and it would be better to free tw earlier under such memory pressure. 
[0]: WARNING: CPU: 0 PID: 350830 at net/ipv4/inet_connection_sock.c:1193 inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) Modules linked in: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) Code: 41 5c 41 5d 41 5e e9 2d 4a 3d fd e8 28 4a 3d fd 48 89 ef e8 f0 cd 7d ff 5b 5d 41 5c 41 5d 41 5e e9 13 4a 3d fd e8 0e 4a 3d fd <0f> 0b e9 61 fe ff ff e8 02 4a 3d fd 4c 89 e7 be 03 00 00 00 e8 05 RSP: 0018:ffffc9000b21fd38 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000009e78 RCX: ffffffff840bae40 RDX: ffff88806e46c600 RSI: ffffffff840bb012 RDI: ffff88811755cca8 RBP: ffff88811755c880 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000009e78 R11: 0000000000000000 R12: ffff88811755c8e0 R13: ffff88811755c892 R14: ffff88811755c918 R15: 0000000000000000 FS: 00007f03e5243800(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32f21000 CR3: 0000000112ffe001 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ? 
inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) dccp_close (net/dccp/proto.c:1078) inet_release (net/ipv4/af_inet.c:434) __sock_release (net/socket.c:660) sock_close (net/socket.c:1423) __fput (fs/file_table.c:377) __fput_sync (fs/file_table.c:462) __x64_sys_close (fs/open.c:1557 fs/open.c:1539 fs/open.c:1539) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f03e53852bb Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 c9 f5 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 c9 f5 ff 8b 44 RSP: 002b:00000000005dfba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03e53852bb RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000167c R10: 0000000008a79680 R11: 0000000000000293 R12: 00007f03e4e43000 R13: 00007f03e4e43170 R14: 00007f03e4e43178 R15: 00007f03e4e43170 </TASK> [1]: FAULT_INJECTION: forcing a failure. 
name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 350833 Comm: syz-executor.1 Not tainted 6.7.0-12272-g2121c43f88f5 #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153) should_failslab (mm/slub.c:3748) kmem_cache_alloc (mm/slub.c:3763 mm/slub.c:3842 mm/slub.c:3867) inet_bind2_bucket_create (net/ipv4/inet_hashtables.c:135) __inet_hash_connect (net/ipv4/inet_hashtables.c:1100) dccp_v4_connect (net/dccp/ipv4.c:116) __inet_stream_connect (net/ipv4/af_inet.c:676) inet_stream_connect (net/ipv4/af_inet.c:747) __sys_connect_file (net/socket.c:2048 (discriminator 2)) __sys_connect (net/socket.c:2065) __x64_sys_connect (net/socket.c:2072) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f03e5284e5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f03e4641cc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f03e5284e5d RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 000000000000000b R14: 00007f03e52e5530 R15: 0000000000000000 </TASK> Reported-by: syzkaller <syzkaller@googlegroups.com> Fixes: 28044fc ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
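The revert-on-failure pattern this fix relies on can be sketched with a mock socket and an injectable allocator, which mimics what fault injection did in the report. All names here (struct fake_sock, fake_hash_connect(), failing_alloc()) are hypothetical and the two-step sequence is a simplification of the connect path described above, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Mock socket: tracks only the state that matters for the pattern. */
struct fake_sock {
	unsigned short num;	/* models inet_num */
	bool in_ehash;		/* models presence in the established hash */
	void *bucket;		/* models the bhash2 bucket */
};

/* Allocator type, injectable so that a test can force a failure. */
typedef void *(*alloc_fn)(size_t size);

static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;		/* simulate an allocation failure */
}

static int fake_hash_connect(struct fake_sock *sk, unsigned short port,
			     alloc_fn alloc)
{
	/* Step 1: claim the port and enter the established hash. */
	sk->in_ehash = true;
	sk->num = port;

	/* Step 2: allocate the bucket; this is the step that may fail. */
	sk->bucket = alloc(16);
	if (sk->bucket == NULL) {
		/* Undo step 1 so that no half-connected state is left. */
		sk->in_ehash = false;
		sk->num = 0;
		return -1;
	}
	return 0;
}
```

If the undo branch were missing, the mock socket would keep a port number while owning no bucket, which is the combination the WARN_ON in the report checks for.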
Shifting an int with (1 << idx) is not appropriate when setting bits in the unsigned long overflowed_ctrs; use BIT() instead.

The following panic happens when running 'perf record -e branches' on a Sophgo SG2042:

[ 273.311852] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000098
[ 273.320851] Oops [#1]
[ 273.323179] Modules linked in:
[ 273.326303] CPU: 0 PID: 1475 Comm: perf Not tainted 6.6.0-rc3+ #9
[ 273.332521] Hardware name: Sophgo Mango (DT)
[ 273.336878] epc : riscv_pmu_ctr_get_width_mask+0x8/0x62
[ 273.342291] ra : pmu_sbi_ovf_handler+0x2e0/0x34e
[ 273.347091] epc : ffffffff80aecd98 ra : ffffffff80aee056 sp : fffffff6e36928b0
[ 273.354454] gp : ffffffff821f82d0 tp : ffffffd90c353200 t0 : 0000002ade4f9978
[ 273.361815] t1 : 0000000000504d55 t2 : ffffffff8016cd8c s0 : fffffff6e3692a70
[ 273.369180] s1 : 0000000000000020 a0 : 0000000000000000 a1 : 00001a8e81800000
[ 273.376540] a2 : 0000003c00070198 a3 : 0000003c00db75a4 a4 : 0000000000000015
[ 273.383901] a5 : ffffffd7ff8804b0 a6 : 0000000000000015 a7 : 000000000000002a
[ 273.391327] s2 : 000000000000ffff s3 : 0000000000000000 s4 : ffffffd7ff8803b0
[ 273.398773] s5 : 0000000000504d55 s6 : ffffffd905069800 s7 : ffffffff821fe210
[ 273.406139] s8 : 000000007fffffff s9 : ffffffd7ff8803b0 s10: ffffffd903f29098
[ 273.413660] s11: 0000000080000000 t3 : 0000000000000003 t4 : ffffffff8017a0ca
[ 273.421022] t5 : ffffffff8023cfc2 t6 : ffffffd9040780e8
[ 273.426437] status: 0000000200000100 badaddr: 0000000000000098 cause: 000000000000000d
[ 273.434512] [<ffffffff80aecd98>] riscv_pmu_ctr_get_width_mask+0x8/0x62
[ 273.441169] [<ffffffff80076bd8>] handle_percpu_devid_irq+0x98/0x1ee
[ 273.447562] [<ffffffff80071158>] generic_handle_domain_irq+0x28/0x36
[ 273.454151] [<ffffffff8047a99a>] riscv_intc_irq+0x36/0x4e
[ 273.459659] [<ffffffff80c944de>] handle_riscv_irq+0x4a/0x74
[ 273.465442] [<ffffffff80c94c48>] do_irq+0x62/0x92
[ 273.470360] Code: 0420 60a2 6402 5529 0141 8082 0013 0000 0013 0000 (6d5c) b783
[ 273.477921] ---[ end trace 0000000000000000 ]---
[ 273.482630] Kernel panic - not syncing: Fatal exception in interrupt

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Fei Wu <fei2.wu@intel.com>
Link: https://lore.kernel.org/r/20240228115425.2613856-1-fei2.wu@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
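To illustrate the point of the riscv commit above: the literal 1 has type int, so (1 << idx) cannot address the upper bits of an unsigned long bitmap, and shifting into or past the sign bit of an int is undefined behaviour in C. The following is a minimal userspace sketch, not the driver code. BIT_UL, mark_overflowed() and counter_overflowed() are hypothetical names chosen for illustration; BIT_UL only stands in for the kernel's BIT() macro.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's BIT() macro (assumption for this sketch). */
#define BIT_UL(nr) (1UL << (nr))

/* Number of bits in an unsigned long on the build target. */
#define ULONG_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: record that counter idx overflowed. */
static unsigned long mark_overflowed(unsigned long overflowed_ctrs,
                                     unsigned int idx)
{
    assert(idx < ULONG_BITS);
    /* The shift happens in unsigned long, so every bit position is reachable. */
    return overflowed_ctrs | BIT_UL(idx);
}

/* Hypothetical helper: test whether counter idx is marked as overflowed. */
static int counter_overflowed(unsigned long overflowed_ctrs, unsigned int idx)
{
    assert(idx < ULONG_BITS);
    return (overflowed_ctrs & BIT_UL(idx)) != 0;
}
```

With these helpers, marking counter 31 yields exactly 0x80000000 in the bitmap, with no dependence on how a negative int would be converted to unsigned long.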
Passing a maximum attribute type to nlmsg_parse() that is larger than the size of the passed policy will result in an out-of-bounds access [1] when the attribute type is used as an index into the policy array. Fix by setting the maximum attribute type according to the policy size, as is already done for RTM_NEWNEXTHOP messages. Add a test case that triggers the bug. No regressions in fib nexthops tests: # ./fib_nexthops.sh [...] Tests passed: 236 Tests failed: 0 [1] BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x1e53/0x2940 Read of size 1 at addr ffffffff99ab4d20 by task ip/610 CPU: 3 PID: 610 Comm: ip Not tainted 6.8.0-rc7-custom-gd435d6e3e161 #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x8f/0xe0 print_report+0xcf/0x670 kasan_report+0xd8/0x110 __nla_validate_parse+0x1e53/0x2940 __nla_parse+0x40/0x50 rtm_del_nexthop+0x1bd/0x400 rtnetlink_rcv_msg+0x3cc/0xf20 netlink_rcv_skb+0x170/0x440 netlink_unicast+0x540/0x820 netlink_sendmsg+0x8d3/0xdb0 ____sys_sendmsg+0x31f/0xa60 ___sys_sendmsg+0x13a/0x1e0 __sys_sendmsg+0x11c/0x1f0 do_syscall_64+0xc5/0x1d0 entry_SYSCALL_64_after_hwframe+0x63/0x6b [...] The buggy address belongs to the variable: rtm_nh_policy_del+0x20/0x40 Fixes: 2118f93 ("net: nexthop: Adjust netlink policy parsing for a new attribute") Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/netdev/CANn89i+UNcG0PJMW5X7gOMunF38ryMh=L1aeZUKH3kL4UdUqag@mail.gmail.com/ Reported-by: syzbot+65bb09a7208ce3d4a633@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/00000000000088981b06133bc07b@google.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240311162307.545385-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
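The nexthop fix above can be pictured with a small userspace sketch: the maximum attribute type handed to the parser is derived from the size of the policy array, so an attribute type can never index past the end of the policy. This is not the netlink code. struct attr_policy, del_policy, POLICY_MAXTYPE and policy_lookup() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a netlink attribute policy entry. */
struct attr_policy {
    int type;
};

/* Hypothetical policy covering attribute types 0..3 only. */
static const struct attr_policy del_policy[4] = {
    { 0 }, { 1 }, { 2 }, { 3 },
};

/* Largest attribute type that is a valid index into policy array p. */
#define POLICY_MAXTYPE(p) (ARRAY_SIZE(p) - 1)

/*
 * Look up the policy entry for attr_type. Types above maxtype are rejected
 * instead of being used as an index, which is what keeps the access in bounds
 * as long as maxtype was derived from the array size.
 */
static const struct attr_policy *policy_lookup(const struct attr_policy *policy,
                                               size_t maxtype,
                                               size_t attr_type)
{
    if (attr_type > maxtype)
        return NULL;
    return &policy[attr_type];
}
```

Passing a maxtype taken from a larger, unrelated policy would let attr_type 4 through the check and read beyond del_policy, which is the class of bug the KASAN report describes.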
The driver creates /sys/kernel/debug/dri/0/mob_ttm even when the corresponding ttm_resource_manager is not allocated. This leads to a crash when trying to read from this file.

Add a check to create the mob_ttm, system_mob_ttm, and gmr_ttm debug files only when the corresponding ttm_resource_manager is allocated.

crash> bt
PID: 3133409 TASK: ffff8fe4834a5000 CPU: 3 COMMAND: "grep"
 #0 [ffffb954506b3b20] machine_kexec at ffffffffb2a6bec3
 #1 [ffffb954506b3b78] __crash_kexec at ffffffffb2bb598a
 #2 [ffffb954506b3c38] crash_kexec at ffffffffb2bb68c1
 #3 [ffffb954506b3c50] oops_end at ffffffffb2a2a9b1
 #4 [ffffb954506b3c70] no_context at ffffffffb2a7e913
 #5 [ffffb954506b3cc8] __bad_area_nosemaphore at ffffffffb2a7ec8c
 #6 [ffffb954506b3d10] do_page_fault at ffffffffb2a7f887
 #7 [ffffb954506b3d40] page_fault at ffffffffb360116e
    [exception RIP: ttm_resource_manager_debug+0x11]
    RIP: ffffffffc04afd11 RSP: ffffb954506b3df0 RFLAGS: 00010246
    RAX: ffff8fe41a6d1200 RBX: 0000000000000000 RCX: 0000000000000940
    RDX: 0000000000000000 RSI: ffffffffc04b4338 RDI: 0000000000000000
    RBP: ffffb954506b3e08 R8: ffff8fee3ffad000 R9: 0000000000000000
    R10: ffff8fe41a76a000 R11: 0000000000000001 R12: 00000000ffffffff
    R13: 0000000000000001 R14: ffff8fe5bb6f3900 R15: ffff8fe41a6d1200
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #8 [ffffb954506b3e00] ttm_resource_manager_show at ffffffffc04afde7 [ttm]
 #9 [ffffb954506b3e30] seq_read at ffffffffb2d8f9f3
    RIP: 00007f4c4eda8985 RSP: 00007ffdbba9e9f8 RFLAGS: 00000246
    RAX: ffffffffffffffda RBX: 000000000037e000 RCX: 00007f4c4eda8985
    RDX: 000000000037e000 RSI: 00007f4c41573000 RDI: 0000000000000003
    RBP: 000000000037e000 R8: 0000000000000000 R9: 000000000037fe30
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4c41573000
    R13: 0000000000000003 R14: 00007f4c41572010 R15: 0000000000000003
    ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b

Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Fixes: af4a25b ("drm/vmwgfx: Add debugfs entries for various ttm resource managers")
Cc: <stable@vger.kernel.org>
Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240312093551.196609-1-jfalempe@redhat.com
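The shape of the vmwgfx fix above is a guard: a debug entry is only registered when the object it will later dereference actually exists. The sketch below models that in plain C under stated assumptions. struct resource_manager, struct device_state and create_debug_files() are hypothetical stand-ins, not the DRM or TTM API, and "creating a file" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_MANAGERS 3

/* Hypothetical stand-in for a resource manager object. */
struct resource_manager {
    int dummy;
};

/* Hypothetical device state: a manager slot is NULL when it was never allocated. */
struct device_state {
    struct resource_manager *managers[NUM_MANAGERS];
    int debug_file_created[NUM_MANAGERS];
};

/*
 * Register one debug entry per allocated manager and skip empty slots, so a
 * later read of a debug entry never dereferences a NULL manager.
 * Returns the number of entries created.
 */
static int create_debug_files(struct device_state *dev)
{
    int created = 0;

    for (size_t i = 0; i < NUM_MANAGERS; i++) {
        if (!dev->managers[i])
            continue;               /* manager not allocated: no entry */
        dev->debug_file_created[i] = 1;
        created++;
    }
    return created;
}
```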
vhost_worker will call tun callbacks to receive packets. If too many illegal packets arrive, tun_do_read will keep dumping packet contents. When the console is enabled, dumping the packets costs much more CPU time and a soft lockup will be detected.

The net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002
    RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000
    RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0
    RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f
    R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020
    R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
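The idea behind rate limiting the packet dump in the tun commit above can be shown with a generic window-and-burst limiter. This is a sketch of the general technique, not the kernel's net_ratelimit() implementation. struct ratelimit and ratelimit_allow() are hypothetical names, and time is passed in explicitly as an abstract tick count so the behaviour is easy to follow.

```c
#include <assert.h>

/* Hypothetical limiter state: at most burst events per interval ticks. */
struct ratelimit {
    long interval;  /* window length, in caller-defined ticks */
    int burst;      /* events allowed per window */
    long begin;     /* tick at which the current window started */
    int printed;    /* events allowed so far in this window */
    int missed;     /* events suppressed in total */
};

/*
 * Return 1 if an expensive action (such as dumping a packet to the console)
 * may run at tick now, 0 if it should be skipped.
 */
static int ratelimit_allow(struct ratelimit *rl, long now)
{
    assert(rl->interval > 0 && rl->burst > 0);

    /* Start a fresh window once the current one has expired. */
    if (now - rl->begin >= rl->interval) {
        rl->begin = now;
        rl->printed = 0;
    }

    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }

    rl->missed++;
    return 0;
}
```

A caller would wrap the dump in a check of ratelimit_allow(), so a flood of bad packets costs a bounded amount of console time per window instead of one dump per packet.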
…PLES event" This reverts commit 7d1405c. This causes segfaults in some cases, as reported by Milian: ``` sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e raw_syscalls:sys_enter ls ... [ perf record: Woken up 3 times to write data ] malloc(): invalid next size (unsorted) Aborted ``` Backtrace with GDB + debuginfod: ``` malloc(): invalid next size (unsorted) Thread 1 "perf" received signal SIGABRT, Aborted. __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c 44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0; (gdb) bt #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78 #2 0x00007ffff6e50a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/ raise.c:26 #3 0x00007ffff6e384c3 in __GI_abort () at abort.c:79 #4 0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@entry=0x7ffff6fc22ea "%s\n") at ../sysdeps/posix/libc_fatal.c:132 #5 0x00007ffff6eb3085 in malloc_printerr (str=str@entry=0x7ffff6fc5850 "malloc(): invalid next size (unsorted)") at malloc.c:5772 #6 0x00007ffff6eb657c in _int_malloc (av=av@entry=0x7ffff6ff6ac0 <main_arena>, bytes=bytes@entry=368) at malloc.c:4081 #7 0x00007ffff6eb877e in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3754 #8 0x000055555569bdb6 in perf_session.do_write_header () #9 0x00005555555a373a in __cmd_record.constprop.0 () #10 0x00005555555a6846 in cmd_record () #11 0x000055555564db7f in run_builtin () #12 0x000055555558ed77 in main () ``` Valgrind memcheck: ``` ==45136== Invalid write of size 8 ==45136== at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf) ==45136== by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: 
cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ==45136== Syscall param write(buf) points to unaddressable byte(s) ==45136== at 0x575953D: __libc_write (write.c:26) ==45136== by 0x575953D: write (write.c:24) ==45136== by 0x35761F: ion (in /usr/bin/perf) ==45136== by 0x357778: writen (in /usr/bin/perf) ==45136== by 0x1548F7: record__write (in /usr/bin/perf) ==45136== by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ----- Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/ Reported-by: Milian Wolff <milian.wolff@kdab.com> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@kernel.org # 6.8+ Link: https://lore.kernel.org/lkml/Zl9ksOlHJHnKM70p@x1 Signed-off-by: Arnaldo Carvalho de Melo 
<acme@redhat.com>
We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) #3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) #4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) #5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) #6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) #7 btrfs_sync_file (fs/btrfs/file.c:1933:8) #8 vfs_fsync_range (fs/sync.c:188:9) #9 vfs_fsync (fs/sync.c:202:9) #10 do_fsync (fs/sync.c:212:9) #11 __do_sys_fdatasync (fs/sync.c:225:9) #12 __se_sys_fdatasync (fs/sync.c:223:1) #13 __x64_sys_fdatasync (fs/sync.c:223:1) #14 do_syscall_x64 (arch/x86/entry/common.c:52:14) #15 do_syscall_64 (arch/x86/entry/common.c:83:7) #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). 
This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. 
Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. 
However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. 
- btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
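The overlap in the btrfs scenario above (a logged prealloc extent covering 4k-12k and a new one covering 8k-12k) can be modelled with plain byte ranges. The sketch below only illustrates detecting an overlap and truncating the overlapping part. It is not the btrfs code: struct extent and trim_overlap() are hypothetical, and trimming the tail of the already logged extent is an assumption made for illustration, not a statement of which item the actual patch adjusts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte range [start, start + len) within a file. */
struct extent {
    uint64_t start;
    uint64_t len;
};

static uint64_t extent_end(const struct extent *e)
{
    return e->start + e->len;
}

/*
 * If logged begins before incoming and runs into it, shorten logged so that
 * it ends exactly where incoming begins. Returns 1 if logged was trimmed and
 * 0 if the two ranges did not overlap in that way.
 */
static int trim_overlap(struct extent *logged, const struct extent *incoming)
{
    if (logged->start >= incoming->start)
        return 0;   /* not the "straddling" case handled here */
    if (extent_end(logged) <= incoming->start)
        return 0;   /* ranges are disjoint */

    logged->len = incoming->start - logged->start;
    return 1;
}
```

Applied to the numbers from the commit message, the 4k-12k range becomes 4k-8k, so no two items claim the 8k-12k range.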
The code in ocfs2_dio_end_io_write() estimates the number of necessary transaction credits using ocfs2_calc_extend_credits(). However, this does not take into account that the IO could be arbitrarily large and can contain an arbitrary number of extents. Extent tree manipulations often extend the current transaction, but not in all cases. For example, if we have only single-block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time, so we never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single-block extents. Once that happens, a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue, make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written().
Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
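The credit handling described in the ocfs2 commit above follows a simple pattern: instead of relying on a single up-front estimate, top the transaction up to the cost of one extent update before every update. The sketch below models only that bookkeeping. struct handle and process_extents() are hypothetical and do not correspond to the jbd2 or ocfs2 APIs, and the assumption that each extent update spends a fixed number of credits is made purely for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a journal handle with a credit budget. */
struct handle {
    int credits;   /* credits currently available */
    int extended;  /* credits added by extending the transaction */
};

/*
 * Process nr_extents extent updates, each assumed to cost
 * credits_per_extent. Before every update the handle is topped up so that
 * at least one update's worth of credits is available. Returns the number
 * of extents processed.
 */
static int process_extents(struct handle *h, int nr_extents,
                           int credits_per_extent)
{
    int done = 0;

    for (int i = 0; i < nr_extents; i++) {
        if (h->credits < credits_per_extent) {
            /* Stand-in for extending the running transaction. */
            h->extended += credits_per_extent - h->credits;
            h->credits = credits_per_extent;
        }
        h->credits -= credits_per_extent;   /* the update spends its budget */
        assert(h->credits >= 0);            /* never run the handle dry */
        done++;
    }
    return done;
}
```

Starting with 3 credits and 10 extents at 2 credits each, the loop never dips below zero, whereas a fixed budget of 3 would be exhausted after the first extent.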
The xarray can't support an arbitrary page cache size. The largest supported page cache size is defined as MAX_PAGECACHE_ORDER by commit 099d906 ("mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray"). However, it's possible to have a 512MB page cache in the huge memory collapsing path on an ARM64 system whose base page size is 64KB. A 512MB page cache breaks the limitation, and a warning is raised when the xarray entry is split, as shown in the following example.

[root@dhcp-10-26-1-207 ~]# cat /proc/1/smaps | grep KernelPageSize
KernelPageSize: 64 kB
[root@dhcp-10-26-1-207 ~]# cat /tmp/test.c
   :
int main(int argc, char **argv)
{
	const char *filename = TEST_XFS_FILENAME;
	int fd = 0;
	void *buf = (void *)-1, *p;
	int pgsize = getpagesize();
	int ret = 0;

	if (pgsize != 0x10000) {
		fprintf(stdout, "System with 64KB base page size is required!\n");
		return -EPERM;
	}

	system("echo 0 > /sys/devices/virtual/bdi/253:0/read_ahead_kb");
	system("echo 1 > /proc/sys/vm/drop_caches");

	/* Open the xfs file */
	fd = open(filename, O_RDONLY);
	assert(fd > 0);

	/* Create VMA */
	buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	assert(buf != (void *)-1);
	fprintf(stdout, "mapped buffer at 0x%p\n", buf);

	/* Populate VMA */
	ret = madvise(buf, TEST_MEM_SIZE, MADV_NOHUGEPAGE);
	assert(ret == 0);
	ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_READ);
	assert(ret == 0);

	/* Collapse VMA */
	ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE);
	assert(ret == 0);
	ret = madvise(buf, TEST_MEM_SIZE, MADV_COLLAPSE);
	if (ret) {
		fprintf(stdout, "Error %d to madvise(MADV_COLLAPSE)\n", errno);
		goto out;
	}

	/* Split xarray entry. Write permission is needed */
	munmap(buf, TEST_MEM_SIZE);
	buf = (void *)-1;
	close(fd);
	fd = open(filename, O_RDWR);
	assert(fd > 0);
	fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
		  TEST_MEM_SIZE - pgsize, pgsize);
out:
	if (buf != (void *)-1)
		munmap(buf, TEST_MEM_SIZE);
	if (fd > 0)
		close(fd);

	return ret;
}

[root@dhcp-10-26-1-207 ~]# gcc /tmp/test.c -o /tmp/test
[root@dhcp-10-26-1-207 ~]# /tmp/test
------------[ cut here ]------------
WARNING: CPU: 25 PID: 7560 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128
Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \
ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse \
xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 virtio_net \
sha1_ce net_failover virtio_blk virtio_console failover dimlib virtio_mmio
CPU: 25 PID: 7560 Comm: test Kdump: loaded Not tainted 6.10.0-rc7-gavin+ #9
Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : xas_split_alloc+0xf8/0x128
lr : split_huge_page_to_list_to_order+0x1c4/0x780
sp : ffff8000ac32f660
x29: ffff8000ac32f660 x28: ffff0000e0969eb0 x27: ffff8000ac32f6c0
x26: 0000000000000c40 x25: ffff0000e0969eb0 x24: 000000000000000d
x23: ffff8000ac32f6c0 x22: ffffffdfc0700000 x21: 0000000000000000
x20: 0000000000000000 x19: ffffffdfc0700000 x18: 0000000000000000
x17: 0000000000000000 x16: ffffd5f3708ffc70 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: ffffffffffffffc0 x10: 0000000000000040 x9 : ffffd5f3708e692c
x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff0000e0969eb8
x5 : ffffd5f37289e378 x4 : 0000000000000000 x3 : 0000000000000c40
x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
Call trace:
 xas_split_alloc+0xf8/0x128
 split_huge_page_to_list_to_order+0x1c4/0x780
 truncate_inode_partial_folio+0xdc/0x160
 truncate_inode_pages_range+0x1b4/0x4a8
 truncate_pagecache_range+0x84/0xa0
 xfs_flush_unmap_range+0x70/0x90 [xfs]
 xfs_file_fallocate+0xfc/0x4d8 [xfs]
 vfs_fallocate+0x124/0x2f0
 ksys_fallocate+0x4c/0xa0
 __arm64_sys_fallocate+0x24/0x38
 invoke_syscall.constprop.0+0x7c/0xd8
 do_el0_svc+0xb4/0xd0
 el0_svc+0x44/0x1d8
 el0t_64_sync_handler+0x134/0x150
 el0t_64_sync+0x17c/0x180

Fix it by correcting the supported page cache orders, using different sets for DAX and other files. With this corrected, a 512MB page cache is disallowed for all non-DAX files on ARM64 systems where the base page size is 64KB. After this patch is applied, the test program fails with error -EINVAL returned from __thp_vma_allowable_orders() and the madvise() system call to collapse the page caches.

Link: https://lkml.kernel.org/r/20240715000423.316491-1-gshan@redhat.com
Fixes: 6b24ca4 ("mm: Use multi-index entries in the page cache")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: <stable@vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
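The sizes in the page cache commit above translate into a folio order by simple arithmetic: 512MB divided by a 64KB base page is 8192 pages, which is order 13. The sketch below works that out and compares it against a limit. size_to_order() and order_allowed() are hypothetical helpers, and the limit is left as a parameter because the exact value of MAX_PAGECACHE_ORDER is not stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Order of a folio of size_bytes on a system with page_bytes base pages,
 * i.e. log2(size_bytes / page_bytes). The page count is assumed to be a
 * power of two.
 */
static unsigned int size_to_order(uint64_t size_bytes, uint64_t page_bytes)
{
    unsigned int order = 0;
    uint64_t pages;

    assert(page_bytes != 0 && size_bytes >= page_bytes);
    pages = size_bytes / page_bytes;
    while (pages > 1) {
        pages >>= 1;
        order++;
    }
    return order;
}

/* An order is usable for the page cache only if it does not exceed the limit. */
static int order_allowed(unsigned int order, unsigned int max_order)
{
    return order <= max_order;
}
```

For comparison, a 2MB huge page on a 4KB base page system is order 9, so the 64KB configuration asks for a far larger multi-index entry than the common case.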
When l2tp tunnels use a socket provided by userspace, we can hit lockdep splats like the one below when data is transmitted through another (unrelated) userspace socket which then gets routed over l2tp.

This issue was previously discussed here: https://lore.kernel.org/netdev/87sfialu2n.fsf@cloudflare.com/

The solution is to have lockdep treat socket locks of l2tp tunnel sockets separately from those of standard INET sockets. To do so, use a different lockdep subclass where lock nesting is possible.

============================================
WARNING: possible recursive locking detected
6.10.0+ #34 Not tainted
--------------------------------------------
iperf3/771 is trying to acquire lock:
ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0

but task is already holding lock:
ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(slock-AF_INET/1);
  lock(slock-AF_INET/1);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

10 locks held by iperf3/771:
 #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
 #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
 #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
 #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
 #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
 #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
 #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
 #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
 #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
 #9:
ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450 stack backtrace: CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ Rust-for-Linux#34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x69/0xa0 dump_stack+0xc/0x20 __lock_acquire+0x135d/0x2600 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc4/0x2a0 ? l2tp_xmit_skb+0x243/0x9d0 ? __skb_checksum+0xa3/0x540 _raw_spin_lock_nested+0x35/0x50 ? l2tp_xmit_skb+0x243/0x9d0 l2tp_xmit_skb+0x243/0x9d0 l2tp_eth_dev_xmit+0x3c/0xc0 dev_hard_start_xmit+0x11e/0x420 sch_direct_xmit+0xc3/0x640 __dev_queue_xmit+0x61c/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 __tcp_send_ack+0x1b8/0x340 tcp_send_ack+0x23/0x30 __tcp_ack_snd_check+0xa8/0x530 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_rcv_established+0x412/0xd70 tcp_v4_do_rcv+0x299/0x420 tcp_v4_rcv+0x1991/0x1e10 ip_protocol_deliver_rcu+0x50/0x220 ip_local_deliver_finish+0x158/0x260 ip_local_deliver+0xc8/0xe0 ip_rcv+0xe5/0x1d0 ? __pfx_ip_rcv+0x10/0x10 __netif_receive_skb_one_core+0xce/0xe0 ? process_backlog+0x28b/0x9f0 __netif_receive_skb+0x34/0xd0 ? process_backlog+0x28b/0x9f0 process_backlog+0x2cb/0x9f0 __napi_poll.constprop.0+0x61/0x280 net_rx_action+0x332/0x670 ? srso_alias_return_thunk+0x5/0xfbef5 ? find_held_lock+0x2b/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 handle_softirqs+0xda/0x480 ? __dev_queue_xmit+0xa2c/0x1450 do_softirq+0xa1/0xd0 </IRQ> <TASK> __local_bh_enable_ip+0xc8/0xe0 ? 
__dev_queue_xmit+0xa2c/0x1450 __dev_queue_xmit+0xa48/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 tcp_write_xmit+0x766/0x2fb0 ? __entry_text_end+0x102ba9/0x102bad ? srso_alias_return_thunk+0x5/0xfbef5 ? __might_fault+0x74/0xc0 ? srso_alias_return_thunk+0x5/0xfbef5 __tcp_push_pending_frames+0x56/0x190 tcp_push+0x117/0x310 tcp_sendmsg_locked+0x14c1/0x1740 tcp_sendmsg+0x28/0x40 inet_sendmsg+0x5d/0x90 sock_write_iter+0x242/0x2b0 vfs_write+0x68d/0x800 ? __pfx_sock_write_iter+0x10/0x10 ksys_write+0xc8/0xf0 __x64_sys_write+0x3d/0x50 x64_sys_call+0xfaf/0x1f50 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4d143af992 Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0 RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992 RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005 RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0 </TASK> Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()") Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+6acef9e0a4d1f46c83d4@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4 CC: gnault@redhat.com CC: cong.wang@bytedance.com Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: 
Tom Parkin <tparkin@katalix.com> Link: https://patch.msgid.link/20240806160626.1248317-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. For example: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, State 5 is __LINK_STATE_START (0b1) combined with __LINK_STATE_NOCARRIER (0b100). The device is not present; note the lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On an NFS client node, some files stored under the mountpoint of the NFS server were copied to another location on the same NFS server. Unexpectedly, nfs42_complete_copies() hit a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark
xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 
0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we found that the nfs4_copy_state entries on the destination nfs_server->ss_copies list were added through the field copies in handle_async_copy(), and we found a waiting copy process with the following stack: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference occurred because nfs42_complete_copies() walked nfs_server->ss_copies using the field ss_copies of nfs4_copy_state. As a result, the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10, and the data accessed through this pointer was also incorrect. Generally, the ordering of the list nfs4_state_owner->so_states means that open(O_RDWR) or open(O_WRITE) states are reclaimed first by nfs4_reclaim_open_state().
When reclaim of the destination state fails with NFS_STATE_RECOVERY_FAILED and the copies are not deleted from nfs_server->ss_copies, the source state may be passed to nfs42_complete_copies() earlier, which eventually results in this crash. To solve this issue, we add a list_head nfs_server->ss_src_copies specifically for server-to-server copies. Fixes: 0e65a32 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
There is a race between laundromat handling of revoked delegations and a client sending a free_stateid operation. The laundromat thread finds that a delegation has expired and needs to be revoked, so it marks the delegation stid revoked and puts it on a reaper list, but then it unlocks the state lock and the actual delegation revocation happens without the lock. Once the stid is marked revoked, a racing free_stateid processing thread does the following: (1) it calls list_del_init(), which removes it from the reaper list, and (2) it frees the delegation stid structure. The laundromat thread ends up not calling the revoke_delegation() function for this particular delegation, which means it will not release the lock lease that exists on the file. Now a new open for this file comes in, finds that the lease list isn't empty, and calls nfsd_breaker_owns_lease(), which ends up trying to dereference a freed delegation stateid, leading to the following use-after-free KASAN warning: kernel: ================================================================== kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205 kernel: kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9 kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024 kernel: Call trace: kernel: dump_backtrace+0x98/0x120 kernel: show_stack+0x1c/0x30 kernel: dump_stack_lvl+0x80/0xe8 kernel: print_address_description.constprop.0+0x84/0x390 kernel: print_report+0xa4/0x268 kernel: kasan_report+0xb4/0xf8 kernel: __asan_report_load8_noabort+0x1c/0x28 kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd] kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd] kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd] kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd] kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd] kernel: nfsd4_open+0xa08/0xe80 [nfsd] kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd] kernel: nfsd_dispatch+0x22c/0x718 [nfsd] kernel: svc_process_common+0x8e8/0x1960 [sunrpc] kernel: svc_process+0x3d4/0x7e0 [sunrpc] kernel: svc_handle_xprt+0x828/0xe10 [sunrpc] kernel: svc_recv+0x2cc/0x6a8 [sunrpc] kernel: nfsd+0x270/0x400 [nfsd] kernel: kthread+0x288/0x310 kernel: ret_from_fork+0x10/0x20 This patch proposes a fix that is based on adding 2 new stid sc_status values that help coordinate between the laundromat and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()). First, to make sure that once the stid is marked revoked it is not removed by nfsd4_free_stateid(), the laundromat takes a reference on the stateid. Then, to coordinate whether the stid has been put on the cl_revoked list or we are processing FREE_STATEID and need to make sure to remove it from the list, each side checks that state and acts accordingly. If the laundromat has added it to the cl_revoked list before the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove it from the list. If nfsd4_free_stateid() finds that the operation arrived before the laundromat has placed it on the cl_revoked list, it marks the state freed and then the laundromat will no longer add it to the list.
Also, for nfsd4_delegreturn(), when looking for the specified stid, we need to access stids that are marked removed or freeable; this means the laundromat has started processing the stid but hasn't finished, and this delegreturn needs to return nfserr_deleg_revoked and not nfserr_bad_stateid. The latter will not trigger a FREE_STATEID, and the lack of it will leave this stid on the cl_revoked list indefinitely. Fixes: 2d4a532 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") CC: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently, building several modules as built-in does not work because the shared crates are re-compiled (and therefore re-linked) for each module.