bpf: add support for other map types to bpf_map_lookup_and_delete_elem #81
Master branch: c64779e patch https://patchwork.ozlabs.org/project/netdev/patch/20200917135700.649909-1-luka.oreskovic@sartura.hr/ applied successfully
Add support to bpf_map_lookup_and_delete_elem for map types other than stack and queue. This patch adds the necessary parts from bpf_map_lookup_elem and bpf_map_delete_elem so it works as expected for all map types.

Signed-off-by: Luka Oreskovic <luka.oreskovic@sartura.hr>
CC: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
CC: Luka Perkov <luka.perkov@sartura.hr>
---
 kernel/bpf/syscall.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)
Master branch: 3b03791 patch https://patchwork.ozlabs.org/project/netdev/patch/20200917135700.649909-1-luka.oreskovic@sartura.hr/ applied successfully
At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=202427 expired. Closing PR.
When smack is enabled, during the creation of a directory smack may attempt to add a "smack transmute" xattr on the inode, which results in the following warning and trace:

  WARNING: CPU: 3 PID: 2548 at fs/btrfs/transaction.c:537 start_transaction+0x489/0x4f0
  Modules linked in: nft_objref nf_conntrack_netbios_ns (...)
  CPU: 3 PID: 2548 Comm: mkdir Not tainted 5.9.0-rc2smack+ #81
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  RIP: 0010:start_transaction+0x489/0x4f0
  Code: e9 be fc ff ff (...)
  RSP: 0018:ffffc90001887d10 EFLAGS: 00010202
  RAX: ffff88816f1e0000 RBX: 0000000000000201 RCX: 0000000000000003
  RDX: 0000000000000201 RSI: 0000000000000002 RDI: ffff888177849000
  RBP: ffff888177849000 R08: 0000000000000001 R09: 0000000000000004
  R10: ffffffff825e8f7a R11: 0000000000000003 R12: ffffffffffffffe2
  R13: 0000000000000000 R14: ffff88803d884270 R15: ffff8881680d8000
  FS:  00007f67317b8440(0000) GS:ffff88817bcc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f67247a22a8 CR3: 000000004bfbc002 CR4: 0000000000370ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? slab_free_freelist_hook+0xea/0x1b0
   ? trace_hardirqs_on+0x1c/0xe0
   btrfs_setxattr_trans+0x3c/0xf0
   __vfs_setxattr+0x63/0x80
   smack_d_instantiate+0x2d3/0x360
   security_d_instantiate+0x29/0x40
   d_instantiate_new+0x38/0x90
   btrfs_mkdir+0x1cf/0x1e0
   vfs_mkdir+0x14f/0x200
   do_mkdirat+0x6d/0x110
   do_syscall_64+0x2d/0x40
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f673196ae6b
  Code: 8b 05 11 (...)
  RSP: 002b:00007ffc3c679b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
  RAX: ffffffffffffffda RBX: 00000000000001ff RCX: 00007f673196ae6b
  RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffc3c67a30d
  RBP: 00007ffc3c67a30d R08: 00000000000001ff R09: 0000000000000000
  R10: 000055d3e39fe930 R11: 0000000000000246 R12: 0000000000000000
  R13: 00007ffc3c679cd8 R14: 00007ffc3c67a30d R15: 00007ffc3c679ce0
  irq event stamp: 11029
  hardirqs last enabled at (11037): [<ffffffff81153fe6>] console_unlock+0x486/0x670
  hardirqs last disabled at (11044): [<ffffffff81153c01>] console_unlock+0xa1/0x670
  softirqs last enabled at (8864): [<ffffffff81e0102f>] asm_call_on_stack+0xf/0x20
  softirqs last disabled at (8851): [<ffffffff81e0102f>] asm_call_on_stack+0xf/0x20

This happens because at btrfs_mkdir() we call d_instantiate_new() while holding a transaction handle, which results in the following call chain:

  btrfs_mkdir()
    trans = btrfs_start_transaction(root, 5);
    d_instantiate_new()
      smack_d_instantiate()
        __vfs_setxattr()
          btrfs_setxattr_trans()
            btrfs_start_transaction()
              start_transaction()
                WARN_ON() --> a transaction start has TRANS_EXTWRITERS set in its type
                h->orig_rsv = h->block_rsv
                h->block_rsv = NULL
    btrfs_end_transaction(trans)

Besides the warning triggered at start_transaction, we set the handle's block_rsv to NULL, which may cause some surprises later on. So fix this by making btrfs_setxattr_trans() not start a transaction when we already have a handle on one, stored in current->journal_info, and use that handle instead. We are good to use the handle because at btrfs_mkdir() we did reserve space for the xattr and the inode item.
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
CC: stable@vger.kernel.org # 5.4+
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://lore.kernel.org/linux-btrfs/434d856f-bd7b-4889-a6ec-e81aaebfa735@schaufler-ca.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
During our testing of the WFM200 module over SDIO on an i.MX6Q-based platform, we discovered a memory corruption on the system, traced back to the wfx driver. Using kfence, it was possible to find the root cause: hw->max_rates is set to 8 in wfx_init_common, while the maximum defined by IEEE80211_TX_TABLE_SIZE is 4. This causes array out-of-bounds writes during updates of the rate table, as seen below:

  BUG: KFENCE: memory corruption in kfree_rcu_work+0x320/0x36c
  Corrupted memory at 0xe0a4ffe0 [ 0x03 0x03 0x03 0x03 0x01 0x00 0x00 0x02 0x02 0x02 0x09 0x00 0x21 0xbb 0xbb 0xbb ] (in kfence-#81):
   kfree_rcu_work+0x320/0x36c
   process_one_work+0x3ec/0x920
   worker_thread+0x60/0x7a4
   kthread+0x174/0x1b4
   ret_from_fork+0x14/0x2c
   0x0
  kfence-#81: 0xe0a4ffc0-0xe0a4ffdf, size=32, cache=kmalloc-64
  allocated by task 297 on cpu 0 at 631.039555s:
   minstrel_ht_update_rates+0x38/0x2b0 [mac80211]
   rate_control_tx_status+0xb4/0x148 [mac80211]
   ieee80211_tx_status_ext+0x364/0x1030 [mac80211]
   ieee80211_tx_status+0xe0/0x118 [mac80211]
   ieee80211_tasklet_handler+0xb0/0xe0 [mac80211]
   tasklet_action_common.constprop.0+0x11c/0x148
   __do_softirq+0x1a4/0x61c
   irq_exit+0xcc/0x104
   call_with_stack+0x18/0x20
   __irq_svc+0x80/0xb0
   wq_worker_sleeping+0x10/0x100
   wq_worker_sleeping+0x10/0x100
   schedule+0x50/0xe0
   schedule_timeout+0x2e0/0x474
   wait_for_completion+0xdc/0x1ec
   mmc_wait_for_req_done+0xc4/0xf8
   mmc_io_rw_extended+0x3b4/0x4ec
   sdio_io_rw_ext_helper+0x290/0x384
   sdio_memcpy_toio+0x30/0x38
   wfx_sdio_copy_to_io+0x88/0x108 [wfx]
   wfx_data_write+0x88/0x1f0 [wfx]
   bh_work+0x1c8/0xcc0 [wfx]
   process_one_work+0x3ec/0x920
   worker_thread+0x60/0x7a4
   kthread+0x174/0x1b4
   ret_from_fork+0x14/0x2c
   0x0

After discussion on the wireless mailing list, it was clarified that the issue was introduced by commit ee0e16a ("mac80211: minstrel_ht: fill all requested rates") and that the fix belongs in minstrel_ht_update_rates in rc80211_minstrel_ht.c.
Fixes: ee0e16a ("mac80211: minstrel_ht: fill all requested rates")
Link: https://lore.kernel.org/all/12e5adcd-8aed-f0f7-70cc-4fb7b656b829@camlingroup.com/
Link: https://lore.kernel.org/linux-wireless/20220915131445.30600-1-lech.perczak@camlingroup.com/
Cc: Jérôme Pouiller <jerome.pouiller@silabs.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Seiderer <ps.report@gmx.net>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Krzysztof Drobiński <krzysztof.drobinski@camlingroup.com>
Signed-off-by: Paweł Lenkow <pawel.lenkow@camlingroup.com>
Signed-off-by: Lech Perczak <lech.perczak@camlingroup.com>
Reviewed-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit 20b6cc3 ("bpf: Avoid hashtab deadlock with map_locked") tried to fix the deadlock, but in some cases the deadlock still occurs:

* CPUn is in task context with key K1 and takes the bucket lock.
* CPUn is then interrupted in NMI context, with key K2.
* Both keys hash to the same bucket, but to different map_locked slots.

        | Task
    +---v----+
    |  CPUn  |
    +---^----+
        | NMI

Either way, lockdep still warns:

[ 36.092222] ================================
[ 36.092230] WARNING: inconsistent lock state
[ 36.092234] 6.1.0-rc5+ #81 Tainted: G E
[ 36.092236] --------------------------------
[ 36.092237] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[ 36.092238] perf/1515 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 36.092242] ffff888341acd1a0 (&htab->lockdep_key){....}-{2:2}, at: htab_lock_bucket+0x4d/0x58
[ 36.092253] {INITIAL USE} state was registered at:
[ 36.092255]   mark_usage+0x1d/0x11d
[ 36.092262]   __lock_acquire+0x3c9/0x6ed
[ 36.092266]   lock_acquire+0x23d/0x29a
[ 36.092270]   _raw_spin_lock_irqsave+0x43/0x7f
[ 36.092274]   htab_lock_bucket+0x4d/0x58
[ 36.092276]   htab_map_delete_elem+0x82/0xfb
[ 36.092278]   map_delete_elem+0x156/0x1ac
[ 36.092282]   __sys_bpf+0x138/0xb71
[ 36.092285]   __do_sys_bpf+0xd/0x15
[ 36.092288]   do_syscall_64+0x6d/0x84
[ 36.092291]   entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 36.092295] irq event stamp: 120346
[ 36.092296] hardirqs last enabled at (120345): [<ffffffff8180b97f>] _raw_spin_unlock_irq+0x24/0x39
[ 36.092299] hardirqs last disabled at (120346): [<ffffffff81169e85>] generic_exec_single+0x40/0xb9
[ 36.092303] softirqs last enabled at (120268): [<ffffffff81c00347>] __do_softirq+0x347/0x387
[ 36.092307] softirqs last disabled at (120133): [<ffffffff810ba4f0>] __irq_exit_rcu+0x67/0xc6
[ 36.092311]
[ 36.092311] other info that might help us debug this:
[ 36.092312]  Possible unsafe locking scenario:
[ 36.092312]
[ 36.092313]        CPU0
[ 36.092313]        ----
[ 36.092314]   lock(&htab->lockdep_key);
[ 36.092315]   <Interrupt>
[ 36.092316]     lock(&htab->lockdep_key);
[ 36.092318]
[ 36.092318]  *** DEADLOCK ***
[ 36.092318]
[ 36.092318] 3 locks held by perf/1515:
[ 36.092320]  #0: ffff8881b9805cc0 (&cpuctx_mutex){+.+.}-{4:4}, at: perf_event_ctx_lock_nested+0x8e/0xba
[ 36.092327]  #1: ffff8881075ecc20 (&event->child_mutex){+.+.}-{4:4}, at: perf_event_for_each_child+0x35/0x76
[ 36.092332]  #2: ffff8881b9805c20 (&cpuctx_lock){-.-.}-{2:2}, at: perf_ctx_lock+0x12/0x27
[ 36.092339]
[ 36.092339] stack backtrace:
[ 36.092341] CPU: 0 PID: 1515 Comm: perf Tainted: G E 6.1.0-rc5+ #81
[ 36.092344] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 36.092349] Call Trace:
[ 36.092351]  <NMI>
[ 36.092354]  dump_stack_lvl+0x57/0x81
[ 36.092359]  lock_acquire+0x1f4/0x29a
[ 36.092363]  ? handle_pmi_common+0x13f/0x1f0
[ 36.092366]  ? htab_lock_bucket+0x4d/0x58
[ 36.092371]  _raw_spin_lock_irqsave+0x43/0x7f
[ 36.092374]  ? htab_lock_bucket+0x4d/0x58
[ 36.092377]  htab_lock_bucket+0x4d/0x58
[ 36.092379]  htab_map_update_elem+0x11e/0x220
[ 36.092386]  bpf_prog_f3a535ca81a8128a_bpf_prog2+0x3e/0x42
[ 36.092392]  trace_call_bpf+0x177/0x215
[ 36.092398]  perf_trace_run_bpf_submit+0x52/0xaa
[ 36.092403]  ? x86_pmu_stop+0x97/0x97
[ 36.092407]  perf_trace_nmi_handler+0xb7/0xe0
[ 36.092415]  nmi_handle+0x116/0x254
[ 36.092418]  ? x86_pmu_stop+0x97/0x97
[ 36.092423]  default_do_nmi+0x3d/0xf6
[ 36.092428]  exc_nmi+0xa1/0x109
[ 36.092432]  end_repeat_nmi+0x16/0x67
[ 36.092436] RIP: 0010:wrmsrl+0xd/0x1b
[ 36.092441] Code: 04 01 00 00 c6 84 07 48 01 00 00 01 5b e9 46 15 80 00 5b c3 cc cc cc cc c3 cc cc cc cc 48 89 f2 89 f9 89 f0 48 c1 ea 20 0f 30 <66> 90 c3 cc cc cc cc 31 d2 e9 2f 04 49 00 0f 1f 44 00 00 40 0f6
[ 36.092443] RSP: 0018:ffffc900043dfc48 EFLAGS: 00000002
[ 36.092445] RAX: 000000000000000f RBX: ffff8881b96153e0 RCX: 000000000000038f
[ 36.092447] RDX: 0000000000000007 RSI: 000000070000000f RDI: 000000000000038f
[ 36.092449] RBP: 000000070000000f R08: ffffffffffffffff R09: ffff8881053bdaa8
[ 36.092451] R10: ffff8881b9805d40 R11: 0000000000000005 R12: ffff8881b9805c00
[ 36.092452] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8881075ec970
[ 36.092460]  ? wrmsrl+0xd/0x1b
[ 36.092465]  ? wrmsrl+0xd/0x1b
[ 36.092469]  </NMI>
[ 36.092469]  <TASK>
[ 36.092470]  __intel_pmu_enable_all.constprop.0+0x7c/0xaf
[ 36.092475]  event_function+0xb6/0xd3
[ 36.092478]  ? cpu_to_node+0x1a/0x1a
[ 36.092482]  ? cpu_to_node+0x1a/0x1a
[ 36.092485]  remote_function+0x1e/0x4c
[ 36.092489]  generic_exec_single+0x48/0xb9
[ 36.092492]  ? __lock_acquire+0x666/0x6ed
[ 36.092497]  smp_call_function_single+0xbf/0x106
[ 36.092499]  ? cpu_to_node+0x1a/0x1a
[ 36.092504]  ? kvm_sched_clock_read+0x5/0x11
[ 36.092508]  ? __perf_event_task_sched_in+0x13d/0x13d
[ 36.092513]  cpu_function_call+0x47/0x69
[ 36.092516]  ? perf_event_update_time+0x52/0x52
[ 36.092519]  event_function_call+0x89/0x117
[ 36.092521]  ? __perf_event_task_sched_in+0x13d/0x13d
[ 36.092526]  ? _perf_event_disable+0x4a/0x4a
[ 36.092528]  perf_event_for_each_child+0x3d/0x76
[ 36.092532]  ? _perf_event_disable+0x4a/0x4a
[ 36.092533]  _perf_ioctl+0x564/0x590
[ 36.092537]  ? __lock_release+0xd5/0x1b0
[ 36.092543]  ? perf_event_ctx_lock_nested+0x8e/0xba
[ 36.092547]  perf_ioctl+0x42/0x5f
[ 36.092551]  vfs_ioctl+0x1e/0x2f
[ 36.092554]  __do_sys_ioctl+0x66/0x89
[ 36.092559]  do_syscall_64+0x6d/0x84
[ 36.092563]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 36.092566] RIP: 0033:0x7fe7110f362b
[ 36.092569] Code: 0f 1e fa 48 8b 05 5d b8 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d b8 2c 00 f7 d8 64 89 018
[ 36.092570] RSP: 002b:00007ffebb8e4b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 36.092573] RAX: ffffffffffffffda RBX: 0000000000002400 RCX: 00007fe7110f362b
[ 36.092575] RDX: 0000000000000000 RSI: 0000000000002400 RDI: 0000000000000013
[ 36.092576] RBP: 00007ffebb8e4b40 R08: 0000000000000001 R09: 000055c1db4a5b40
[ 36.092577] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 36.092579] R13: 000055c1db3b2a30 R14: 0000000000000000 R15: 0000000000000000
[ 36.092586]  </TASK>

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
We can see that "Short form of movsx, dst_reg = (s8,s16,s32)src_reg" in include/linux/filter.h. Additionally, for BPF_ALU64 the value of the destination register is unchanged, whereas for BPF_ALU the upper 32 bits of the destination register are zeroed, so the JIT should clear the upper 32 bits for BPF_ALU.

[root@linux fedora]# echo 1 > /proc/sys/net/core/bpf_jit_enable
[root@linux fedora]# modprobe test_bpf

Before:
test_bpf: #81 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
test_bpf: #82 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

After:
test_bpf: #81 ALU_MOVSX | BPF_B jited:1 6 PASS
test_bpf: #82 ALU_MOVSX | BPF_H jited:1 6 PASS

By the way, the bpf selftest case "./test_progs -t verifier_movsx" can also be fixed with this patch.

Fixes: f48012f ("LoongArch: BPF: Support sign-extension mov instructions")
Acked-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Current 'bpftool link' command does not show pids, e.g.,

$ tools/build/bpftool/bpftool link
...
4: tracing prog 23
   prog_type lsm attach_type lsm_mac
   target_obj_id 1 target_btf_id 31320

Hack the following change to enable normal libbpf debug output,

--- a/tools/bpf/bpftool/pids.c
+++ b/tools/bpf/bpftool/pids.c
@@ -121,9 +121,9 @@ int build_obj_refs_table(struct hashmap **map, enum bpf_obj_type type)
 	/* we don't want output polluted with libbpf errors if bpf_iter is not
 	 * supported
 	 */
-	default_print = libbpf_set_print(libbpf_print_none);
+	/* default_print = libbpf_set_print(libbpf_print_none); */
 	err = pid_iter_bpf__load(skel);
-	libbpf_set_print(default_print);
+	/* libbpf_set_print(default_print); */

Rerun the above bpftool command:

$ tools/build/bpftool/bpftool link
libbpf: prog 'iter': BPF program load failed: Permission denied
libbpf: prog 'iter': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; struct task_struct *task = ctx->task; @ pid_iter.bpf.c:69
0: (79) r6 = *(u64 *)(r1 +8)  ; R1=ctx() R6_w=ptr_or_null_task_struct(id=1)
; struct file *file = ctx->file; @ pid_iter.bpf.c:68
...
; struct bpf_link *link = (struct bpf_link *) file->private_data; @ pid_iter.bpf.c:103
80: (79) r3 = *(u64 *)(r8 +432)  ; R3_w=scalar() R8=ptr_file()
; if (link->type == bpf_core_enum_value(enum bpf_link_type___local, @ pid_iter.bpf.c:105
81: (61) r1 = *(u32 *)(r3 +12)
R3 invalid mem access 'scalar'
processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
-- END PROG LOAD LOG --
libbpf: prog 'iter': failed to load: -13
...

'file->private_data' has 'void *' type, which caused the subsequent 'link->type' access (insn #81) to fail verification. To fix the issue, do a type cast from 'void *' to 'struct bpf_link *' for file->private_data as in this patch, and 'bpftool link' then runs successfully with 'pids':

$ tools/build/bpftool/bpftool link
...
4: tracing prog 23
   prog_type lsm attach_type lsm_mac
   target_obj_id 1 target_btf_id 31320
   pids systemd(1)

Fixes: 44ba7b3 ("bpftool: Use a local copy of BPF_LINK_TYPE_PERF_EVENT in pid_iter.bpf.c")
Cc: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Current 'bpftool link' command does not show pids, e.g.,

$ tools/build/bpftool/bpftool link
...
4: tracing prog 23
   prog_type lsm attach_type lsm_mac
   target_obj_id 1 target_btf_id 31320

Hack the following change to enable normal libbpf debug output,

--- a/tools/bpf/bpftool/pids.c
+++ b/tools/bpf/bpftool/pids.c
@@ -121,9 +121,9 @@ int build_obj_refs_table(struct hashmap **map, enum bpf_obj_type type)
 	/* we don't want output polluted with libbpf errors if bpf_iter is not
 	 * supported
 	 */
-	default_print = libbpf_set_print(libbpf_print_none);
+	/* default_print = libbpf_set_print(libbpf_print_none); */
 	err = pid_iter_bpf__load(skel);
-	libbpf_set_print(default_print);
+	/* libbpf_set_print(default_print); */

Rerun the above bpftool command:

$ tools/build/bpftool/bpftool link
libbpf: prog 'iter': BPF program load failed: Permission denied
libbpf: prog 'iter': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; struct task_struct *task = ctx->task; @ pid_iter.bpf.c:69
0: (79) r6 = *(u64 *)(r1 +8)  ; R1=ctx() R6_w=ptr_or_null_task_struct(id=1)
; struct file *file = ctx->file; @ pid_iter.bpf.c:68
...
; struct bpf_link *link = (struct bpf_link *) file->private_data; @ pid_iter.bpf.c:103
80: (79) r3 = *(u64 *)(r8 +432)  ; R3_w=scalar() R8=ptr_file()
; if (link->type == bpf_core_enum_value(enum bpf_link_type___local, @ pid_iter.bpf.c:105
81: (61) r1 = *(u32 *)(r3 +12)
R3 invalid mem access 'scalar'
processed 39 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 2
-- END PROG LOAD LOG --
libbpf: prog 'iter': failed to load: -13
...

'file->private_data' has 'void *' type, which caused the subsequent 'link->type' access (insn #81) to fail verification. To fix the issue, restore the previous BPF_CORE_READ so old kernels can also work. With this patch, 'bpftool link' runs successfully with 'pids':

$ tools/build/bpftool/bpftool link
...
4: tracing prog 23
   prog_type lsm attach_type lsm_mac
   target_obj_id 1 target_btf_id 31320
   pids systemd(1)

Fixes: 44ba7b3 ("bpftool: Use a local copy of BPF_LINK_TYPE_PERF_EVENT in pid_iter.bpf.c")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Quentin Monnet <quentin@isovalent.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20240312023249.3776718-1-yonghong.song@linux.dev
Pull request for series with
subject: bpf: add support for other map types to bpf_map_lookup_and_delete_elem
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=202427