-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
s390/bpf: Fix multiple tail calls #31
Conversation
Master branch: 8081ede patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
Master branch: 2f7de98 patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
de95b8d
to
11481c8
Compare
Master branch: e3b9626 patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
11481c8
to
59609ed
Compare
Master branch: d66423f patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
59609ed
to
b83943f
Compare
Master branch: 90a1ded patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
b83943f
to
36ed747
Compare
Master branch: 18841da patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
36ed747
to
db9224f
Compare
Master branch: 2bab48c patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
db9224f
to
da00798
Compare
exceeding tail call count or missing tail call target), JIT uses label[0] field, which contains the address of the instruction following the tail call. When there are multiple tail calls, label[0] value comes from handling of a previous tail call, which is incorrect. Fix by getting rid of label array and resolving the label address locally: for all 3 branches that jump to it, emit 0 offsets at the beginning, and then backpatch them with the correct value. Also, do not use the long jump infrastructure: the tail call sequence is known to be short, so make all 3 jumps short. Fixes: 6651ee0 ("s390/bpf: implement bpf_tail_call() helper") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> --- arch/s390/net/bpf_jit_comp.c | 61 ++++++++++++++++-------------------- 1 file changed, 27 insertions(+), 34 deletions(-)
Master branch: 2bab48c patch https://patchwork.ozlabs.org/project/netdev/patch/20200909232141.3099367-1-iii@linux.ibm.com/ applied successfully |
da00798
to
a3013de
Compare
At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=200680 irrelevant now. Closing PR. |
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable bitfields. Missing breaks in a switch caused 8-byte reads always. This can confuse libbpf because it does strict checks that memory load size corresponds to the original size of the field, which in this case quite often would be wrong. After fixing that, we run into another problem, which quite subtle, so worth documenting here. The issue is in Clang optimization and CO-RE relocation interactions. Without that asm volatile construct (also known as barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and will apply BYTE_OFFSET 4 times for each switch case arm. This will result in the same error from libbpf about mismatch of memory load size and original field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to calculate p, after which value of p is used without relocation in each of switch case arms, doing appropiately-sized memory load. Here's the list of relevant relocations and pieces of generated BPF code before and after this patch for test_core_reloc_bitfields_direct selftests. BEFORE ===== #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 160: b7 02 00 00 04 00 00 00 r2 = 4 ; BYTE_SIZE relocation here ^^^ 161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63> 162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65> 163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66> 164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69> 0000000000000528 <LBB0_66>: 165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69> 0000000000000548 <LBB0_63>: 169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67> 170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68> 171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69> 0000000000000560 <LBB0_68>: 172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69> 0000000000000580 <LBB0_65>: 176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 00000000000005a0 <LBB0_67>: 180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8) ; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^ 00000000000005b8 <LBB0_69>: 183: 67 01 00 00 20 00 00 00 r1 <<= 32 184: b7 02 00 00 00 00 00 00 r2 = 0 185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 186: c7 01 00 00 20 00 00 00 r1 s>>= 32 187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000005e0 <LBB0_71>: 188: 77 01 00 00 20 00 00 00 r1 >>= 32 AFTER ===== #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 132: b7 01 00 00 08 00 00 00 r1 = 8 ; BYTE_OFFSET relo here ^^^ ; no size check for non-memory dereferencing instructions 133: 0f 12 00 00 00 00 00 00 r2 += r1 134: b7 03 00 00 04 00 00 00 r3 = 4 ; BYTE_SIZE relocation here ^^^ 135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63> 136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65> 137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66> 138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69> 0000000000000458 <LBB0_66>: 139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69> 0000000000000468 <LBB0_63>: 141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67> 142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68> 143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69> 0000000000000480 <LBB0_68>: 144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 0000000000000490 <LBB0_65>: 146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69> 00000000000004a0 <LBB0_67>: 148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 00000000000004a8 <LBB0_69>: 149: 67 01 00 00 20 00 00 00 r1 <<= 32 150: b7 02 00 00 00 00 00 00 r2 = 0 151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 152: c7 01 00 00 20 00 00 00 r1 s>>= 32 153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000004d0 <LBB0_71>: 154: 77 01 00 00 20 00 00 00 r1 >>= 323 Fixes: ee26dad ("libbpf: Add support for relocatable bitfields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable bitfields. Missing breaks in a switch caused 8-byte reads always. This can confuse libbpf because it does strict checks that memory load size corresponds to the original size of the field, which in this case quite often would be wrong. After fixing that, we run into another problem, which quite subtle, so worth documenting here. The issue is in Clang optimization and CO-RE relocation interactions. Without that asm volatile construct (also known as barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and will apply BYTE_OFFSET 4 times for each switch case arm. This will result in the same error from libbpf about mismatch of memory load size and original field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to calculate p, after which value of p is used without relocation in each of switch case arms, doing appropiately-sized memory load. Here's the list of relevant relocations and pieces of generated BPF code before and after this patch for test_core_reloc_bitfields_direct selftests. BEFORE ===== #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 160: b7 02 00 00 04 00 00 00 r2 = 4 ; BYTE_SIZE relocation here ^^^ 161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63> 162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65> 163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66> 164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69> 0000000000000528 <LBB0_66>: 165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69> 0000000000000548 <LBB0_63>: 169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67> 170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68> 171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69> 0000000000000560 <LBB0_68>: 172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69> 0000000000000580 <LBB0_65>: 176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 00000000000005a0 <LBB0_67>: 180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8) ; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^ 00000000000005b8 <LBB0_69>: 183: 67 01 00 00 20 00 00 00 r1 <<= 32 184: b7 02 00 00 00 00 00 00 r2 = 0 185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 186: c7 01 00 00 20 00 00 00 r1 s>>= 32 187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000005e0 <LBB0_71>: 188: 77 01 00 00 20 00 00 00 r1 >>= 32 AFTER ===== #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 132: b7 01 00 00 08 00 00 00 r1 = 8 ; BYTE_OFFSET relo here ^^^ ; no size check for non-memory dereferencing instructions 133: 0f 12 00 00 00 00 00 00 r2 += r1 134: b7 03 00 00 04 00 00 00 r3 = 4 ; BYTE_SIZE relocation here ^^^ 135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63> 136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65> 137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66> 138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69> 0000000000000458 <LBB0_66>: 139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69> 0000000000000468 <LBB0_63>: 141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67> 142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68> 143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69> 0000000000000480 <LBB0_68>: 144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 0000000000000490 <LBB0_65>: 146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69> 00000000000004a0 <LBB0_67>: 148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 00000000000004a8 <LBB0_69>: 149: 67 01 00 00 20 00 00 00 r1 <<= 32 150: b7 02 00 00 00 00 00 00 r2 = 0 151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 152: c7 01 00 00 20 00 00 00 r1 s>>= 32 153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000004d0 <LBB0_71>: 154: 77 01 00 00 20 00 00 00 r1 >>= 323 Fixes: ee26dad ("libbpf: Add support for relocatable bitfields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable bitfields. Missing breaks in a switch caused 8-byte reads always. This can confuse libbpf because it does strict checks that memory load size corresponds to the original size of the field, which in this case quite often would be wrong. After fixing that, we run into another problem, which quite subtle, so worth documenting here. The issue is in Clang optimization and CO-RE relocation interactions. Without that asm volatile construct (also known as barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and will apply BYTE_OFFSET 4 times for each switch case arm. This will result in the same error from libbpf about mismatch of memory load size and original field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to calculate p, after which value of p is used without relocation in each of switch case arms, doing appropiately-sized memory load. Here's the list of relevant relocations and pieces of generated BPF code before and after this patch for test_core_reloc_bitfields_direct selftests. BEFORE ===== #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 160: b7 02 00 00 04 00 00 00 r2 = 4 ; BYTE_SIZE relocation here ^^^ 161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63> 162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65> 163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66> 164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69> 0000000000000528 <LBB0_66>: 165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69> 0000000000000548 <LBB0_63>: 169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67> 170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68> 171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69> 0000000000000560 <LBB0_68>: 172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69> 0000000000000580 <LBB0_65>: 176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 00000000000005a0 <LBB0_67>: 180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8) ; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^ 00000000000005b8 <LBB0_69>: 183: 67 01 00 00 20 00 00 00 r1 <<= 32 184: b7 02 00 00 00 00 00 00 r2 = 0 185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 186: c7 01 00 00 20 00 00 00 r1 s>>= 32 187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000005e0 <LBB0_71>: 188: 77 01 00 00 20 00 00 00 r1 >>= 32 AFTER ===== #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 132: b7 01 00 00 08 00 00 00 r1 = 8 ; BYTE_OFFSET relo here ^^^ ; no size check for non-memory dereferencing instructions 133: 0f 12 00 00 00 00 00 00 r2 += r1 134: b7 03 00 00 04 00 00 00 r3 = 4 ; BYTE_SIZE relocation here ^^^ 135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63> 136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65> 137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66> 138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69> 0000000000000458 <LBB0_66>: 139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69> 0000000000000468 <LBB0_63>: 141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67> 142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68> 143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69> 0000000000000480 <LBB0_68>: 144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 0000000000000490 <LBB0_65>: 146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69> 00000000000004a0 <LBB0_67>: 148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 00000000000004a8 <LBB0_69>: 149: 67 01 00 00 20 00 00 00 r1 <<= 32 150: b7 02 00 00 00 00 00 00 r2 = 0 151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 152: c7 01 00 00 20 00 00 00 r1 s>>= 32 153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000004d0 <LBB0_71>: 154: 77 01 00 00 20 00 00 00 r1 >>= 323 Fixes: ee26dad ("libbpf: Add support for relocatable bitfields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable bitfields. Missing breaks in a switch caused 8-byte reads always. This can confuse libbpf because it does strict checks that memory load size corresponds to the original size of the field, which in this case quite often would be wrong. After fixing that, we run into another problem, which quite subtle, so worth documenting here. The issue is in Clang optimization and CO-RE relocation interactions. Without that asm volatile construct (also known as barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and will apply BYTE_OFFSET 4 times for each switch case arm. This will result in the same error from libbpf about mismatch of memory load size and original field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to calculate p, after which value of p is used without relocation in each of switch case arms, doing appropiately-sized memory load. Here's the list of relevant relocations and pieces of generated BPF code before and after this patch for test_core_reloc_bitfields_direct selftests. BEFORE ===== #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 160: b7 02 00 00 04 00 00 00 r2 = 4 ; BYTE_SIZE relocation here ^^^ 161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63> 162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65> 163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66> 164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69> 0000000000000528 <LBB0_66>: 165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69> 0000000000000548 <LBB0_63>: 169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67> 170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68> 171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69> 0000000000000560 <LBB0_68>: 172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69> 0000000000000580 <LBB0_65>: 176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 00000000000005a0 <LBB0_67>: 180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8) ; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^ 00000000000005b8 <LBB0_69>: 183: 67 01 00 00 20 00 00 00 r1 <<= 32 184: b7 02 00 00 00 00 00 00 r2 = 0 185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 186: c7 01 00 00 20 00 00 00 r1 s>>= 32 187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000005e0 <LBB0_71>: 188: 77 01 00 00 20 00 00 00 r1 >>= 32 AFTER ===== #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 132: b7 01 00 00 08 00 00 00 r1 = 8 ; BYTE_OFFSET relo here ^^^ ; no size check for non-memory dereferencing instructions 133: 0f 12 00 00 00 00 00 00 r2 += r1 134: b7 03 00 00 04 00 00 00 r3 = 4 ; BYTE_SIZE relocation here ^^^ 135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63> 136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65> 137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66> 138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69> 0000000000000458 <LBB0_66>: 139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69> 0000000000000468 <LBB0_63>: 141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67> 142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68> 143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69> 0000000000000480 <LBB0_68>: 144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 0000000000000490 <LBB0_65>: 146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69> 00000000000004a0 <LBB0_67>: 148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 00000000000004a8 <LBB0_69>: 149: 67 01 00 00 20 00 00 00 r1 <<= 32 150: b7 02 00 00 00 00 00 00 r2 = 0 151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 152: c7 01 00 00 20 00 00 00 r1 s>>= 32 153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000004d0 <LBB0_71>: 154: 77 01 00 00 20 00 00 00 r1 >>= 323 Fixes: ee26dad ("libbpf: Add support for relocatable bitfields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Fix BPF_CORE_READ_BITFIELD() macro used for reading CO-RE-relocatable bitfields. Missing breaks in a switch caused 8-byte reads always. This can confuse libbpf because it does strict checks that memory load size corresponds to the original size of the field, which in this case quite often would be wrong. After fixing that, we run into another problem, which quite subtle, so worth documenting here. The issue is in Clang optimization and CO-RE relocation interactions. Without that asm volatile construct (also known as barrier_var()), Clang will re-order BYTE_OFFSET and BYTE_SIZE relocations and will apply BYTE_OFFSET 4 times for each switch case arm. This will result in the same error from libbpf about mismatch of memory load size and original field size. I.e., if we were reading u32, we'd still have *(u8 *), *(u16 *), *(u32 *), and *(u64 *) memory loads, three of which will fail. Using barrier_var() forces Clang to apply BYTE_OFFSET relocation first (and once) to calculate p, after which value of p is used without relocation in each of switch case arms, doing appropiately-sized memory load. Here's the list of relevant relocations and pieces of generated BPF code before and after this patch for test_core_reloc_bitfields_direct selftests. BEFORE ===== #45: core_reloc: insn #160 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 #46: core_reloc: insn #167 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #47: core_reloc: insn #174 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #48: core_reloc: insn #178 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #49: core_reloc: insn #182 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 157: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 159: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 160: b7 02 00 00 04 00 00 00 r2 = 4 ; BYTE_SIZE relocation here ^^^ 161: 66 02 07 00 03 00 00 00 if w2 s> 3 goto +7 <LBB0_63> 162: 16 02 0d 00 01 00 00 00 if w2 == 1 goto +13 <LBB0_65> 163: 16 02 01 00 02 00 00 00 if w2 == 2 goto +1 <LBB0_66> 164: 05 00 12 00 00 00 00 00 goto +18 <LBB0_69> 0000000000000528 <LBB0_66>: 165: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 167: 69 11 08 00 00 00 00 00 r1 = *(u16 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 168: 05 00 0e 00 00 00 00 00 goto +14 <LBB0_69> 0000000000000548 <LBB0_63>: 169: 16 02 0a 00 04 00 00 00 if w2 == 4 goto +10 <LBB0_67> 170: 16 02 01 00 08 00 00 00 if w2 == 8 goto +1 <LBB0_68> 171: 05 00 0b 00 00 00 00 00 goto +11 <LBB0_69> 0000000000000560 <LBB0_68>: 172: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 174: 79 11 08 00 00 00 00 00 r1 = *(u64 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 175: 05 00 07 00 00 00 00 00 goto +7 <LBB0_69> 0000000000000580 <LBB0_65>: 176: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 178: 71 11 08 00 00 00 00 00 r1 = *(u8 *)(r1 + 8) ; BYTE_OFFSET relo here w/ WRONG size ^^^^^^^^^^^^^^^^ 179: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 00000000000005a0 <LBB0_67>: 180: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 182: 61 11 08 00 00 00 00 00 r1 = *(u32 *)(r1 + 8) ; BYTE_OFFSET relo here w/ RIGHT size ^^^^^^^^^^^^^^^^ 00000000000005b8 <LBB0_69>: 183: 67 01 00 00 20 00 00 00 r1 <<= 32 184: b7 02 00 00 00 00 00 00 r2 = 0 185: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 186: c7 01 00 00 20 00 00 00 r1 s>>= 32 187: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000005e0 <LBB0_71>: 188: 77 01 00 00 20 00 00 00 r1 >>= 32 AFTER ===== #30: core_reloc: insn #132 --> [5] + 0:5: byte_off --> struct core_reloc_bitfields.u32 #31: core_reloc: insn #134 --> [5] + 0:5: byte_sz --> struct core_reloc_bitfields.u32 129: 18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll 131: 7b 12 20 01 00 00 00 00 *(u64 *)(r2 + 288) = r1 132: b7 01 00 00 08 00 00 00 r1 = 8 ; BYTE_OFFSET relo here ^^^ ; no size check for non-memory dereferencing instructions 133: 0f 12 00 00 00 00 00 00 r2 += r1 134: b7 03 00 00 04 00 00 00 r3 = 4 ; BYTE_SIZE relocation here ^^^ 135: 66 03 05 00 03 00 00 00 if w3 s> 3 goto +5 <LBB0_63> 136: 16 03 09 00 01 00 00 00 if w3 == 1 goto +9 <LBB0_65> 137: 16 03 01 00 02 00 00 00 if w3 == 2 goto +1 <LBB0_66> 138: 05 00 0a 00 00 00 00 00 goto +10 <LBB0_69> 0000000000000458 <LBB0_66>: 139: 69 21 00 00 00 00 00 00 r1 = *(u16 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 140: 05 00 08 00 00 00 00 00 goto +8 <LBB0_69> 0000000000000468 <LBB0_63>: 141: 16 03 06 00 04 00 00 00 if w3 == 4 goto +6 <LBB0_67> 142: 16 03 01 00 08 00 00 00 if w3 == 8 goto +1 <LBB0_68> 143: 05 00 05 00 00 00 00 00 goto +5 <LBB0_69> 0000000000000480 <LBB0_68>: 144: 79 21 00 00 00 00 00 00 r1 = *(u64 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 145: 05 00 03 00 00 00 00 00 goto +3 <LBB0_69> 0000000000000490 <LBB0_65>: 146: 71 21 00 00 00 00 00 00 r1 = *(u8 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 147: 05 00 01 00 00 00 00 00 goto +1 <LBB0_69> 00000000000004a0 <LBB0_67>: 148: 61 21 00 00 00 00 00 00 r1 = *(u32 *)(r2 + 0) ; NO CO-RE relocation here ^^^^^^^^^^^^^^^^ 00000000000004a8 <LBB0_69>: 149: 67 01 00 00 20 00 00 00 r1 <<= 32 150: b7 02 00 00 00 00 00 00 r2 = 0 151: 16 02 02 00 00 00 00 00 if w2 == 0 goto +2 <LBB0_71> 152: c7 01 00 00 20 00 00 00 r1 s>>= 32 153: 05 00 01 00 00 00 00 00 goto +1 <LBB0_72> 00000000000004d0 <LBB0_71>: 154: 77 01 00 00 20 00 00 00 r1 >>= 323 Acked-by: Lorenz Bauer <lmb@cloudflare.com> Fixes: ee26dad ("libbpf: Add support for relocatable bitfields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that otherwise would be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for lower 32 bits of a register (e.g., s< 0 condition determines whether the bit #31 is zero or not), while overall 64-bit value could be anything (according to a value range representation). This is not a fancy quirky special case, but actually a handling that's necessary to prevent correctness issue with BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken() we can end up with invalid ranges, where min > max. [0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/ Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231112010609.848406-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
syzkaller reported an overflown write in arp_req_get(). [0] When ioctl(SIOCGARP) is issued, arp_req_get() looks up an neighbour entry and copies neigh->ha to struct arpreq.arp_ha.sa_data. The arp_ha here is struct sockaddr, not struct sockaddr_storage, so the sa_data buffer is just 14 bytes. In the splat below, 2 bytes are overflown to the next int field, arp_flags. We initialise the field just after the memcpy(), so it's not a problem. However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN), arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL) in arp_ioctl() before calling arp_req_get(). To avoid the overflow, let's limit the max length of memcpy(). Note that commit b5f0de6 ("net: dev: Convert sa_data to flexible array in struct sockaddr") just silenced syzkaller. [0]: memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14) WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128 Modules linked in: CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128 Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6 RSP: 0018:ffffc900050b7998 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001 RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000 R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010 FS: 00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261 inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981 sock_do_ioctl+0xdf/0x260 net/socket.c:1204 sock_ioctl+0x3ef/0x650 net/socket.c:1321 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x64/0xce RIP: 0033:0x7f172b262b8d Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003 RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000 </TASK> Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: Bjoern Doebel <doebel@amazon.de> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240215230516.31330-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The current implementation of the mov instruction with sign extension has the following problems: 1. It clobbers the source register if it is not stacked because it sign extends the source and then moves it to the destination. 2. If the dst_reg is stacked, the current code doesn't write the value back in case of 64-bit mov. 3. There is room for improvement by emitting fewer instructions. The steps for fixing this and the instructions emitted by the JIT are explained below with examples in all combinations: Case A: offset == 32: ===================== Case A.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Store tmp_lo into dst_lo 3. Sign extend tmp_lo into tmp_hi 4. Store tmp_hi to dst_hi Example: r3 = (s32)r3 r3 is a stacked register ldr r6, [r11, #-16] // Load r3_lo into tmp_lo // str to dst_lo is not emitted because src_lo == dst_lo asr r7, r6, #31 // Sign extend tmp_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi into r3_hi Case A.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo into dst_hi Example: r6 = (s32)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo into r6_lo asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case A.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Store src_lo into dst_lo 2. Sign extend src_lo into tmp_hi 3. Store tmp_hi to dst_hi Example: r3 = (s32)r6 r3 is stacked and r6 maps to {ARM_R5, ARM_R4} str r4, [r11, #-16] // Store r6_lo to r3_lo asr r7, r4, #31 // Sign extend r6_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi to dest_hi Case A.4: Both src and dst are not stacked: ------------------------------------------- 1. Mov src_lo into dst_lo 2. Sign extend src_lo into dst_hi Example: (bf) r6 = (s32)r6 r6 maps to {ARM_R5, ARM_R4} // Mov not emitted because dst == src asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B: offset != 32: ===================== Case B.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Sign extend tmp_lo according to offset. 3. Store tmp_lo into dst_lo 4. Sign extend tmp_lo into tmp_hi 5. Store tmp_hi to dst_hi Example: r9 = (s8)r3 r9 and r3 are both stacked registers ldr r6, [r11, #-16] // Load r3_lo into tmp_lo lsl r6, r6, #24 // Sign extend tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-56] // Store tmp_lo to r9_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-52] // Store tmp_hi to r9_hi Case B.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo according to offset. 3. Sign extend tmp_lo into dst_hi Example: r6 = (s8)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo to r6_lo lsl r4, r4, #24 // Sign extend r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Sign extend src_lo into tmp_lo according to offset. 2. Store tmp_lo into dst_lo. 3. Sign extend src_lo into tmp_hi. 4. Store tmp_hi to dst_hi. Example: r3 = (s8)r1 r3 is stacked and r1 maps to {ARM_R3, ARM_R2} lsl r6, r2, #24 // Sign extend r1_lo to tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-16] // Store tmp_lo to r3_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-12] // Store tmp_hi to r3_hi Case B.4: Both src and dst are not stacked: ------------------------------------------- 1. Sign extend src_lo into dst_lo according to offset. 2. Sign extend dst_lo into dst_hi. Example: r6 = (s8)r1 r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2} lsl r4, r2, #24 // Sign extend r1_lo to r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo to r6_hi Fixes: fc83265 ("arm32, bpf: add support for sign-extension mov instruction") Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com/ Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
The current implementation of the mov instruction with sign extension has the following problems: 1. It clobbers the source register if it is not stacked because it sign extends the source and then moves it to the destination. 2. If the dst_reg is stacked, the current code doesn't write the value back in case of 64-bit mov. 3. There is room for improvement by emitting fewer instructions. The steps for fixing this and the instructions emitted by the JIT are explained below with examples in all combinations: Case A: offset == 32: ===================== Case A.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Store tmp_lo into dst_lo 3. Sign extend tmp_lo into tmp_hi 4. Store tmp_hi to dst_hi Example: r3 = (s32)r3 r3 is a stacked register ldr r6, [r11, #-16] // Load r3_lo into tmp_lo // str to dst_lo is not emitted because src_lo == dst_lo asr r7, r6, #31 // Sign extend tmp_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi into r3_hi Case A.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo into dst_hi Example: r6 = (s32)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo into r6_lo asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case A.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Store src_lo into dst_lo 2. Sign extend src_lo into tmp_hi 3. Store tmp_hi to dst_hi Example: r3 = (s32)r6 r3 is stacked and r6 maps to {ARM_R5, ARM_R4} str r4, [r11, #-16] // Store r6_lo to r3_lo asr r7, r4, #31 // Sign extend r6_lo into tmp_hi str r7, [r11, #-12] // Store tmp_hi to dest_hi Case A.4: Both src and dst are not stacked: ------------------------------------------- 1. Mov src_lo into dst_lo 2. Sign extend src_lo into dst_hi Example: (bf) r6 = (s32)r6 r6 maps to {ARM_R5, ARM_R4} // Mov not emitted because dst == src asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B: offset != 32: ===================== Case B.1: src and dst are stacked registers: -------------------------------------------- 1. Load src_lo into tmp_lo 2. Sign extend tmp_lo according to offset. 3. Store tmp_lo into dst_lo 4. Sign extend tmp_lo into tmp_hi 5. Store tmp_hi to dst_hi Example: r9 = (s8)r3 r9 and r3 are both stacked registers ldr r6, [r11, #-16] // Load r3_lo into tmp_lo lsl r6, r6, #24 // Sign extend tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-56] // Store tmp_lo to r9_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-52] // Store tmp_hi to r9_hi Case B.2: src is stacked but dst is not: ---------------------------------------- 1. Load src_lo into dst_lo 2. Sign extend dst_lo according to offset. 3. Sign extend tmp_lo into dst_hi Example: r6 = (s8)r3 r6 maps to {ARM_R5, ARM_R4} and r3 is stacked ldr r4, [r11, #-16] // Load r3_lo to r6_lo lsl r4, r4, #24 // Sign extend r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo into r6_hi Case B.3: src is not stacked but dst is stacked: ------------------------------------------------ 1. Sign extend src_lo into tmp_lo according to offset. 2. Store tmp_lo into dst_lo. 3. Sign extend src_lo into tmp_hi. 4. Store tmp_hi to dst_hi. Example: r3 = (s8)r1 r3 is stacked and r1 maps to {ARM_R3, ARM_R2} lsl r6, r2, #24 // Sign extend r1_lo to tmp_lo asr r6, r6, #24 // .. str r6, [r11, #-16] // Store tmp_lo to r3_lo asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi str r7, [r11, #-12] // Store tmp_hi to r3_hi Case B.4: Both src and dst are not stacked: ------------------------------------------- 1. Sign extend src_lo into dst_lo according to offset. 2. Sign extend dst_lo into dst_hi. Example: r6 = (s8)r1 r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2} lsl r4, r2, #24 // Sign extend r1_lo to r6_lo asr r4, r4, #24 // .. asr r5, r4, #31 // Sign extend r6_lo to r6_hi Fixes: fc83265 ("arm32, bpf: add support for sign-extension mov instruction") Reported-by: syzbot+186522670e6722692d86@syzkaller.appspotmail.com Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@google.com Link: https://lore.kernel.org/bpf/20240419182832.27707-1-puranjay@kernel.org
Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee. However, when freplace prog has been attached and then updates to PROG_ARRAY map, it will panic, because the updating checks prog type of freplace prog and 'prog->aux->dst_prog' of freplace prog is NULL. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic kernel-patches#31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Why 'prog->aux->dst_prog' of freplace prog is NULL? It causes by commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach"). As 'prog->aux->dst_prog' of freplace prog is set as NULL when attach, freplace prog does not have stable prog type. But when to update freplace prog to PROG_ARRAY map, it requires checking prog type. They are conflict in theory. This patch is unable to resolve this issue thoroughly. It resolves prog type of freplace prog by 'prog->aux->saved_dst_prog_type' to avoid panic. Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee. However, when freplace prog has been attached and then updates to PROG_ARRAY map, it will panic, because the updating checks prog type of freplace prog by 'prog->aux->dst_prog->type' and 'prog->aux->dst_prog' of freplace prog is NULL. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Why 'prog->aux->dst_prog' of freplace prog is NULL? It causes by commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach"). As 'prog->aux->dst_prog' of freplace prog is set as NULL when attach, freplace prog does not have stable prog type. But when to update freplace prog to PROG_ARRAY map, it requires checking prog type. They are conflict in theory. This patch is unable to resolve this issue thoroughly. It resolves prog type of freplace prog by 'prog->aux->saved_dst_prog_type' to avoid panic. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee. However, when freplace prog has been attached and then updates to PROG_ARRAY map, it will panic, because the updating checks prog type of freplace prog by 'prog->aux->dst_prog->type' and 'prog->aux->dst_prog' of freplace prog is NULL. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic kernel-patches#31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Why 'prog->aux->dst_prog' of freplace prog is NULL? It causes by commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach"). As 'prog->aux->dst_prog' of freplace prog is set as NULL when attach, freplace prog does not have stable prog type. But when to update freplace prog to PROG_ARRAY map, it requires checking prog type. They are conflict in theory. This patch is unable to resolve this issue thoroughly. It resolves prog type of freplace prog by 'prog->aux->saved_dst_prog_type' to avoid panic. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee. However, when freplace prog has been attached and then updates to PROG_ARRAY map, it will panic, because the updating checks prog type of freplace prog by 'prog->aux->dst_prog->type' and 'prog->aux->dst_prog' of freplace prog is NULL. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic kernel-patches#31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Why 'prog->aux->dst_prog' of freplace prog is NULL? It causes by commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach"). As 'prog->aux->dst_prog' of freplace prog is set as NULL when attach, freplace prog does not have stable dst_prog type. But when to update freplace prog to PROG_ARRAY map, it requires checking prog type. They are conflict in theory. This patch resolves prog type of freplace prog by 'prog->aux->saved_dst_prog_type' to avoid panic. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee. However, when freplace prog has been attached and then updates to PROG_ARRAY map, it will panic, because the updating checks prog type of freplace prog by 'prog->aux->dst_prog->type' and 'prog->aux->dst_prog' of freplace prog is NULL. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic kernel-patches#31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Why 'prog->aux->dst_prog' of freplace prog is NULL? It causes by commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach"). As 'prog->aux->dst_prog' of freplace prog is set as NULL when attach, freplace prog does not have stable dst_prog type. But when to update freplace prog to PROG_ARRAY map, it requires checking prog type. They are conflict in theory. This patch resolves prog type of freplace prog by 'prog->aux->saved_dst_prog_type' to avoid panic. Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility") Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
The commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by updating attached freplace prog to PROG_ARRAY map. But, it does not support updating attached freplace prog to PROG_ARRAY map. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic kernel-patches#31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee of its target prog. And the commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL when attach freplace prog to its target. Then, as for following example: tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; count++; bpf_tail_call_static(skb, &jmp_table, 0); return ret; } SEC("freplace") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; return ret; } SEC("tc") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; And freplace entry prog's target is the tc subprog. After loading, the freplace jmp_table's owner type is BPF_PROG_TYPE_SCHED_CLS. Next, after attaching freplace prog to tc subprog, its prog->aux-> dst_prog is NULL. Next, when update freplace prog to jmp_table, bpf_prog_map_compatible() will return false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. With this patch, resolve_prog_type() return BPF_PROG_TYPE_SCHED_CLS to support updating attached freplace prog to PROG_ARRY map for this example. Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Signed-off-by: Leon Hwang <hffilwlqm@gmail.com>
The commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by updating attached freplace prog to PROG_ARRAY map. But, it does not support updating attached freplace prog to PROG_ARRAY map. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [kernel-patches#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic kernel-patches#31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee of its target prog. And the commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL when attach freplace prog to its target. Then, as for following example: tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; count++; bpf_tail_call_static(skb, &jmp_table, 0); return ret; } SEC("freplace") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; return ret; } SEC("tc") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; And freplace entry prog's target is the tc subprog. After loading, the freplace jmp_table's owner type is BPF_PROG_TYPE_SCHED_CLS. Next, after attaching freplace prog to tc subprog, its prog->aux-> dst_prog is NULL. Next, when update freplace prog to jmp_table, bpf_prog_map_compatible() returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. With this patch, resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS to support updating attached freplace prog to PROG_ARRY map for this example. Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
The commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by updating attached freplace prog to PROG_ARRAY map. But, it does not support updating attached freplace prog to PROG_ARRAY map. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee of its target prog. And the commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL when attach freplace prog to its target. Then, as for following example: tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; count++; bpf_tail_call_static(skb, &jmp_table, 0); return ret; } SEC("freplace") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; return ret; } SEC("tc") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; And freplace entry prog's target is the tc subprog. After loading, the freplace jmp_table's owner type is BPF_PROG_TYPE_SCHED_CLS. Next, after attaching freplace prog to tc subprog, its prog->aux-> dst_prog is NULL. Next, when update freplace prog to jmp_table, bpf_prog_map_compatible() returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. With this patch, resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS to support updating attached freplace prog to PROG_ARRY map for this example. Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
The commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by updating attached freplace prog to PROG_ARRAY map. But, it does not support updating attached freplace prog to PROG_ARRAY map. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee of its target prog. And the commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL when attach freplace prog to its target. Then, as for following example: tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; count++; bpf_tail_call_static(skb, &jmp_table, 0); return ret; } SEC("freplace") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; return ret; } SEC("tc") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; And freplace entry prog's target is the tc subprog. After loading, the freplace jmp_table's owner type is BPF_PROG_TYPE_SCHED_CLS. Next, after attaching freplace prog to tc subprog, its prog->aux-> dst_prog is NULL. Next, when update freplace prog to jmp_table, bpf_prog_map_compatible() returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. With this patch, resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS to support updating attached freplace prog to PROG_ARRY map for this example. Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
The commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by updating attached freplace prog to PROG_ARRAY map. But, it does not support updating attached freplace prog to PROG_ARRAY map. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee of its target prog. And the commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL when attach freplace prog to its target. Then, as for following example: tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; count++; bpf_tail_call_static(skb, &jmp_table, 0); return ret; } SEC("freplace") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; return ret; } SEC("tc") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; And freplace entry prog's target is the tc subprog. After loading, the freplace jmp_table's owner type is BPF_PROG_TYPE_SCHED_CLS. Next, after attaching freplace prog to tc subprog, its prog->aux-> dst_prog is NULL. Next, when update freplace prog to jmp_table, bpf_prog_map_compatible() returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. With this patch, resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS to support updating attached freplace prog to PROG_ARRY map for this example. Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
The commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by updating attached freplace prog to PROG_ARRAY map. But, it does not support updating attached freplace prog to PROG_ARRAY map. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee of its target prog. And the commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL when attach freplace prog to its target. Then, as for following example: tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; count++; bpf_tail_call_static(skb, &jmp_table, 0); return ret; } SEC("freplace") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; return ret; } SEC("tc") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; And freplace entry prog's target is the tc subprog. After loading, the freplace jmp_table's owner type is BPF_PROG_TYPE_SCHED_CLS. Next, after attaching freplace prog to tc subprog, its prog->aux-> dst_prog is NULL. Next, when update freplace prog to jmp_table, bpf_prog_map_compatible() returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. With this patch, resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS to support updating attached freplace prog to PROG_ARRY map for this example. Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
The commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by updating attached freplace prog to PROG_ARRAY map. But, it does not support updating attached freplace prog to PROG_ARRAY map. [309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004 [309049.036419] #PF: supervisor read access in kernel mode [309049.036426] #PF: error_code(0x0000) - not-present page [309049.036432] PGD 0 P4D 0 [309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI [309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu [309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 [309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140 [309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00 [309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246 [309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000 [309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00 [309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000 [309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00 [309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400 [309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000 [309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0 [309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [309049.036588] Call Trace: [309049.036592] <TASK> [309049.036597] ? show_regs+0x6d/0x80 [309049.036604] ? __die+0x24/0x80 [309049.036619] ? page_fault_oops+0x99/0x1b0 [309049.036628] ? do_user_addr_fault+0x2ee/0x6b0 [309049.036634] ? exc_page_fault+0x83/0x1b0 [309049.036641] ? asm_exc_page_fault+0x27/0x30 [309049.036649] ? bpf_prog_map_compatible+0x2a/0x140 [309049.036656] prog_fd_array_get_ptr+0x2c/0x70 [309049.036664] bpf_fd_array_map_update_elem+0x37/0x130 [309049.036671] bpf_map_update_value+0x1d3/0x260 [309049.036677] map_update_elem+0x1fa/0x360 [309049.036683] __sys_bpf+0x54c/0xa10 [309049.036689] __x64_sys_bpf+0x1a/0x30 [309049.036694] x64_sys_call+0x1936/0x25c0 [309049.036700] do_syscall_64+0x7f/0x180 [309049.036706] ? do_syscall_64+0x8c/0x180 [309049.036712] ? do_syscall_64+0x8c/0x180 [309049.036717] ? irqentry_exit+0x43/0x50 [309049.036723] ? common_interrupt+0x54/0xb0 [309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b Since commit 1c123c5 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog can be used as tail-callee of its target prog. And the commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL when attach freplace prog to its target. Then, as for following example: tailcall_freplace.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; count++; bpf_tail_call_static(skb, &jmp_table, 0); return ret; } SEC("freplace") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; tc_bpf2bpf.c: // SPDX-License-Identifier: GPL-2.0 \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" __noinline int subprog(struct __sk_buff *skb) { volatile int ret = 1; return ret; } SEC("tc") int entry(struct __sk_buff *skb) { return subprog(skb); } char __license[] SEC("license") = "GPL"; And freplace entry prog's target is the tc subprog. After loading, the freplace jmp_table's owner type is BPF_PROG_TYPE_SCHED_CLS. Next, after attaching freplace prog to tc subprog, its prog->aux-> dst_prog is NULL. Next, when update freplace prog to jmp_table, bpf_prog_map_compatible() returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. With this patch, resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS to support updating attached freplace prog to PROG_ARRY map for this example. Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen <toke@redhat.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Pull request for series with
subject: s390/bpf: Fix multiple tail calls
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=200680