-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kallsyms: change Modules caching strategy, cache address lookups #1590
Conversation
Since this function allows lookups to be performed in batches, caching lookup results becomes interesting, without having to resort to caching all of kallsyms. Signed-off-by: Timo Beckers <timo@isovalent.com>
} | ||
} | ||
if len(mods) != 0 { | ||
if err := kallsyms.AssignModules(mods); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this branch locally and it fails for me with an ambiguous error for kernel module load_elf_phdrs
. If I understood the old code correctly, it didn't care about duplicate symbols in /proc/kallsyms
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it would ignore kallsyms entries without modules, which is not correct. If a symbol is ambiguous between the core kernel and a module, we should not attach to it. The old code might've picked the address of the base symbol to attach to, but provided the BTF of the symbol in the kernel module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrickpichler Is this a breaking change for you, or were you just testing out ambiguous symbol handling during kmod attach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a breaking change, but I guess it was not right before due to ambiguity and anyway would have been a bug 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if you'd actually make use of CO:RE things, build your BPF C program against headers of one kernel, deploy your app to another host, then be unlucky enough for some of the fields to have changed between versions, you would've likely hit a verifier error at runtime.
'Unfortunately', in most cases like load_elf_phdrs
, it's actually the same function with the same signature and underlying types that gets included into multiple compilation units (by way of including a common header) and then gets merged into the same binary during linking of the kernel binary. Things would've kept working fine in your case since it's likely an exact duplicate, but you would run into funky cases where some code paths in the kernel invoke one version of the symbol, while other paths invoke the other, meaning you'd miss events since you're only attaching to one of the symbols.
Let's see if any other skeletons fall out of the ceiling when we ship this in a release, but this has been a bugbear for bpf since the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all makes sense. Very nice
@ti-mo Thanks, I tested this PR in Inspektor Gadget, and it uses between 8 and 10MB less memory than before. |
Renamed SymbolModule to Module and introduced AssignModules, a batch version of Module similar to the existing AssignAddresses. Along with this change, ambiguous symbols (with non-unique names) are now rejected to avoid sending the wrong BTF ID for a symbol for the kernel to use during verification. Similar to AssignAddresses, this API now only caches requested symbols instead of all of kallsyms. Some users found the permanent memory allocation burdensome and relied on flushing them after loading the programs they needed. btf.FlushKernelSpec now no longer calls FlushKernelModuleCache, and the latter has been removed due to the significantly lowered memory consumption. Signed-off-by: Timo Beckers <timo@isovalent.com>
The original implementation opened /proc/kallsyms and resolved all symbols during CollectionSpec loading. This creates issues like bpf2go not being able to generate its output when run unprivileged, since it can't properly resolve ksym addresses. /proc/kallsyms addresses are zero in this case. Also, some users load CollectionSpecs in non-linux environments like wasm, so we need the spec loading to remain hermetic. Probably a good idea to add some tests for this at some point, though not clear how at this time. Signed-off-by: Timo Beckers <timo@isovalent.com>
As mentioned in the code comments, scan all programs in the CollectionSpec to allow batching together kallsyms lookups, avoiding repeated scanning of /proc/kallsyms. Signed-off-by: Timo Beckers <timo@isovalent.com>
daf2425
to
2e21f0b
Compare
See individual commits for context.
At a high level:
newProgramWithOptions
onwards