small speedups during elf loading #1207

paulcacheux · 2023-11-07T10:00:18Z

This PR implements two speedups/memory usage improvements in the elf loading path.

The first one is about preallocating the instructions buffer, based on the size provided by the elf section header. Due to double width load instruction we might over allocate by a bit, but from testing on our ebpf code this over allocation is negligible compared to the pre-allocation done by slice.append anyway, and the win is significant in case of large instructions buffer.

The second one implements a fast path for splitSymbols in the case where the instructions are composed of only one symbol, which is the dominant use case from my understanding (happy to be challenged on that).

I implemented a small benchmark depicting some instructions on both ends of the spectrum (one symbol, and 2 symbols at both ends of the instructions stream).

Please let me know if you would like separate PRs for each of those.

Thanks

elf_reader.go

lmb · 2023-11-07T14:30:26Z

linker.go

+	}
+	if onlyOneSymbol {
+		return map[string]asm.Instructions{
+			firstSymbol: insns,


Currently splitSymbols always returns a copy of insns. The new code will sometimes return a copy, sometimes not, which is a great bug magnet IMO.

Have you tried instead to optimize the loop below? I can see two big inefficiencies:

append will do extra copies

map lookup and assigns are also expensive

Getting rid of the two should make this a lot better without changing the alloc behaviour. You can probably track last name + index and do an exact allocation + copy whenever name changes.

That's a good point, I checked that we were not modifying the result of this function but this can change in the future.

I think the main point is that we split into a map and then iterate over the map to create another array (in the only place this function is called). Would you be open to removing the map completely and replacing it with an iterator so that we skip the intermediate allocation completely ?

I don't follow, sorry. You want to inline splitSymbols into loadFunctions? loadFunctions into loadProgramSections?

Basically splitSymbols returns a map, but we never really take advantage of the fact that it's a map (except in tests), we directly iterate on it and create something else from it. What I was suggesting is that splitSymbols returns a lazy iterator type that would allow iteration without creating a big map with all the symbols and instructions.

I don't think it matters that much, in the latest version of the commit I improved the algorithm a lot in both the fast and slow paths.

linker_test.go

linker.go

Reuse the existing manyprogs.c test to exercise the ELF loader. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

The elf section can provide a good approximation of the amount of instruction we will unmarshal. goos: linux goarch: amd64 pkg: github.com/cilium/ebpf cpu: 12th Gen Intel(R) Core(TM) i7-1260P │ base.txt │ new.txt │ │ sec/op │ sec/op vs base │ LoadCollectionManyProgs 1.301m ± 2% 1.263m ± 0% -2.89% (p=0.000 n=10) │ base.txt │ new.txt │ │ B/op │ B/op vs base │ LoadCollectionManyProgs 826.5Ki ± 0% 767.2Ki ± 0% -7.18% (p=0.000 n=10) │ base.txt │ new.txt │ │ allocs/op │ allocs/op vs base │ LoadCollectionManyProgs 6.932k ± 0% 6.752k ± 0% -2.60% (p=0.000 n=10) Signed-off-by: Paul Cacheux <paul.cacheux@datadoghq.com>

Make the case of a single program per section faster by removing map accesses and repeated copying via append. goos: linux goarch: amd64 pkg: github.com/cilium/ebpf cpu: 12th Gen Intel(R) Core(TM) i7-1260P │ base.txt │ new.txt │ │ sec/op │ sec/op vs base │ LoadCollectionManyProgs 1.263m ± 0% 1.173m ± 0% -7.13% (p=0.000 n=10) │ base.txt │ new.txt │ │ B/op │ B/op vs base │ LoadCollectionManyProgs 767.2Ki ± 0% 707.9Ki ± 0% -7.73% (p=0.000 n=10) │ base.txt │ new.txt │ │ allocs/op │ allocs/op vs base │ LoadCollectionManyProgs 6.752k ± 0% 6.572k ± 0% -2.67% (p=0.000 n=10) Signed-off-by: Paul Cacheux <paul.cacheux@datadoghq.com> Co-developed-by: Lorenz Bauer <lmb@isovalent.com>

goos: linux goarch: amd64 pkg: github.com/cilium/ebpf cpu: 12th Gen Intel(R) Core(TM) i7-1260P │ base.txt │ new.txt │ │ sec/op │ sec/op vs base │ LoadCollectionManyProgs 1.173m ± 0% 1.166m ± 2% ~ (p=0.481 n=10) │ base.txt │ new.txt │ │ B/op │ B/op vs base │ LoadCollectionManyProgs 707.9Ki ± 0% 699.0Ki ± 0% -1.25% (p=0.000 n=10) │ base.txt │ new.txt │ │ allocs/op │ allocs/op vs base │ LoadCollectionManyProgs 6.572k ± 0% 6.452k ± 0% -1.83% (p=0.000 n=10) Signed-off-by: Paul Cacheux <paul.cacheux@datadoghq.com>

goos: linux goarch: amd64 pkg: github.com/cilium/ebpf cpu: 12th Gen Intel(R) Core(TM) i7-1260P │ base.txt │ new.txt │ │ sec/op │ sec/op vs base │ LoadCollectionManyProgs 1.166m ± 2% 1.145m ± 1% -1.84% (p=0.000 n=10) │ base.txt │ new.txt │ │ B/op │ B/op vs base │ LoadCollectionManyProgs 699.0Ki ± 0% 693.3Ki ± 0% -0.81% (p=0.000 n=10) │ base.txt │ new.txt │ │ allocs/op │ allocs/op vs base │ LoadCollectionManyProgs 6.452k ± 0% 6.092k ± 0% -5.58% (p=0.000 n=10) Signed-off-by: Paul Cacheux <paul.cacheux@datadoghq.com>

lmb

Ran the new benchmark and added the results to the commit messages.

lmb · 2023-11-08T17:29:40Z

Thanks!

paulcacheux force-pushed the paulcacheux/elf-read-speedup branch from e463916 to 4acb87b Compare November 7, 2023 10:02

paulcacheux changed the title ~~small speedups when during elf loading~~ small speedups during elf loading Nov 7, 2023

paulcacheux force-pushed the paulcacheux/elf-read-speedup branch from 4acb87b to 0e174cf Compare November 7, 2023 10:25

paulcacheux marked this pull request as ready for review November 7, 2023 10:29

lmb requested changes Nov 7, 2023

View reviewed changes

paulcacheux force-pushed the paulcacheux/elf-read-speedup branch from 06cbbf4 to dd7b16d Compare November 7, 2023 16:42

lmb reviewed Nov 7, 2023

View reviewed changes

linker_test.go Outdated Show resolved Hide resolved

linker.go Outdated Show resolved Hide resolved

paulcacheux force-pushed the paulcacheux/elf-read-speedup branch from dd7b16d to b815e6a Compare November 8, 2023 12:26

lmb and others added 5 commits November 8, 2023 16:23

collection: benchmark parsing an ELF with many programs

835f9a1

Reuse the existing manyprogs.c test to exercise the ELF loader. Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

lmb force-pushed the paulcacheux/elf-read-speedup branch from b815e6a to d511049 Compare November 8, 2023 17:17

lmb approved these changes Nov 8, 2023

View reviewed changes

lmb merged commit 1027911 into cilium:main Nov 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

small speedups during elf loading #1207

small speedups during elf loading #1207

paulcacheux commented Nov 7, 2023 •

edited

Loading

lmb Nov 7, 2023

paulcacheux Nov 7, 2023

lmb Nov 7, 2023

paulcacheux Nov 7, 2023

lmb left a comment

lmb commented Nov 8, 2023

small speedups during elf loading #1207

small speedups during elf loading #1207

Conversation

paulcacheux commented Nov 7, 2023 • edited Loading

lmb Nov 7, 2023

Choose a reason for hiding this comment

paulcacheux Nov 7, 2023

Choose a reason for hiding this comment

lmb Nov 7, 2023

Choose a reason for hiding this comment

paulcacheux Nov 7, 2023

Choose a reason for hiding this comment

lmb left a comment

Choose a reason for hiding this comment

lmb commented Nov 8, 2023

paulcacheux commented Nov 7, 2023 •

edited

Loading