Support for dynamically instrumenting hotpatchable functions
Orbit allows instrumenting functions in the target binary dynamically, i.e. one chooses a function from the debug symbols and Orbit modifies the binary such that every call to that function produces a time interval in the capture timeline.
The way this works is that we overwrite the first five bytes of the function with a jump into some carefully crafted code that backs up the current state of the calling thread, emits the tracing information and then continues the execution. This works fine in most cases, but there are some challenging situations: there can be a jump somewhere in the instrumented function to an address inside the first five bytes, the function can be shorter than five bytes, or the first instruction can be a call to another function.
Compilers offer options to make dynamic instrumentation easier. The way this works is that the compiler adds a few bytes of padding in front of every function and a two byte nop instruction at the function entry. One can then overwrite these first two bytes with a “jmp -7” which ends up in the padded bytes before the function. These padded bytes are overwritten with the actual jump to a 32 bit offset that leads to the instrumentation code. Details vary a bit from compiler to compiler - see the sections “Support in X” below.
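The two patches described above can be sketched as follows. This is only an illustration of the x86-64 encodings involved, not Orbit's actual code: the names kShortJumpIntoPadding and EncodeJumpToTrampoline are made up for this example, and it assumes a padding of exactly five bytes in front of the function.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <limits>

// Two-byte short jump written at the function entry: opcode 0xEB followed by
// a signed 8-bit displacement. The displacement is relative to the end of the
// instruction (entry + 2), so -7 (0xF9) lands at entry - 5, which is the
// start of the padding. (Hypothetical name, assumes five bytes of padding.)
constexpr std::array<uint8_t, 2> kShortJumpIntoPadding = {0xEB, 0xF9};

// Five-byte near jump written into the padding: opcode 0xE9 followed by a
// signed 32-bit little-endian displacement. The jump occupies
// [entry - 5, entry), so the displacement is relative to the function entry.
std::array<uint8_t, 5> EncodeJumpToTrampoline(uint64_t function_address,
                                              uint64_t trampoline_address) {
  const int64_t displacement = static_cast<int64_t>(trampoline_address) -
                               static_cast<int64_t>(function_address);
  // The trampoline has to be reachable with a 32-bit offset.
  assert(displacement >= std::numeric_limits<int32_t>::min() &&
         displacement <= std::numeric_limits<int32_t>::max());
  const uint32_t bits = static_cast<uint32_t>(static_cast<int32_t>(displacement));
  std::array<uint8_t, 5> code = {0xE9, 0, 0, 0, 0};
  for (int i = 0; i < 4; ++i) {
    code[1 + i] = static_cast<uint8_t>(bits >> (8 * i));  // Little-endian.
  }
  return code;
}
```

Writing the five-byte jump into the padding first and the two-byte jump at the entry second means the function stays callable at every intermediate step, since the padding is never executed before the entry is patched.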
Up until now we have ignored whether or not the target binary was compiled with support for “hotpatchable functions” (clang and gcc call them “patchable functions”, the msvc documentation uses the term “hotpatchable image”). The purpose of this document is to suggest a way to utilise the support from the compiler in case the binary was compiled with the relevant options.
In order to represent functions in Orbit we read information from various different inputs:
- Elf files: debug symbols, dynsym, eh or debug frame entries
- Coff files: coff symbol table, dwarf, export table, exception table
- Pdb files parsed with dia: SymTagFunction, SymTagPublicSymbol
- Pdb files parsed with llvm: ProcSym, public symbols
This is less colourful than it looks at first sight: basically we either parse debug info, exported functions, or something derived from unwinding info. In the end all of these sources produce a ModuleSymbols, which holds a repeated SymbolInfo field. We add a bool is_hotpatchable to SymbolInfo.
For now we only support elf binaries (created by clang or gcc). As shown below, both compilers produce the same __patchable_function_entries section in the elf file. If this section is present and a given symbol is listed there, we mark it as hotpatchable; otherwise we do not. In particular, we mark windows code as not hotpatchable although msvc supports that.
In ModuleData::AddSymbolsInternal the SymbolInfo is translated into a FunctionInfo (src/ClientData/include/ClientData/FunctionInfo.h). So FunctionInfo gets an additional field “bool is_hotpatchable_;”.
In src/CaptureClient/CaptureClient.cpp, ToGrpcCaptureOptions translates the FunctionInfo to an InstrumentedFunction (src/GrpcProtos/capture.proto), which also gets the is_hotpatchable field. With this the information is available in the service. Specifically, InstrumentedProcess::InstrumentFunctions has this information (since it gets the InstrumentedFunctions in the capture options).
The above only considered whether or not the binary was compiled with hotpatchable function support. But besides that the compilers offer an option to adjust the size of the padding and, in the case of clang or gcc, also an option to adjust the size of the nop at the start. So a user could compile a binary with a padding that is too short or a nop that is not exactly two bytes in size. Either would lead to crashes if we ignored it.
We should either
- add a capture option to explicitly activate the usage of hotpatchable function support. The explanation for the option should suggest the “-fpatchable-function-entry=7,5” parameter explicitly. This comes with the complication that subsequent captures could use different flavours of instrumentation and therefore the trampolines cannot be reused (putting it another way: we would need a hotpatchable trampoline and a regular trampoline).
or
- detect the size of the padding and the nop and disable hotpatchable function support if the binary code looks unexpected.
or both of the above.
Only implementing the second option seems most straightforward. There is no reason for switching off the hotpatchable support if the binary is compiled with appropriate parameters.
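A check along the lines of the second option could look like the sketch below. It is a heuristic based only on the clang and gcc disassembly listings in this document (five single-byte nops as padding, then either “66 90” or “90 90” at the entry). The function name and the constants are hypothetical and not part of Orbit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout we expect from -fpatchable-function-entry=7,5 (assumed values).
constexpr size_t kExpectedPaddingSize = 5;
constexpr size_t kExpectedEntryNopSize = 2;

// `code` must hold the bytes starting kExpectedPaddingSize bytes before the
// function entry. Returns true only if the padding consists of single-byte
// nops (0x90) and the entry starts with one of the two-byte nop patterns
// seen in the listings. Anything else is treated as "not hotpatchable".
bool LooksLikeExpectedPatchableEntry(const std::vector<uint8_t>& code) {
  if (code.size() < kExpectedPaddingSize + kExpectedEntryNopSize) {
    return false;
  }
  const bool padding_ok =
      std::all_of(code.begin(), code.begin() + kExpectedPaddingSize,
                  [](uint8_t byte) { return byte == 0x90; });
  if (!padding_ok) {
    return false;
  }
  const uint8_t first = code[kExpectedPaddingSize];
  const uint8_t second = code[kExpectedPaddingSize + 1];
  const bool single_two_byte_nop = first == 0x66 && second == 0x90;  // clang
  const bool two_one_byte_nops = first == 0x90 && second == 0x90;    // gcc
  return single_two_byte_nop || two_one_byte_nops;
}
```

If the check fails, the function would simply be instrumented the regular way, so a binary compiled with unexpected parameters loses the shortcut but does not crash.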
Some changes are needed in CreateTrampoline:
- CreateTrampoline needs to know about is_hotpatchable
- CheckForRelativeJumpIntoFirstFiveBytes is not required if is_hotpatchable
- AppendRelocatedPrologueCode can be skipped - there is no need to relocate a nop
- the address to jump back to after the trampoline has completed execution is always function_address+2
InstrumentFunction needs to do something different.
UninstrumentFunctions works fine since it just restores the first 20 bytes of the function (so this works fine for the 2 byte relative 8 bit jump as well)
https://github.com/google/orbit/pull/4497 implements the things described here for Linux binaries produced by clang or gcc.
The test for the correct parameter setting as described here is missing.
The information whether a function is hotpatchable is not yet used in dynamic instrumentation (as outlined here).
The different compilers handle hotpatchable functions slightly differently. The sections below show the details for each compiler.
https://clang.llvm.org/docs/AttributeReference.html#patchable-function-entry
#include <iostream>

int fun(int x) {
  return 2 * x;
}

int main(int argc, char **argv) {
  int x = 42;
  std::cout << fun(x) << "\n";
  return 0;
}
> clang++ main.cc -std=c++17 -fpatchable-function-entry=7,5
And then have a look at the binary
> objdump -D a.out
00000000000011c0 <frame_dummy>:
11c0: f3 0f 1e fa endbr64
11c4: e9 77 ff ff ff jmp 1140 <register_tm_clones>
11c9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
11d0: 90 nop
11d1: 90 nop
11d2: 90 nop
11d3: 90 nop
11d4: 90 nop
00000000000011d5 <_Z3funi>:
11d5: 66 90 xchg %ax,%ax
11d7: 55 push %rbp
11d8: 48 89 e5 mov %rsp,%rbp
There are 7 bytes of NOPs (11d0 - 11d6). The function starts after the 5th byte. So there is a 7-5==2 byte NOP at the function entry.
> readelf -t a.out
Section Headers:
[Nr] Name
Type Address Offset Link
Size EntSize Info Align
Flags
…
[27] __patchable_function_entries
PROGBITS 0000000000004030 0000000000003030 16
0000000000000020 0000000000000000 0 8
[0000000000000083]: WRITE, ALLOC, LINK ORDER
…
> readelf --hex-dump=27 a.out
Hex dump of section '__patchable_function_entries':
0x00004030 80100000 00000000 c0100000 00000000 ................
0x00004040 d0110000 00000000 f0110000 00000000 ................
These are just the addresses of the patchable functions. The address given is the start of the first NOP (in this example 5 bytes before the function entry point).
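To illustrate how this section could be used to mark symbols, here is a minimal sketch that decodes the raw section bytes and checks whether a function is listed. The function names are made up for this example, and the default padding of five bytes is an assumption that only holds for “-fpatchable-function-entry=7,5”.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Decodes the raw contents of __patchable_function_entries. Each entry is a
// 64-bit little-endian address; trailing bytes that do not form a complete
// entry are ignored.
std::unordered_set<uint64_t> ParsePatchableEntries(
    const std::vector<uint8_t>& section) {
  std::unordered_set<uint64_t> entries;
  for (size_t offset = 0; offset + 8 <= section.size(); offset += 8) {
    uint64_t address = 0;
    for (size_t i = 0; i < 8; ++i) {
      address |= static_cast<uint64_t>(section[offset + i]) << (8 * i);
    }
    entries.insert(address);
  }
  return entries;
}

// The listed address is the start of the first NOP, so with a padding of
// `padding_size` bytes the entry for a function is function_address minus
// padding_size. (Assumption: padding_size is 5, as in the example above.)
bool IsListedAsPatchable(const std::unordered_set<uint64_t>& entries,
                         uint64_t function_address,
                         uint64_t padding_size = 5) {
  return function_address >= padding_size &&
         entries.count(function_address - padding_size) > 0;
}
```

Fed with the 32 bytes of the hex dump above, ParsePatchableEntries yields the addresses 0x1080, 0x10c0, 0x11d0 and 0x11f0, and IsListedAsPatchable returns true for _Z3funi at 0x11d5.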
Code as above. Compile with
> g++ main.cc -g -std=c++17 -fpatchable-function-entry=7,5
And then have a look at the binary
> objdump -D a.out
0000000000001160 <frame_dummy>:
1160: f3 0f 1e fa endbr64
1164: e9 77 ff ff ff jmp 10e0 <register_tm_clones>
1169: 90 nop
116a: 90 nop
116b: 90 nop
116c: 90 nop
116d: 90 nop
000000000000116e <_Z3funi>:
116e: 90 nop
116f: 90 nop
1170: 55 push %rbp
1171: 48 89 e5 mov %rsp,%rbp
As with clang above there are 7 bytes of NOPs (1169 - 116f). The function starts after the 5th byte. The two bytes of nops are individual instructions here (using g++-12) whereas clang above produced a single instruction of length two.
> readelf -t a.out
[26] __patchable_function_entries
PROGBITS 0000000000004030 0000000000003030 15
0000000000000020 0000000000000000 0 8
[0000000000000083]: WRITE, ALLOC, LINK ORDER
> readelf --hex-dump=26 a.out
Hex dump of section '__patchable_function_entries':
0x00004030 69110000 00000000 7e110000 00000000 i.......~.......
0x00004040 d2110000 00000000 2b120000 00000000 ........+.......
So the __patchable_function_entries section looks exactly like the one produced by clang.
For x64 the first instruction after the function entry is at least two bytes long (so we are guaranteed that we only need to relocate one instruction). The padding can be adjusted (and defaults to six bytes for x64; would be interesting to know why - we only need five?!).
It looks like msvc inserts 0xcc bytes as padding (at least six, rounded up so that the function entry is on a multiple of 16; this is just what I guess from looking at examples). As pointed out in the documentation, there is no nop at the beginning of the function; we need to relocate the first instruction (and only this one because it is at least two bytes long).
It's unclear to me which functions are hotpatchable. Maybe all of them? Dumpbin does not show any section like the one present in elf binaries.