53 research inlined instance data #59

Open
wants to merge 30 commits into
base: master
1a342bc
Update main.go
brigadier-general Jun 5, 2024
b298ea4
initial commit for inline func id changes
brigadier-general Jun 5, 2024
682a59f
initial commit for inline func changes
brigadier-general Jun 5, 2024
b4fc18f
Update main.go -- fixed parsing v1.20+ inlining
brigadier-general Jun 6, 2024
105e95b
Update symtab.go -- fixed inline Id logic for go v1.20+
brigadier-general Jun 6, 2024
7c3536d
fixing size for InlinedCall struct v1.16-1.18
brigadier-general Jun 6, 2024
7b21415
Update symtab.go. Func description comment fixed
brigadier-general Jun 6, 2024
7dcddfc
Create inlinedFunctions.md -- high-ish level write up on how we curre…
brigadier-general Jun 6, 2024
41e39ab
adding more references
brigadier-general Jun 6, 2024
911a044
added description of how we validate inline call data
brigadier-general Jun 6, 2024
35b5d92
Update inlinedFunctions.md
brigadier-general Jun 6, 2024
189e50e
adding exported Gofunc to all ModuleData definitions
brigadier-general Jun 7, 2024
f5dc310
assigning Gofunc
brigadier-general Jun 7, 2024
ad709c6
moving InlinedList to field in Func -- main.go
brigadier-general Jun 10, 2024
de0de0f
moving InlinedCall struct def
brigadier-general Jun 10, 2024
f767801
typo fix
brigadier-general Jun 10, 2024
a116cfd
adding Gofunc var in main
brigadier-general Jun 10, 2024
1309938
adding Gofunc field to Pclntabcandidate struct and as param to New ta…
brigadier-general Jun 10, 2024
e11a030
typo fix
brigadier-general Jun 10, 2024
cf6ae24
Update pclntab.go
brigadier-general Jun 24, 2024
015e848
Update symtab.go
brigadier-general Jun 24, 2024
46c3b58
Update objfile.go
brigadier-general Jun 24, 2024
52b00a9
Update disasm.go
brigadier-general Jun 24, 2024
39b22a5
Update main.go
brigadier-general Jun 24, 2024
86761c0
Update pclntab.go -- copying inline data to InlinedList correctly
brigadier-general Jun 25, 2024
190dbd8
Update symtab.go -- correctly adding inlined data to list
brigadier-general Jun 25, 2024
1018a1f
Update objfile.go -- adding inlined data to candidate Func list corre…
brigadier-general Jun 25, 2024
6c347c1
Update main.go - removing excess print statements
brigadier-general Jun 25, 2024
31e5ef1
Update symtab.go - only calculating offset into runtime.__func.funcda…
brigadier-general Jun 26, 2024
5e05d76
Update objfile.go - OOB indexing fix
brigadier-general Jun 26, 2024
20 changes: 15 additions & 5 deletions debug/gosym/pclntab.go
@@ -66,6 +66,7 @@ func (v version) String() string {
type LineTable struct {
Data []byte
PC uint64
GofuncVA uint64
Line int

// This mutex is used to keep parsing of pclntab synchronous.
@@ -172,8 +173,8 @@ func (t *LineTable) LineToPC(line int, maxpc uint64) uint64 {
// corresponding to the encoded data.
// Text must be the start address of the
// corresponding text segment.
func NewLineTable(data []byte, text uint64) *LineTable {
return &LineTable{Data: data, PC: text, Line: 0, funcNames: make(map[uint32]string), strings: make(map[uint32]string)}
func NewLineTable(data []byte, text uint64, gofunc uint64) *LineTable {
return &LineTable{Data: data, PC: text, GofuncVA: gofunc, Line: 0, funcNames: make(map[uint32]string), strings: make(map[uint32]string)}
}

// Go 1.2 symbol table format.
@@ -366,6 +367,7 @@ func (t *LineTable) go12Funcs() []Func {
info := t.funcData(uint32(i))
f.LineTable = t
f.FrameSize = int(info.deferreturn())
f.FuncData = info
syms[i] = Sym{
Value: f.Entry,
Type: 'T',
@@ -515,13 +517,15 @@ func (f funcData) nameoff() uint32 { return f.field(1) }
func (f funcData) deferreturn() uint32 { return f.field(3) }
func (f funcData) pcfile() uint32 { return f.field(5) }
func (f funcData) pcln() uint32 { return f.field(6) }
func (f funcData) Num_pcdata() uint32 { return f.field(7) }
func (f funcData) cuOffset() uint32 { return f.field(8) }
func (f funcData) Num_funcdata() uint32 { return f.field(10) }

// field returns the nth field of the _func struct.
// It panics if n == 0 or n > 9; for n == 0, call f.entryPC.
// Most callers should use a named field accessor (just above).
func (f funcData) field(n uint32) uint32 {
if n == 0 || n > 9 {
if n == 0 || n > 10 {
panic("bad funcdata field")
}
// In Go 1.18, the first field of _func changed
@@ -531,8 +535,14 @@ func (f funcData) field(n uint32) uint32 {
sz0 = 4
}
off := sz0 + (n-1)*4 // subsequent fields are 4 bytes each
data := f.data[off:]
return f.t.Binary.Uint32(data)

if n == 10 { // except for the last 4 fields which are 1 byte each
off = off + 3 // we want the last byte
return uint32(f.data[off])
} else {
data := f.data[off:]
return f.t.Binary.Uint32(data)
}
}

// step advances to the next pc, value pair in the encoded table.
229 changes: 223 additions & 6 deletions debug/gosym/symtab.go
@@ -128,16 +128,233 @@ func (s *Sym) BaseName() string {
return s.Name
}

// go v1.16-v1.18
type inlinedCall_v116 struct {
parent int16
funcId uint8
_pad uint8
file int32
line int32
func_ int32
parentPc int32
}

// go v.1.20+
type inlinedCall_v120 struct {
funcId uint8
_pad [3]uint8
nameOff int32
parentPc int32
startLine int32
}

const (
MAX_TREE_SIZE = 4096
size_inlinedCall_v116 = 20
size_inlinedCall_v120 = 16
FUNCID_MAX = 22 // funcID maximum value
)

// An InlinedCall collects information about a function that has been inlined as well as its parent
type InlinedCall struct {
Funcname string
ParentName string
CallingPc uint64
ParentEntry uint64
Data []byte
}

// A Func collects information about a single function.
type Func struct {
Entry uint64
*Sym
End uint64
Params []*Sym // nil for Go 1.3 and later binaries
Locals []*Sym // nil for Go 1.3 and later binaries
FrameSize int
LineTable *LineTable
Obj *Obj
End uint64
Params []*Sym // nil for Go 1.3 and later binaries
Locals []*Sym // nil for Go 1.3 and later binaries
FrameSize int
LineTable *LineTable
FuncData funcData
InlinedList []InlinedCall
Obj *Obj
}

const (
PCDATA_InlTreeIndex = 2
FUNCDATA_InlTree = 3
)

func (f *Func) HasInline() uint32 {
npcdata := int(f.FuncData.Num_pcdata())
nfuncdata := int(f.FuncData.Num_funcdata())

// check the relevant index exists
if nfuncdata <= FUNCDATA_InlTree {
return 0xffff
}

// get the size of runtime_func actual fields
sz0 := f.LineTable.Ptrsize
if f.LineTable.Version >= ver118 {
sz0 = 4
}

// calculate where funcdata array begins
func_hdr_size := int(sz0) + (4 * 10) // sz of first elt + size of remaining elts
pcdata_size := 4 * npcdata // elts in pcdata[npcdata] are 4 bytes each
funcdata_offset := func_hdr_size + pcdata_size

// isolate the funcdata array
funcdata_size := 4 * nfuncdata
funcdata_raw := f.FuncData.data[funcdata_offset:(funcdata_offset + funcdata_size)]
if len(funcdata_raw) != funcdata_size {
fmt.Printf("wanted %d bytes for uint32_t funcdata[nfuncdata], got %d\n", funcdata_size, len(funcdata_raw))
return 0xffff
}

// get the actual inline data value
funcdata_InlTree := f.LineTable.Binary.Uint32(funcdata_raw[4*FUNCDATA_InlTree:])

// check if the value is valid
if funcdata_InlTree == ^uint32(0) {
return 0xffff
}

return funcdata_InlTree
}

func isValidFuncID(data []byte) bool {

// TODO -- currently only accepts "FuncIDNormal"
// We may want to include other types.
if data[0] != 0 {
return false
}

i := 1
for i < 4 {
if data[i] != 0 {
return false
}
i += 1
}

return true
}

// validate that calling PC falls within calling function
func isValidPC(data []byte, f *Func) (bool, int32) {
var pc int32
var pc_address uint64

// convert bytes to int32
// TODO -- see isValidFuncName()
err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &pc)
if err != nil {
fmt.Println(err)
return false, -1
}
pc_address = uint64(pc) + f.Entry
if (pc_address <= f.End) && (pc_address >= f.Entry) {
return true, pc
}

return false, -1
}

// TODO -- pull out binary converter to its own func for reuse
// TODO -- check for little vs big endian
func isValidFuncName(data []byte, f *Func) (bool, string) {
var nameOff int32

err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &nameOff)
if err != nil {
fmt.Println(err)
return false, ""
}

// check that name offset falls within func name table boundaries
funcNameTable := f.LineTable.funcnametab
if nameOff < int32(len(funcNameTable)) {
i := nameOff
for i < int32(len(funcNameTable)) {
// get str len by iterating until we hit a null byte
if funcNameTable[i] == '\000' {
break
}
i += 1
}

name := string(funcNameTable[nameOff:i])
return true, name
}
return false, ""
}

func (f *Func) iterateInline_v116(tree []byte) []InlinedCall {
var inlineList []InlinedCall
fmt.Println("\tinside version116. BAD.")
return inlineList
}

func (f *Func) iterateInline_v120(tree []byte) []InlinedCall {
var inlineList []InlinedCall

// check there are enough bytes for an inlinedCall struct
off := 0
// iterate until we hit invalid data
// that indicates we've read this function's entire inline tree
for len(tree)-off >= size_inlinedCall_v120 {
// get elt bytes
elt_raw := tree[off : off+size_inlinedCall_v120]

// verify funcId and padding look normal
if !isValidFuncID(elt_raw[:4]) {
break
}
// verify calling PC exists within parent func bounds
is_valid_pc, pc := isValidPC(elt_raw[8:12], f)
if !is_valid_pc {
break
}
// resolve name
is_valid_fname, fname := isValidFuncName(elt_raw[4:8], f)
if !is_valid_fname {
break
}
// create InlinedCall object
inlineList = append(inlineList, InlinedCall{
Funcname: fname,
ParentName: f.Name,
CallingPc: uint64(pc),
ParentEntry: f.Entry,
Data: elt_raw,
})
// add obj to InlineList
off = off + size_inlinedCall_v120
}
return inlineList
}

// return array of inlined functions inside f or nil
func (f *Func) GetInlinedCalls(data []byte) {
var inlList []InlinedCall

// get size of inlined struct based on version
if f.LineTable.Version >= ver118 {
inlList = f.iterateInline_v120(data)
} else {
inlList = f.iterateInline_v116(data)
}

for _, elt := range inlList {
f.InlinedList = append(f.InlinedList, InlinedCall{
Funcname: elt.Funcname,
ParentName: elt.ParentName,
CallingPc: elt.CallingPc,
ParentEntry: elt.ParentEntry,
Data: elt.Data,
})
}
}

// An Obj represents a collection of functions in a symbol table.
94 changes: 94 additions & 0 deletions doc/inlinedFunctions.md
@@ -0,0 +1,94 @@
# Inlined Function Identification

This file describes the process of identifying functions that have been inlined by the Go compiler.
It also details how this process is implemented in GoReSym and lists known TODOs.

## Finding []runtime__inlinedCall

These steps calculate where to find the inline tree for a function `f`.
The inline tree holds information about any and all functions that were inlined into `f` by the Go compiler.
The documentation implies that each function that contains inlined functions will have its own distinct tree.

1. Choose a function `f`
2. Get `funcData` for `f`.
3. Check whether `funcData.funcdata[FUNCDATA_InlTree]` exists (want `funcData.nfuncdata` > `FUNCDATA_InlTree`, since the array is zero-indexed)
4. Check whether `funcData.funcdata[FUNCDATA_InlTree]` is valid (want `funcData.funcdata[FUNCDATA_InlTree]` != `^uint32(0)`)
5. Save `funcData.funcdata[FUNCDATA_InlTree]` -- this is the inline tree offset for `f`
6. Get `go:func.*` via `moduledata`. (there are other ways but this is the least complicated)
7. Adjust `go:func.*` from absolute address to file offset by subtracting the preferred base address (in file header). `go:func.* -= baseAddress`
8. Go to the inline tree. `InlineTreeAddress` = `go:func.*` + `funcData.funcdata[FUNCDATA_InlTree]`. This is an offset relative to the start of the binary file because we adjusted `go:func.*` in step 7 above.

*NOTE: the inline tree and `go:func.*` addresses may be earlier in the binary than `pclntab`*
Therefore whatever component resolves inline functions MUST have access to the full file.
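
The steps above can be sketched as a single helper. This is a minimal sketch, not the code in this PR: `inlineTreeOffset` and its parameters are hypothetical names, the byte order is assumed to be little-endian (real code should use the `LineTable`'s byte order), and the file offset is derived by plain subtraction of the base address, as in step 7, which has only been tried on ELF.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// funcdataInlTree mirrors FUNCDATA_InlTree from the Go runtime.
const funcdataInlTree = 3

// inlineTreeOffset is a hypothetical helper covering steps 3-8.
// funcdata holds the raw uint32 funcdata[nfuncdata] array of one function,
// gofuncVA is the virtual address of go:func.* and baseAddress is the
// preferred base address taken from the file header.
// Assumption: values are little-endian.
func inlineTreeOffset(funcdata []byte, nfuncdata int, gofuncVA, baseAddress uint64) (uint64, error) {
	// Step 3: the slot exists only if nfuncdata > FUNCDATA_InlTree.
	if nfuncdata <= funcdataInlTree {
		return 0, errors.New("no FUNCDATA_InlTree slot")
	}
	if len(funcdata) < 4*nfuncdata {
		return 0, errors.New("funcdata array is truncated")
	}
	// Steps 4-5: read the slot and reject the "no data" marker.
	treeOff := binary.LittleEndian.Uint32(funcdata[4*funcdataInlTree:])
	if treeOff == ^uint32(0) {
		return 0, errors.New("function has no inline tree")
	}
	// Steps 6-8: turn go:func.* into a file offset, then add the tree offset.
	if gofuncVA < baseAddress {
		return 0, errors.New("go:func.* lies below the base address")
	}
	return gofuncVA - baseAddress + uint64(treeOff), nil
}

func main() {
	// Four funcdata slots; only slot 3 (the inline tree) is populated.
	funcdata := []byte{
		0xff, 0xff, 0xff, 0xff,
		0xff, 0xff, 0xff, 0xff,
		0xff, 0xff, 0xff, 0xff,
		0x40, 0x00, 0x00, 0x00,
	}
	off, err := inlineTreeOffset(funcdata, 4, 0x4a2000, 0x400000)
	fmt.Printf("%#x %v\n", off, err)
}
```

Returning an error instead of a sentinel such as `0xffff` keeps "no inline tree" distinguishable from a tree that happens to sit at that offset.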

## Validating inline tree entries

We iterate over the file bytes starting at `InlineTreeAddress` (N.B. the bytes of the whole file, not just the bytes in `pclntab`). On each iteration we read enough bytes to fill a single `runtime__inlinedCall` instance and validate its fields. If any validation check fails, or there are not enough bytes left to fill the struct, we assume we have reached the end of the tree, stop, and return the results collected so far.

```
Start at InlineTreeAddress.
While there are at least sizeof(runtime__inlinedCall) bytes not yet checked:
Get sizeof(runtime__inlinedCall) bytes as potentialCall
Check potentialCall.funcID
- funcID must be 0 (i.e. "normal")
- the subsequent padding bytes must also be 0 (number of pad bytes depends on Go version)
- if these fail, break
Check potentialCall.parentPc
- get parentFunction.Entry (aka start offset) (we have this data in the funcData used to locate the inline tree)
- get parentFunction.End (aka end offset)
- potentialCall.parentPc + parentFunction.Entry must fall between parentFunction.Entry and parentFunction.End (inclusive)
- if parentPc falls outside parentFunction, break
Check potentialCall.name (the field name varies between Go versions)
- get pcHeader.funcNameOffset
- get size in bytes of function name table
- pcHeader.funcNameOffset + potentialCall.name must fall within bounds of the function name table
- [NOT IMPLEMENTED] first char must be ASCII, previous char should be 0 (null-terminator)
- if name offset is invalid, break
Save data from the runtime__inlinedCall struct into a version-agnostic InlinedCall object
Add the new InlinedCall object to array of found InlinedCall objects
Move forward by sizeof(runtime__inlinedCall)

Return array of InlinedCall objects

```
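
The loop above can be sketched in Go for the 16-byte Go 1.20+ record layout (`funcId`, three pad bytes, `nameOff`, `parentPc`, `startLine`). This is an illustration under stated assumptions, not the PR's `iterateInline_v120`: `walkInlineTree` and the `inlinedCall` type are hypothetical names, fields are assumed to be little-endian, and an empty name is treated as invalid, which is stricter than the checks listed above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// inlinedCallSize is the size of runtime.inlinedCall for Go 1.20+.
const inlinedCallSize = 16

// inlinedCall is a hypothetical, version-agnostic result record.
type inlinedCall struct {
	Name     string
	ParentPC uint32
}

// walkInlineTree reads records from tree until one fails validation.
// entry and end bound the parent function; funcnametab is the function
// name table. Assumption: all fields are little-endian.
func walkInlineTree(tree, funcnametab []byte, entry, end uint64) []inlinedCall {
	var calls []inlinedCall
	for off := 0; len(tree)-off >= inlinedCallSize; off += inlinedCallSize {
		rec := tree[off : off+inlinedCallSize]

		// funcID must be 0 ("normal") and the three pad bytes must be 0.
		if rec[0] != 0 || rec[1] != 0 || rec[2] != 0 || rec[3] != 0 {
			break
		}
		nameOff := int32(binary.LittleEndian.Uint32(rec[4:8]))
		parentPC := int32(binary.LittleEndian.Uint32(rec[8:12]))

		// The calling PC must land inside the parent function.
		if parentPC < 0 || entry+uint64(parentPC) > end {
			break
		}
		// The name offset must land inside the function name table.
		if nameOff < 0 || int(nameOff) >= len(funcnametab) {
			break
		}
		// Names are NUL-terminated; reject missing terminators and empty names.
		n := bytes.IndexByte(funcnametab[nameOff:], 0)
		if n <= 0 {
			break
		}
		calls = append(calls, inlinedCall{
			Name:     string(funcnametab[int(nameOff) : int(nameOff)+n]),
			ParentPC: uint32(parentPC),
		})
	}
	return calls
}

func main() {
	tab := []byte("\x00main.add\x00main.mul\x00")
	tree := []byte{
		0, 0, 0, 0, 1, 0, 0, 0, 0x10, 0, 0, 0, 5, 0, 0, 0, // main.add at pc 0x10
		0, 0, 0, 0, 10, 0, 0, 0, 0x20, 0, 0, 0, 7, 0, 0, 0, // main.mul at pc 0x20
		0xff, 0xff, 0xff, 0xff, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // invalid funcID
	}
	for _, c := range walkInlineTree(tree, tab, 0x1000, 0x1100) {
		fmt.Printf("%s inlined at parent pc %#x\n", c.Name, c.ParentPC)
	}
}
```

Rejecting empty names has the side effect of stopping on zero-filled padding, which would otherwise pass the funcID and parentPc checks.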

## Known issues

### Not implemented yet for Go v1.11-1.18

### Only tested on ELF format

### Saving inlined function names

Currently we manually calculate the size of the string to create the slice. There is probably a better way to save the string, and there is extra validation we could do to make sure that the func name offset points to the start of a string.
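
One possible shape for this, sketched under assumptions: `funcNameAt` is a hypothetical helper, `bytes.IndexByte` replaces the manual scan for the terminator, and the two extra checks (previous byte is NUL, first byte is printable ASCII) are the heuristics marked as not implemented in the validation steps. The ASCII check could reject legitimate names that begin with a non-ASCII identifier character.

```go
package main

import (
	"bytes"
	"fmt"
)

// funcNameAt is a hypothetical helper that returns the NUL-terminated name
// starting at off in the function name table, or false if off does not look
// like the start of a name.
func funcNameAt(tab []byte, off int32) (string, bool) {
	if off < 0 || int(off) >= len(tab) {
		return "", false
	}
	// Heuristic: a name should directly follow the previous terminator.
	if off > 0 && tab[off-1] != 0 {
		return "", false
	}
	// Find the terminator; n is the length of the name.
	n := bytes.IndexByte(tab[off:], 0)
	if n <= 0 {
		return "", false // unterminated or empty
	}
	name := tab[int(off) : int(off)+n]
	// Heuristic: the first byte should be printable ASCII.
	if name[0] < 0x21 || name[0] > 0x7e {
		return "", false
	}
	return string(name), true
}

func main() {
	tab := []byte("\x00main.add\x00main.mul\x00")
	for _, off := range []int32{1, 3, 10, 19} {
		name, ok := funcNameAt(tab, off)
		fmt.Println(off, name, ok)
	}
}
```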

### Only processing inlined calls where funcID==0 (normal function)

Haven't found a great description of the funcID types. We may want to include more than just normal functions. We might also find that each inline tree ends with inline info for a type. If we can find a description of the inline tree or trees section as a whole, then we might be able to use this as a pattern to separate where inline trees begin/end.

### How to calculate size of inline tree for a given `f`.

Right now we start at a function's inline tree base and
process inline call data until we hit invalid data. If the inline trees for two functions `f` and `j` are adjacent
with no buffer between them, then `j`'s tree will be mistakenly included in `f`'s tree. The functions that were inlined into `j`
will then be listed twice: once as inlined into `f` and once as inlined into `j`.

Another heuristic to help could be checking the tree bases for all functions with inline data and stopping the inline struct iteration when we reach another function's tree base.
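
A rough sketch of that heuristic, assuming the tree bases of all functions have already been collected: `treeEnd` is a hypothetical helper that returns the next higher base as an upper bound for a tree, falling back to a caller-supplied limit. Whether trees are actually laid out back to back is not confirmed, so this only caps the iteration; it does not prove where a tree ends.

```go
package main

import (
	"fmt"
	"sort"
)

// treeEnd returns an upper bound for the inline tree starting at base:
// the smallest collected base that is greater than base, or limit if there
// is none. sortedBases must be sorted in ascending order.
func treeEnd(sortedBases []uint32, base, limit uint32) uint32 {
	i := sort.Search(len(sortedBases), func(i int) bool {
		return sortedBases[i] > base
	})
	if i < len(sortedBases) && sortedBases[i] < limit {
		return sortedBases[i]
	}
	return limit
}

func main() {
	// Tree offsets gathered from funcdata[FUNCDATA_InlTree] of every function.
	bases := []uint32{0x100, 0x40, 0x80}
	sort.Slice(bases, func(a, b int) bool { return bases[a] < bases[b] })

	const limit = 0x200 // hypothetical end of the region holding the trees
	for _, b := range bases {
		fmt.Printf("tree at %#x ends no later than %#x\n", b, treeEnd(bases, b, limit))
	}
}
```

The struct iteration would then stop at `treeEnd` even if the next bytes still pass validation.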

Haven't found any overview of an "inlined data section". Either finding one or walking through the compiler steps to build one would be useful.

### Using pcdata

We don't currently use `funcData.pcdata[PCDATA_InlTreeIndex]`.
Use `funcData.pcdata[PCDATA_InlTreeIndex]` + `pcHeader.pctabOffset` to go to the relevant offset in `pcdata`.
(N.B. Some docs call `pcdata` the `pctab`. These are distinct from `pclntab`.)
The inline tree index could be used to check whether any given PC in a function kicks off inlined function instructions.
Since we want every inlined function and are not iterating over every PC in every function, we're not currently using this.
However, this info might be helpful in determining how many functions were inlined into `f`. We would then be able to separate inline trees.


## References

* [pclntab structs reference](https://github.com/elastic/otel-profiling-agent/blob/main/docs/gopclntab.md)
* [adding inline functions for golang debugger](https://developers.redhat.com/articles/2024/04/03/how-add-debug-support-go-stripped-binaries)
* [how and why inlining with source examples](https://dave.cheney.net/2020/04/25/inlining-optimisations-in-go)