renovates the LLVM backend #1187

Merged

Conversation

ivg
Member

@ivg ivg commented Jul 23, 2020

renovates the LLVM backend

Dropping support of old LLVM and legacy backends

We drop a lot of old code (minus 3k lines of code) thus removing the support
burden and making it easier to maintain, fix, and upgrade the code.

Fixes #1166

Simplifies the implementation

The remaining code base is significantly simplified. We dropped the
separation between relocatable and non-relocatable files and removed any
transformations of addresses from the LLVM backend (we now emit
absolute virtual addresses). The whole logic of transforming from the
llvm view to the bap image view now fits into a hundred lines of
code (instead of hundreds of lines spread across 16 files, as it was
before).

Fixes #1183
Fixes #1189

Produces more information

The relocation information is now emitted for all files (not only for
relocatable ones). This PR also removes many checks that were preventing our
backends from emitting valuable symbolic information.

Paves the road to #1135 and #1161

@XVilka
Contributor

XVilka commented Jul 24, 2020

@ivg ivg force-pushed the rectifies-base-address-computation branch from edb5ca6 to 49f3d08 Compare July 24, 2020 11:59
@gitoleg
Contributor

gitoleg commented Jul 24, 2020

Speaking about relocatable files, I didn't mean just files with relocations, but files that contain code that will be relocated, i.e., code that will be linked at some different address.
https://refspecs.linuxbase.org/elf/gabi4+/ch4.intro.html

Such files can contain symbols with no address assigned. For example, in the Elf_sym struct, the st_value field contains the symbol address for executable files, but can contain some other value depending on the file type or even the ABI, and we shouldn't try to use this field as an address.

https://docs.oracle.com/cd/E19683-01/816-1386/6m7qcoblj/index.html#chapter6-35166
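
To make the point about st_value concrete, here is a minimal sketch of how its meaning depends on the file type and the section index, following the gABI wording. The function name `describe_st_value` and the returned labels are hypothetical, chosen only for illustration, and this is not the code that bap uses. The numeric constants are the standard ELF values.

```python
# Standard ELF constants (gABI).
ET_REL, ET_EXEC, ET_DYN = 1, 2, 3
SHN_UNDEF, SHN_ABS, SHN_COMMON = 0, 0xFFF1, 0xFFF2


def describe_st_value(e_type, st_shndx, st_value):
    """Classify what st_value means for one symbol (hypothetical helper).

    Returns a (kind, value) pair, where kind is an illustrative label.
    """
    if st_shndx == SHN_UNDEF:
        # Simplification: this sketch does not try to interpret the
        # value of undefined symbols at all.
        return ("undefined", None)
    if st_shndx == SHN_ABS:
        # Absolute symbols are not affected by relocation.
        return ("absolute", st_value)
    if e_type == ET_REL:
        if st_shndx == SHN_COMMON:
            # For common symbols in relocatable files, st_value holds
            # an alignment constraint, not an address.
            return ("alignment", st_value)
        # For defined symbols in relocatable files, st_value is an
        # offset from the start of the section st_shndx refers to.
        return ("section_offset", st_value)
    if e_type in (ET_EXEC, ET_DYN):
        # In executables and shared objects it is a virtual address.
        return ("virtual_address", st_value)
    return ("unknown", st_value)
```

Under this reading, only the last case can be used as an address directly; a section offset first has to be combined with whatever address the section is given.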

Also, in LLVM 3.8 (and maybe 3.4 as well), an attempt to access relocations of an executable file would fail the whole bap application, so we had to treat them separately.

Speaking about MachO, I still believe that something should be done:
for our testsuite/x86_64-macho-obj, bap and objdump show different addresses for the first symbol: it's 1d0 for the former and 0 for the latter.

@ivg ivg force-pushed the rectifies-base-address-computation branch 4 times, most recently from 3036367 to 161c0b4 Compare July 24, 2020 21:13
@ivg ivg requested a review from gitoleg July 24, 2020 21:18
@ivg ivg force-pushed the rectifies-base-address-computation branch from 161c0b4 to 6e22f64 Compare July 24, 2020 21:21
@ivg
Member Author

ivg commented Jul 24, 2020

The main misconception that you brought into the implementation is that relocations can only happen in relocatable files. The only real difference is that relocatable files do not have any fixed virtual addresses, so we have to give them some base, and this base must be more or less in sync with the state-of-the-art tools.

As this PR shows, we can easily handle relocatable files by just making the base calculation more robust and amending it for relocatable files. The issue that I have with the current implementation is that we are not providing a lot of valuable information for non-relocatable files, e.g., the relocations themselves, indirect symbols, external symbols, etc. Again, all these features commonly occur in regular binaries. But this is a topic for another PR. This one should be considered done: we have fixed MachO, cleaned up ELF, and removed lots of unnecessary code. Everything else will be tracked in #1189. See also #1188, where I also significantly refactored the loader part and introduced proper namespaces for properties. I will rebase #1188 once this PR is merged.

@ivg ivg force-pushed the rectifies-base-address-computation branch from 68caac6 to cb60bbe Compare July 28, 2020 19:12
@ivg ivg changed the title fixes the base calculation renovates the LLVM backend Aug 3, 2020
@ivg ivg force-pushed the rectifies-base-address-computation branch 4 times, most recently from f70d3d0 to 0f4e47b Compare August 4, 2020 18:40
ivg and others added 4 commits August 5, 2020 09:46
1. For ELF files we compute base as the difference between the address of
any loadable code segment and its offset. If there are no loadable code
segments, then we find a section with minimal offset value and
subtract its offset from its address.

2. For MachO, when the file is relocatable, i.e., it doesn't have addresses, we
compute base as $vaddr - offset$, the same as we do in ELF. This
gives us results that match objdump (but do not match radare2; however,
radare2 is not seeing any symbols, so it doesn't really matter).

3. For COFF nothing is done, and I am not sure that we need
to do anything.

4. Removed special computation of the base
address (Base.from_sections_offset) from ELF, MachO, and COFF.

It is not tested on LLVM versions below 6, but I believe it should
work up to 3.4.
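
The base rule from items 1 and 2 can be sketched as follows. This is only an illustration of the arithmetic in Python, not the backend code: the `Region` type, its fields, and the `compute_base` function are hypothetical names, and the sketch takes the first loadable code segment where the description says "any".

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical view of a segment or a section."""
    addr: int                 # virtual address
    offset: int               # offset in the file
    loadable: bool = False
    executable: bool = False


def compute_base(segments: Iterable[Region],
                 sections: Iterable[Region]) -> Optional[int]:
    """Return the base as vaddr - offset, or None if nothing is known."""
    # Preferred source: a loadable code segment.
    for seg in segments:
        if seg.loadable and seg.executable:
            return seg.addr - seg.offset
    # Fallback: the section with the minimal file offset.
    candidates = list(sections)
    if not candidates:
        return None
    first = min(candidates, key=lambda s: s.offset)
    return first.addr - first.offset


# A code segment mapped at 0x401000 from file offset 0x1000.
text = Region(addr=0x401000, offset=0x1000, loadable=True, executable=True)
assert compute_base([text], []) == 0x400000
```

When no code segment is present, the same difference is taken for the section that comes first in the file, which is the case that matters for relocatable files.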

resolves BinaryAnalysisPlatform#1183

Co-authored-by: gitoleg <forown@yandex.ru>
Hope we will pass it now.
@ivg ivg force-pushed the rectifies-base-address-computation branch from 4a35ce4 to 7110388 Compare August 5, 2020 13:46
@ivg ivg merged commit 27a7a5d into BinaryAnalysisPlatform:master Aug 5, 2020
@ivg ivg deleted the rectifies-base-address-computation branch March 9, 2022 17:18