BAP does not support well for thumb instruction set #951
Comments
@valour01 Well, it's a known issue. BAP just doesn't support ARM Thumb mode. Once we have time, we will add support for it, but I can't promise it will happen in the near future. So PRs are welcome! |
Hi, can you explain to me the processing chain from the ELF file to the Arm_lifter, and where it is "programmed"? Did you consider using the elf_loader and the asm-to-binary mapping from the Sail project? Is the |
This is a manifold problem, and the only thing that works reliably here is actually the decoding.
However, our lifter will not understand it. We're using the LLVM MC decoder, which for some reason, most likely a valid one, distinguishes between identically named ARM and Thumb instructions, e.g.,
The same
We have to consult the LLVM documentation and source code to really understand why they chose different opcodes for the same instruction and how this affects the semantics of the operands. Long story short, the lifter has to be updated.
Once the lifter is updated, we can move on to solving the main problem. If you look into the file that was kindly provided by @valour01, you will notice that it contains both ARM and Thumb instructions. Moreover, it is actually an ARM binary,
Notice, And it starts as an ARM binary, instructions at Such multiarch binaries are not an exception. They are common, as most ARM processors have two (or even more) decoders, one for each instruction set they support. Depending on the state of the CPU, it will interpret the same bytes differently. The state is usually just a flag, which is set by branching, e.g.,
in Thumb mode, i.e., an AND operation followed by a pc-relative branch, and as
And as a storage operation in ARM mode. While modern compilers are unlikely to generate code that reuses the same location for different interpretations, a malformed or malicious program may do so. For us, as reverse engineers, it means that both interpretations are valid, depending on the context, and that the same address may hold different instructions depending on the context. It also means that every time we see a blx instruction, we have to fork our disassembler to produce two versions of the program: one for the case when we were in ARM mode and another for the case when we were in Thumb mode. It is easy to see that the number of program versions grows exponentially, i.e.,
Now we can see that correctly disassembling a Thumb-interworked binary is very hard, but doable. There are, however, some roadblocks in the current implementation of the BAP 1.x disassembler. It doesn't allow switching architectures, as the architecture is a property of the whole binary. This is being fixed in BAP 2.0, where the new disassembler engine can ascribe an arbitrary architecture to any program location. Program locations are also no longer represented by addresses alone, so we can now treat the same address as two different locations, depending on the current instruction set. The new framework also enables speculative disassembly, so that we can fully disassemble all possible interpretations of a program and get a sound model of the binary.
That was the general problem, and as you can see, we're moving in the right direction. A smaller problem is updating the lifter, so that we can at least get the semantics of Thumb instructions. |
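To make the idea of "locations instead of addresses" concrete, here is a minimal Python sketch. It is not BAP's implementation or API. The names `Location`, `interwork_target`, `explore`, and the toy branch table are all hypothetical. The only architectural fact it relies on is that a BX/BLX-style register branch uses bit 0 of the target value to select the instruction set (1 for Thumb, 0 for ARM).

```python
from collections import deque
from typing import Callable, Iterable, NamedTuple, Set


class Location(NamedTuple):
    """Hypothetical program location: an address plus the decoder mode."""
    addr: int
    mode: str  # "arm" or "thumb"


def interwork_target(value: int) -> Location:
    """Resolve a BX/BLX-style register branch target.

    Bit 0 of the value selects the instruction set (1 = Thumb, 0 = ARM);
    it is cleared to obtain the actual instruction address.
    """
    if value & 1:
        return Location(value & ~1, "thumb")
    return Location(value, "arm")


def explore(entry: Location,
            successors: Callable[[Location], Iterable[Location]]) -> Set[Location]:
    """Worklist traversal over (address, mode) pairs instead of bare addresses."""
    seen: Set[Location] = set()
    work = deque([entry])
    while work:
        loc = work.popleft()
        if loc in seen:
            continue  # this address was already visited in this mode
        seen.add(loc)
        work.extend(successors(loc))
    return seen


# Toy branch table (an assumption for illustration, not real disassembly):
# each location lists the raw register values it may branch to.
TOY_BRANCHES = {
    Location(0x8000, "arm"): [0x8011],            # switches to Thumb at 0x8010
    Location(0x8010, "thumb"): [0x8000, 0x8010],  # back to ARM, twice
}


def toy_successors(loc: Location) -> Iterable[Location]:
    return [interwork_target(v) for v in TOY_BRANCHES.get(loc, [])]


reached = explore(Location(0x8000, "arm"), toy_successors)
```

In this toy model, address 0x8010 ends up in `reached` twice, once as Thumb and once as ARM, which is exactly the "same address, two locations" situation described above. Deduplicating on the (address, mode) pair keeps this particular traversal finite; a real disassembler also has to handle decoding, fall-through, and indirect targets, none of which are modeled here.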
Wow, thank you very much! I was using bap on a cortex-m ELF file. Using bap-mc on the pure encoding produces entirely different results :) Thanks a lot again! |
the work on this issue has moved to #1174 |
fixed in #1178. We now support interworking and Thumb/Thumb2 instruction sets. |
That's great! I was wondering whether there is a stable release of BAP that supports the Thumb/Thumb2 instruction sets, or whether I have to clone the latest git repo to get the support. Many thanks. |
@valour01, the stable release (2.2.0) will be out soon. You can get the latest testing version (which matches the master branch of BAP) by just adding the testing repository to opam,
Alternatively, you can use Debian packages that are automatically released every Saturday, see here. |
Hi, I noticed that BAP does not support binaries in the Thumb instruction set very well.
You can try this test file: test.zip
We noticed that BAP disassembles the binary using the ARM instruction set, which is completely wrong. Because of the incorrect disassembly, BAP performs very poorly on function detection and on CFG and call graph (CG) construction for Thumb binaries. I can provide more test cases if you want. Many thanks.