Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite x86 decoder to use Capstone for instruction decoding #69

Merged
merged 82 commits into from
Nov 14, 2018

Conversation

ceeac
Copy link
Member

@ceeac ceeac commented Sep 7, 2018

This PR completely replaces the old PentiumDecoder by a new CapstoneX86Decoder using Capstone for decoding x86 instructions. This has several benefits:

  • No 50k source code lines file
  • Faster compile times, especially with optimizations enabled
  • Cleaner code
  • Better unit test coverage
  • Better regression test coverage
  • Support for more x86 instructions (when finished)
  • Better error messages when decoding instructions with unknown semantics
  • Fixes crashes when decoding x86 instructions with swapped prefixes

This is a work in progress - do not merge yet.

@XVilka
Copy link

XVilka commented Sep 7, 2018

Why not to make it work on top of radare2 instead? See radareorg/radare2#10102

@ceeac
Copy link
Member Author

ceeac commented Sep 7, 2018

Only using radare2 as an instruction decoder would ultimately limit flexibility: radare2 would then be a hard dependency. What I eventually want to do is being able to add decoders as plugins, so they can be changed at will. So using radare2 as an instruction decoder might be possible in the future, but as an optional dependency instead of as a hard dependendy.

@ceeac ceeac added this to the v0.5.0 milestone Sep 7, 2018
@ceeac ceeac added PR: work in progress (P) This is in development. Don't merge. dom: decoder (P/I) Related to instruction decoding / frontend labels Sep 7, 2018
@nemerle
Copy link
Collaborator

nemerle commented Sep 7, 2018

I agree with @ceeac, using capstone in a plugin is a better approach. And I'm afraid that integrating radare2 would lead the boomerang to becoming another radare2 frontend ( and there are a few of those already ? :) )

@nemerle
Copy link
Collaborator

nemerle commented Sep 7, 2018

Also, two things to consider:

  • downloading + building a specific capstone version using CMake's ExternalProject support ( more stable environment )
  • a way to integrate 16bit x86 ( maybe another plugin/separate issue altogether ? )

@ceeac ceeac force-pushed the capstone branch 2 times, most recently from ef4d16c to ed64863 Compare September 28, 2018 14:32
@ceeac
Copy link
Member Author

ceeac commented Oct 2, 2018

Re. ExternalProject / 16bit x86 support:
I think these should be adressed after this PR is merged. I want to stabilize x86_32 support first; however adding 16 bit x86 support should not be too hard afterwards.

@ceeac ceeac force-pushed the capstone branch 2 times, most recently from 97bd77d to f9d124c Compare October 11, 2018 10:42
@XVilka
Copy link

XVilka commented Oct 12, 2018

Capstone is not very updated lately. You can try Zydis.

@ceeac
Copy link
Member Author

ceeac commented Oct 12, 2018

There are a couple of things I want to say about this:

  • I would not call a library that has had its last release three months ago "not updated" (although I agree that there was a large gap between the last release and the next-to-last one).
  • Zydis only supports x86 and x86-64, while Capstone supports mote architectures. Using a single library for all instruction decoders reduces required maintenance, and why use another library when we most likely are going to use Capstone anyways in one way or another?
  • The majority of the work left to do (for this PR) is in updating x86.ssl, which can be used regardless of which disassembler library is used. So imo it makes sense to keep Capstone because it works for the scope of this PR (The base IA-32 instruction set without the (now not-so-new) extensions like SSE, MMX etc.)
  • After this PR is merged, it is not too hard to add a new decoder plugin that uses Zydis for disassembling instructions. Feel free to submit a PR for this @XVilka.

@ceeac
Copy link
Member Author

ceeac commented Oct 12, 2018

To clarify what instructions still need semantics for this PR, these are:

  • String instructions with and without prefixes (some are already implemented)
  • Lots of FP instructions
  • Some MUL/IMUL variants
  • BSF/BSR instructions

In addition, all instructions still have to be checked for correctness before this PR can be merged.

@XVilka
Copy link

XVilka commented Oct 16, 2018

@ceeac Surely it is abandoned - author cannot support latest ARM instructions for example, PRs are rotting even without response: https://github.com/aquynh/capstone/pulls
This is why x96dbg for example switched to capstone https://x64dbg.com/blog/2017/10/18/goodbye-capstone-hello-zydis.html

I am not pushing, just as we (radare2 project) use capstone we met this inability to keep the library updated, thus rendering it is almost useless for any modern real world binaries.

@ceeac ceeac removed the PR: work in progress (P) This is in development. Don't merge. label Nov 14, 2018
@ceeac
Copy link
Member Author

ceeac commented Nov 14, 2018

I have now tested the new decoder on a number of binaries and found it to work and perform much better than the old one. Some finishing touches will still be necessary, but I'll leave that for additional PRs.

Re. Zydis support: My current plan is to implement a Zydis decoder plugin after the release of 0.5.0 so users can choose between Capstone and Zydis.

@ceeac ceeac merged commit 0079415 into BoomerangDecompiler:develop Nov 14, 2018
@ceeac ceeac mentioned this pull request Nov 14, 2018
@ceeac ceeac deleted the capstone branch November 16, 2018 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dom: decoder (P/I) Related to instruction decoding / frontend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants