overhauls the target/architecture abstraction (2/n) #1226
Merged: ivg merged 5 commits into BinaryAnalysisPlatform:master from ivg:overhauls-targets-part-2 on Oct 1, 2020
What is interworking
--------------------

Interworking is a feature of some architectures that enables mixing several instruction sets in the same compilation unit, for example, the ARM and Thumb interworking that this branch adds.

What is done
------------

1. We add the switch primitive to the basic interface; it changes the disassembler in the current disassembling state. It is a bold move that can have consequences, so it should be carefully reviewed.

2. We attribute each destination in the disassembler driver state with the architecture and call switch every time we are going to disassemble the next chunk of memory.

3. The default rule that extends the unit architecture to all instructions in that unit is disabled for ARM/Thumb and is overridden in the arm plugin with the following behavior: if an arm unit has a file and that file has a symbol table, then we provide information based on the last bit of that symbol table (todo: we should also check for abi); otherwise we propagate the unit arch to instructions.

What is to be done
------------------

Next, the arm lifter shall provide a promise to compute destinations (which itself will require destinations, because we don't really want to compute them) and provide the destination architecture, based on the source encoding. We can safely examine any representation of the instruction since it will already be lifted by that moment.
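The symbol-table rule from item 3 can be illustrated with a small, self-contained sketch. It assumes that "the last bit" refers to the least significant bit of a symbol's value, which is how ARM ELF files mark Thumb functions. The types and the functions `encoding_of_symbol`, `address_of_symbol`, and `encoding_at` are hypothetical and are not part of the arm plugin or of any BAP interface.

```ocaml
(* Toy model of the rule; every name here is hypothetical. *)
type encoding = ARM | Thumb

(* Assumption: a symbol value with the least significant bit set
   denotes Thumb code, otherwise ARM code. *)
let encoding_of_symbol (value : int64) : encoding =
  if Int64.logand value 1L = 1L then Thumb else ARM

(* The address of the code is the symbol value with that bit cleared. *)
let address_of_symbol (value : int64) : int64 =
  Int64.logand value (Int64.lognot 1L)

(* Use the symbol table when it has an entry for [addr]; otherwise
   propagate the encoding of the whole unit. *)
let encoding_at ~unit_default (symbols : int64 list) (addr : int64) =
  match List.find_opt (fun v -> address_of_symbol v = addr) symbols with
  | Some v -> encoding_of_symbol v
  | None -> unit_default
```

With an empty symbol list `encoding_at` always returns `unit_default`, which mirrors the fallback described for units without a symbol table.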
Also makes Enum more strict by checking that an element is indeed a member of the set of declared elements and by preventing double declarations.
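The two checks can be sketched as follows. This is only an illustration of the idea with a hypothetical `Strict_enum` module; it is not the interface or the implementation of BAP's `Enum`.

```ocaml
(* Hypothetical sketch of a strict enumeration. *)
module Strict_enum : sig
  type t
  val declare : string -> t  (* raises Invalid_argument on a repeat *)
  val read : string -> t     (* raises Invalid_argument on a non-member *)
  val name : t -> string
end = struct
  type t = string

  (* The set of elements declared so far. *)
  let members : (string, unit) Hashtbl.t = Hashtbl.create 16

  let declare name =
    if Hashtbl.mem members name
    then invalid_arg (name ^ " is already declared")
    else begin
      Hashtbl.add members name ();
      name
    end

  let read name =
    if Hashtbl.mem members name then name
    else invalid_arg (name ^ " is not a member of the enumeration")

  let name x = x
end
```

Keeping `t` abstract means that a value of the enumeration can only be obtained through `declare` or `read`, so every value in circulation has passed the membership check.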
ivg force-pushed the overhauls-targets-part-2 branch 2 times, most recently from 6f2f7ad to 6355807 (October 1, 2020 12:58)
In the second patch of this series (BinaryAnalysisPlatform#1225) we completely got rid of the Arch.t dependency in the disassembler engine, which finally opens the path for seamless integration of targets that are not representable with Arch.t.

To achieve this, we introduced proper dependency injection into the disassembler driver, so that it is no longer responsible for creating the llvm MC disassembler. Instead, a plugin that implements a target, aka the target support package, has to create a disassembler. It is now in full control of all parameters and can choose the backend and specify the CPU and other details of the encoding.

The encoding is a new abstraction in our knowledge base that breaks the tight connection between the target and the way the program for that target is encoded. Unlike the target, which is a property of a unit of code, the encoding is associated with the program itself, i.e., it is a property of each instruction. That enables targets with context-dependent encodings, such as ARM's thumb mode and MIPS16e for binary encodings, and paves the road for non-binary encodings of the same architecture, e.g., text assembly (which may also have several encodings of its own, cf. att vs intel syntax).

We base this branch on enable-interworking (BinaryAnalysisPlatform#1188), and this branch fully supersedes and includes it, since encodings made it much more natural. It is still highly untested how it will work with real thumb binaries, but we will get back to it when we merge BinaryAnalysisPlatform#1178.

Another big update is that the disassembler backend (which is responsible for translating bits into machine instructions) is no longer required to be implemented in C++. It is now possible to write your own backends/disassemblers in pure OCaml, e.g., to support PIC microcontrollers. The Backend interface is pretty low-level and we might provide higher-level interfaces later; see `Disasm_expert.Backend` for the interface and detailed comments.

Finally, we rectify the interface introduced in the previous PR and flatten the hierarchy of the abstractions newly introduced to the Core Theory, i.e., instead of `Theory.Target.Endianness` we now have `Theory.Endianness` and so on. We also made the `Enum` module public; it introduces enumerated types built on top of `Knowledge.Value`s.

In the next episodes of this series we will gradually remove Arch.t from other bap components and further clean up the newly introduced interfaces.
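To make the relation between encodings, backends, and the driver more concrete, here is a toy model of the dependency injection described in this message. It is a sketch under stated assumptions: the types `encoding`, `insn`, and `backend` and the functions `register` and `disassemble` are all hypothetical and do not reproduce the `Disasm_expert.Backend` interface.

```ocaml
(* Hypothetical model; none of these names come from BAP. *)
type encoding = string                      (* e.g., "arm" or "thumb" *)
type insn = { name : string; length : int }
type backend = bytes -> int -> insn option  (* decode at an offset *)

let backends : (encoding, backend) Hashtbl.t = Hashtbl.create 8

(* A target support package registers the decoder it wants to use. *)
let register (enc : encoding) (decode : backend) =
  Hashtbl.replace backends enc decode

(* The driver only looks a backend up by the encoding of the
   instruction at hand; it never constructs a backend itself. *)
let disassemble (enc : encoding) (mem : bytes) (off : int) =
  match Hashtbl.find_opt backends enc with
  | None -> None
  | Some decode -> decode mem off

(* Two toy backends written in pure OCaml; they only differ in the
   width of the instructions they claim to decode. *)
let () =
  register "arm" (fun mem off ->
      if off + 4 <= Bytes.length mem
      then Some { name = "arm-insn"; length = 4 }
      else None);
  register "thumb" (fun mem off ->
      if off + 2 <= Bytes.length mem
      then Some { name = "thumb-insn"; length = 2 }
      else None)
```

Because the encoding is passed per call, the same chunk of memory can be decoded with different backends at different offsets, which is the property that interworking relies on.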
ivg force-pushed the overhauls-targets-part-2 branch from 6355807 to d27fd0e (October 1, 2020 15:19)
ivg added a commit to ivg/bap that referenced this pull request on Feb 22, 2021
This PR is the continuation of the BinaryAnalysisPlatform#1225, BinaryAnalysisPlatform#1226, and BinaryAnalysisPlatform#1227 series of changes that were focused on substituting the old and inextensible `Arch.t` abstraction with the new `Theory.Target.t` representation. This episode is instigated by the upcoming implementation of the RISCV target. Since RISCV is a target that is not supported by Arch.t, it became a good test of the new Theory.Target.t abstraction.

As the RISCV work showed, we still have lots of code that depends on Arch.t, most importantly Primus, which was fully dependent on Arch.t. The main issue was that Theory.Target.t didn't provide any means to encode register classes, which prevented us from using it everywhere in Primus, e.g., we need to know which register is the stack pointer in order to set up the stack.

To implement this, we introduce a new abstraction called _role_. A _role_ could generally be applied to any entity, but so far we are only talking about the roles of registers in various targets. The target definition now accepts the `regs` parameter that takes the register file specification with each register assigned one or more roles, e.g., here is the register file specification for 8086,

```ocaml
Theory.Role.Register.[
  [general; integer], main @< index @< segment;
  [stack_pointer], untyped [reg r16 "SP"];
  [frame_pointer], untyped [reg r16 "BP"];
  [Role.index], untyped index;
  [Role.segment], untyped segment;
  [status], untyped flags;
  [integer], untyped [
    reg bool "CF";
    reg bool "PF";
    reg bool "AF";
    reg bool "ZF";
    reg bool "SF";
    reg bool "OF";
  ]
]
```

I.e., we assign a set of roles to a set of registers. We also now have two new functions, `Theory.Target.regs` and `Theory.Target.reg`, that enable querying the register file of the target for registers that fulfill one or more roles. Whilst we publish a limited number of well-known (blessed) roles in the `Theory.Role.Register` module, more roles could be added as users need them. For example, in the code snippet above we have two non-standard roles that are specific to the x86 architectures, `Role.index` and `Role.segment`.

With roles we can drop the dependency on Target in most of the places where it makes sense (I still left it in x86 and other target-specific plugins, which obviously are independent of the newly added architectures).
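The idea of querying a register file by roles can be modeled in a few lines of plain OCaml. The sketch below is a toy model and makes no claim about the real signatures of `Theory.Target.regs` and `Theory.Target.reg`; the types `role`, `reg`, and `regfile`, the sample `file`, and the helper functions are all assumptions made for illustration.

```ocaml
(* Toy model of roles; all names are hypothetical. *)
type role = General | Integer | Stack_pointer | Frame_pointer | Status
type reg = { name : string; width : int }

(* A register file assigns a set of roles to a set of registers. *)
type regfile = (role list * reg list) list

let r16 name = { name; width = 16 }
let flag name = { name; width = 1 }
let flags = List.map flag ["CF"; "ZF"]

let file : regfile = [
  [General; Integer], List.map r16 ["AX"; "BX"; "CX"; "DX"];
  [Stack_pointer], [r16 "SP"];
  [Frame_pointer], [r16 "BP"];
  [Status], flags;
  [Integer], flags;
]

(* Roles of a register, collected across all entries that mention it. *)
let roles_of (file : regfile) reg =
  List.concat
    (List.map (fun (roles, regs) ->
         if List.mem reg regs then roles else []) file)

(* Every register of the file, without duplicates. *)
let all_regs (file : regfile) =
  List.sort_uniq compare (List.concat (List.map snd file))

(* All registers that fulfill every requested role. *)
let regs ~roles file =
  List.filter (fun r ->
      let has = roles_of file r in
      List.for_all (fun role -> List.mem role has) roles)
    (all_regs file)

(* The register that fulfills the roles, if it is unique. *)
let reg ~roles file =
  match regs ~roles file with
  | [r] -> Some r
  | _ -> None
```

For instance, `reg ~roles:[Stack_pointer] file` finds `SP` without the caller knowing anything about the architecture, which is the kind of query that code setting up the stack needs.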
ivg added a commit that referenced this pull request on Feb 22, 2021
ivg added a commit to BinaryAnalysisPlatform/bap-bindings that referenced this pull request on Jul 27, 2021
resolves #19

The example was buggy as the size of the pointer was specified incorrectly. It was acceptable before but is no longer tolerated after we enabled [interworking][1] (several architectures in the same binary).

[1]: BinaryAnalysisPlatform/bap#1226