overhauls the target/architecture abstraction (2/n) #1226

Merged
merged 5 commits into from
Oct 1, 2020

Commits on Sep 25, 2020

  1. enables interworking in the disassembler driver

    What is interworking
    --------------------
    
    Interworking is a feature of some architectures that enables mixing
    several instruction sets in the same compilation unit. For example,
    this branch aims to add ARM and Thumb interworking.
    
    What is done
    -------------
    
    1. We add the switch primitive to the basic interface; it changes the
    disassembler in the current disassembling state. It is a bold move
    that can have consequences and should be carefully reviewed.
    
    2. We attribute each destination in the disassembler driver state
    with the architecture and call switch every time we are going to
    disassemble the next chunk of memory.
    
    3. The default rule that extends the unit architecture to all
    instructions in that unit is disabled for ARM/Thumb and is overridden
    in the arm plugin with the following behavior: if an arm unit has a
    file and that file has a symbol table, then we provide information
    based on the last bit of the symbol values in that table (todo: we
    should also check for abi); otherwise, we propagate the unit arch to
    instructions.
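
The three steps above can be sketched with a toy model. This is not the BAP interface: `encoding`, `dest`, `dis`, `switch`, `encoding_of_symbol`, and `run` are hypothetical names chosen for illustration, and the symbol table is reduced to a list of raw symbol values. The sketch assumes the ARM ELF convention that a set low bit in a function symbol's value marks Thumb code, and it also assumes that an address without a symbol falls back to the unit's encoding.

```ocaml
(* Toy model only: every name below is hypothetical, not BAP's API. *)
type encoding = Arm | Thumb

(* Step 2: a destination is an address tagged with its encoding. *)
type dest = { addr : int; enc : encoding }

(* Step 3: consult the symbol table if there is one, otherwise
   propagate the unit's encoding. A set low bit marks Thumb. *)
let encoding_of_symbol ~unit_default symtab addr =
  match symtab with
  | None -> unit_default
  | Some values ->
    match List.find_opt (fun v -> v land (lnot 1) = addr) values with
    | Some v when v land 1 = 1 -> Thumb
    | Some _ -> Arm
    | None -> unit_default (* assumption: no symbol, use the default *)

(* Step 1: a stand-in disassembler whose only state is the encoding. *)
type dis = { mutable current : encoding; mutable switches : int }

let switch dis enc =
  if dis.current <> enc then begin
    dis.current <- enc;
    dis.switches <- dis.switches + 1
  end

(* The driver switches before disassembling each destination. *)
let run dis dests =
  List.map (fun d -> switch dis d.enc; (d.addr, dis.current)) dests
```

Tagging destinations rather than the whole unit is what lets a single pass alternate between instruction sets.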
    
    What is to be done
    ------------------
    
    Next, the arm lifter shall provide a promise to compute
    destinations (which itself will require destinations, because we don't
    really want to compute them) and provide the destination architecture,
    based on the source encoding. We can safely examine any representation
    of the instruction since it will already be lifted by that moment.
    ivg committed Sep 25, 2020
    38a656f

Commits on Sep 29, 2020

  1. flattens the target interface, publishes the Enum module

    also makes Enum more strict by checking that the element is indeed a
    member of the set of elements and by preventing double declarations.
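
The two checks mentioned above (membership and no double declarations) can be illustrated with a miniature enumeration. This is a sketch under stated assumptions, not the published `Enum` module: the functor `Make` and the functions `declare`, `read`, `name`, and `members` are hypothetical, and elements are plain strings rather than knowledge base values.

```ocaml
(* Hypothetical miniature of a strict enumeration; not BAP's Enum. *)
module Make () : sig
  type t
  val declare : string -> t   (* raises on a duplicate declaration *)
  val read : string -> t      (* raises if the name is not a member *)
  val name : t -> string
  val members : unit -> t list
end = struct
  type t = string
  let elements : (string, unit) Hashtbl.t = Hashtbl.create 16
  let order = ref []

  let declare name =
    if Hashtbl.mem elements name
    then invalid_arg ("declare: already declared: " ^ name);
    Hashtbl.add elements name ();
    order := name :: !order;
    name

  let read name =
    if Hashtbl.mem elements name then name
    else invalid_arg ("read: not a member: " ^ name)

  let name t = t
  let members () = List.rev !order
end

(* Each application of Make yields a fresh, independent enumeration. *)
module Endianness = Make ()
let le = Endianness.declare "le"
let eb = Endianness.declare "eb"
```

Keeping `t` abstract means the only way to obtain a value is through `declare` or `read`, so every value is a checked member.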
    ivg committed Sep 29, 2020
    c0a3d55
  2. adds an llvm decode for x86

    ivg committed Sep 29, 2020
    4808ca9
  3. e6888e2

Commits on Oct 1, 2020

  1. overhauls the target/architecture abstraction (2/n)

    In the second patch of this series (BinaryAnalysisPlatform#1225) we completely got rid of
    the Arch.t dependency in the disassembler engine, which finally opens
    the path for seamless integration of targets that are not
    representable with Arch.t.
    
    To achieve this, we introduced proper dependency injection into the
    disassembler driver so that it is no longer responsible for creating
    the llvm MC disassembler. Instead, a plugin that implements a target,
    aka the target support package, has to create a disassembler and is
    now in full control of all parameters: it can choose the backend and
    specify the CPU and other details of the encoding. The encoding is a
    new abstraction in our knowledge base that breaks the tight
    connection between the target and the way the program for that target
    is encoded. Unlike the target, which is a property of a unit of code,
    the encoding is associated with the program itself, i.e., it is a
    property of each instruction. That enables targets with
    context-dependent encodings, such as ARM's Thumb mode and MIPS16e for
    binary encodings, and paves the road for non-binary encodings of the
    same architecture, e.g., text assembly (which may also have several
    encodings of its own, cf. att vs intel syntax). We base this branch on
    enable-interworking (BinaryAnalysisPlatform#1188); this branch fully supersedes and
    includes it, since encodings made it much more natural. It is still
    highly untested how it will work with real thumb binaries, but we will
    get back to it when we merge BinaryAnalysisPlatform#1178.
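
The injection scheme described above can be sketched as a table from encodings to disassembler factories. All names here (`encoding`, `insn`, `disassembler`, `register`, `disassembler_for`, `decode_at`, and the `"toy-fixed16"` encoding) are hypothetical and do not reflect the actual BAP interface; the toy backend decodes a made-up fixed-width instruction set.

```ocaml
(* Hypothetical names throughout; this is not the BAP interface. *)
type encoding = string
type insn = { mnemonic : string; length : int }
type disassembler = { decode : bytes -> int -> insn option }

(* The driver knows only this table; target plugins fill it in. *)
let factories : (encoding, unit -> disassembler) Hashtbl.t =
  Hashtbl.create 8

let register enc create = Hashtbl.replace factories enc create

let disassembler_for enc =
  match Hashtbl.find_opt factories enc with
  | Some create -> Some (create ())
  | None -> None

(* A target support package registers a backend for its encoding
   and decides every parameter itself. The toy backend treats each
   instruction as two bytes and names it after the first byte. *)
let () =
  register "toy-fixed16" (fun () ->
    { decode = (fun mem off ->
        if off >= 0 && off + 2 <= Bytes.length mem
        then
          let opcode = Char.code (Bytes.get mem off) in
          Some { mnemonic = Printf.sprintf "op%02x" opcode; length = 2 }
        else None) })

(* The driver picks the disassembler per instruction, by encoding. *)
let decode_at enc mem off =
  match disassembler_for enc with
  | None -> None
  | Some dis -> dis.decode mem off
```

Because the lookup key is the encoding of the instruction rather than the target of the unit, two encodings of the same target can coexist in one program.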
    
    Another big update is that the disassembler backend (which is
    responsible for translating bits into machine instructions) is no
    longer required to be implemented in C++ and it is now possible to
    write your own backends/disassemblers in pure OCaml, e.g., to support
    PIC microcontrollers. The Backend interface is pretty low-level and we
    might provide higher-level interfaces later, see
    `Disasm_expert.Backend` for the interface and detailed comments.
    
    Finally, we rectify the interface introduced in the previous PR and
    flatten the hierarchy of the abstractions newly introduced to the
    Core Theory, i.e., instead of `Theory.Target.Endianness` we now have
    `Theory.Endianness` and so on. We also made the `Enum` module public;
    it introduces enumerated types built on top of `Knowledge.Value`s.
    
    In the next episodes of this series we will gradually remove Arch.t
    from other bap components and further clean up the newly introduced
    interfaces.
    ivg committed Oct 1, 2020
    d27fd0e