provides versioned backends for program units symbol tables #1037

Merged Jan 27, 2020 (3 commits)

@ivg (Member) commented Jan 24, 2020

TL;DR: follow mainstream OCaml and make dynamic loading sound and safe
(not "again", since it was never sound before).

The Problem
===========

The long story. Before OCaml 4.08 the `Dynlink` module was
unsound: linking the same module twice corrupted the table of GC roots
and led to segmentation faults. To prevent this, we tracked loaded
units. However, we weren't able to reliably track units that were
linked into the program directly by the OCaml linker. We were using
`findlib.dynload` and the corresponding functionality in dune, which
provide information about the units that were used to link the host
program. Unfortunately, `findlib.dynload` recorded this information in
terms of findlib packages, not compilation units. Therefore, to
resolve a package name to the corresponding compilation unit names we
needed a working findlib system, i.e., the META files for all packages
statically linked into the binary had to be present in the hard-coded
locations.

In the normal mode of operation, when the packages used to build bap
are present in the file system, this didn't pose any problems.
However, when bap together with its plugins was packed into a Debian
package and distributed to other machines, no META files were
available. To enable binary distributions we developed an ocamlbuild
plugin that resolved package names to compilation unit names at
compile time, so that bap (and other tools built from our main source
repository) didn't depend on the runtime presence of META files.
However, when a host program is compiled with dune, or any other build
system that doesn't map package names to unit names and store them in
the host program's predicates (read: anywhere outside of bap), the
program will fail at runtime once it is packaged and distributed to
other hosts.

The Solution
============

Since the bug is fixed in 4.08, this is no longer a problem. The
solution is trivial: just use `Dynlink` and the newly provided
`all_units` function, which enumerates unit names, exactly what we
always needed.
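As a rough illustration of what `all_units` makes possible, here is a minimal sketch assuming OCaml 4.08 or newer with the `dynlink` library linked (e.g. `ocamlfind ocamlopt -package dynlink -linkpkg`). `Dynlink.all_units` is the standard library API; the helper `is_unit_linked` is hypothetical and not part of bap:

```ocaml
(* Sketch: assumes OCaml >= 4.08 and the dynlink library.
   [is_unit_linked] is a hypothetical helper, not bap's API. *)

(* [Dynlink.all_units ()] returns the compilation units of the main
   program together with those loaded so far via [Dynlink.loadfile].
   The optional [units] argument lets callers pass an explicit list. *)
let is_unit_linked ?(units = Dynlink.all_units ()) name =
  List.mem name units

let () =
  Printf.printf "%d units are linked\n"
    (List.length (Dynlink.all_units ()))
```

Because the answer comes from the runtime itself, it no longer depends on META files being present on the target machine.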

The only problem with this solution is that it is probably too early
for us to drop support for OCaml 4.07. Therefore, we decided to
provide two backends. The fallback backend still uses the old findlib
approach, but we made it a little more robust to minimize debugging
time in case it fails. We now check whether we have static
information about the compilation units that comprise the host
program, and if we don't, we ensure that we have a working ocamlfind
installation and META files. If not, we terminate the program with a
more or less comprehensible message. If we have a modern compiler,
we just use the `Dynlink` module (the choice is guarded with
`ppx_optcomp`).
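The fallback's decision logic can be sketched as follows. This is an illustration under assumptions, not bap's code: `choose_source`, `source_or_exit` and the probes `static_units` and `findlib_works` are hypothetical, and the probes are passed in as functions so the sketch does not depend on findlib itself:

```ocaml
(* Sketch of the 4.07 fallback check; every name here is hypothetical. *)

type source =
  | Static of string list  (* unit names recorded at build time *)
  | Findlib                (* resolve packages via ocamlfind and META files *)

(* Prefer static unit information; otherwise require a working findlib. *)
let choose_source
    ~(static_units : unit -> string list option)
    ~(findlib_works : unit -> bool) : (source, string) result =
  match static_units () with
  | Some units -> Ok (Static units)
  | None when findlib_works () -> Ok Findlib
  | None ->
    Error "cannot determine the units linked into the host program: \
           no static unit information is available, and ocamlfind or \
           the META files of the linked packages are missing"

(* Terminate early with a readable message instead of failing later. *)
let source_or_exit ~static_units ~findlib_works =
  match choose_source ~static_units ~findlib_works with
  | Ok source -> source
  | Error msg -> prerr_endline ("Error: " ^ msg); exit 1
```

Failing at startup with an explicit message is the point of the design: the alternative is a confusing failure much later, when a plugin is loaded.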

The backend module name is controlled by a configure variable. We use
the old findlib-based tracking for OCaml 4.07, and the new `Dynlink`
interface for OCaml 4.08 and newer.
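The selection itself happens at configure time. Purely to illustrate the version rule, the same decision written in OCaml might look like the sketch below; `backend` and `backend_for_version` are hypothetical names. A runtime check like this cannot replace the compile-time guard, because the `Dynlink`-based backend would not even compile on 4.07:

```ocaml
(* Sketch: map a compiler version to a unit-tracking backend.
   All names are hypothetical; the real choice is a configure variable. *)

type backend = Findlib_backend | Dynlink_backend

(* Versions look like "4.07.1" or "4.14.1+flambda"; only the major and
   minor components matter, so the rest of the string is ignored. *)
let backend_for_version version =
  Scanf.sscanf version "%d.%d" (fun major minor ->
      if (major, minor) >= (4, 8) then Dynlink_backend
      else Findlib_backend)

let () =
  print_endline
    (match backend_for_version Sys.ocaml_version with
     | Dynlink_backend -> "dynlink backend"
     | Findlib_backend -> "findlib backend")
```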
Although the bug is fixed in 4.08, there is still an impedance
mismatch between the names of the libraries (cmxs or cma) that we load
and the names of the units that comprise those libraries. We can't
know beforehand whether a library that we load is already linked into
the main program, because we can only query `Dynlink` for the names of
the compilation units, not for the names of the libraries that were
used during linking. We address this problem by inspecting the error
code: if it is `Dynlink.Module_already_loaded _`, then instead of
failing, we record the library that loads this module in our
repository.
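A minimal sketch of this error-code check, assuming OCaml 4.08 or newer with the `dynlink` library. `Dynlink.loadfile`, `Dynlink.adapt_filename`, `Dynlink.error_message` and the `Module_already_loaded` error are standard; the `loaded` table and `load_library` are hypothetical stand-ins for bap's repository of loaded libraries:

```ocaml
(* Sketch: assumes OCaml >= 4.08 and the dynlink library.
   [loaded] and [load_library] are hypothetical, not bap's code. *)

let loaded : (string, unit) Hashtbl.t = Hashtbl.create 16

let load_library path =
  (* Use .cmxs in native code; identity in bytecode. *)
  let path = Dynlink.adapt_filename path in
  match Dynlink.loadfile path with
  | () ->
    Hashtbl.replace loaded path ();
    Ok ()
  | exception Dynlink.Error (Dynlink.Module_already_loaded _) ->
    (* A unit of this library is already linked into the host program,
       so record the library as loaded instead of failing. *)
    Hashtbl.replace loaded path ();
    Ok ()
  | exception Dynlink.Error err -> Error (Dynlink.error_message err)
```

Treating `Module_already_loaded` as success is what bridges the gap between library names and unit names: the loader does not need to know in advance which libraries the host program was linked with.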

@ivg merged commit 0ec9508 into BinaryAnalysisPlatform:master Jan 27, 2020
@ivg deleted the switch-to-new-dynlink branch June 10, 2020