
Cross checks design

Per Larsen edited this page Aug 8, 2017 · 2 revisions

We need to insert cross-checks into both the C and the Rust code, and the cross-checks need to (mostly) match. We assume that we have two parallel compilation pipelines, one per language:

  • C -> clang -> clang AST -> clang IR emitter -> LLVM IR -> LLVM compilation -> output executable
  • C -> clang -> clang AST -> C2Rust transpiler -> Rust -> Rust frontend -> LLVM IR -> LLVM compilation -> output executable

We can insert our cross-checks at several points in both pipelines: directly in the C code, in the clang AST, or in the LLVM IR. We believe the most flexible solution is a hybrid approach.

The hybrid approach

To reduce the C2Rust user's porting effort, we should automatically insert as many cross-checks as possible. Additionally, most of the cross-checks should not be visible in Rust code, which means that the best place to insert them is the LLVM IR. However, this has a few drawbacks:

  • Late in the pipeline, the LLVM IR might not match exactly between the two front-end languages. For example, the C function void foo(const char *arr, size_t len) might have been translated to fn foo(arr: &[u8]), so that two C arguments become a single Rust argument. This could be due to either automatic or manual refactoring.
  • Some front-end information might not be available in the IR, e.g., type information. For example, C structures do not correspond 1:1 to LLVM IR structures, and we might need the former to implement more advanced checks.

For these reasons, we propose a hybrid approach: the LLVM backend automatically inserts cross-checks, but also provides a cross-check mutator interface that lets the Rust code do the following:

  • Insert new cross-checks, e.g., where refactoring has removed or restructured code that is present in the C version
  • Remove implicit cross-checks, e.g., where the Rust code adds functions or other code that isn't present in the C version
  • Mutate the implicit cross-checks, e.g., for the foo function above, whose arguments differ between C and Rust
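
The three mutator operations can be sketched as a small data type. This is only an illustration of the idea, not a proposed API: the names CrossCheckMutation and emitted_checks are hypothetical, and we assume here that a cross-check is identified by a plain string name.

```rust
/// Hypothetical mutations that Rust code could request for the implicit
/// cross-check the backend inserts for one function. All names are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CrossCheckMutation {
    /// Insert an additional explicit cross-check with the given name.
    Insert(String),
    /// Remove the implicit cross-check (the function has no C counterpart).
    Remove,
    /// Mutate the implicit cross-check so it is emitted under another name,
    /// e.g., the name of the original C function.
    Rename(String),
}

/// Returns the names of the cross-checks that would be emitted for `function`
/// after applying `mutations` in order to its single implicit cross-check.
pub fn emitted_checks(function: &str, mutations: &[CrossCheckMutation]) -> Vec<String> {
    // Without mutations, the backend emits one check named after the function.
    let mut implicit = Some(function.to_string());
    let mut extra = Vec::new();
    for mutation in mutations {
        match mutation {
            CrossCheckMutation::Insert(name) => extra.push(name.clone()),
            CrossCheckMutation::Remove => implicit = None,
            CrossCheckMutation::Rename(name) => {
                // Renaming only has an effect if the implicit check still exists.
                if implicit.is_some() {
                    implicit = Some(name.clone());
                }
            }
        }
    }
    implicit.into_iter().chain(extra).collect()
}
```

With no mutations, emitted_checks("foo", &[]) yields only the implicit check for foo; a Rust-only helper function would pass Remove so that it emits nothing and the two executions stay comparable.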

For mutators, we may want to support certain implicit mutations. For example, we could assume that a (const char*, size_t) pair of C values always corresponds to a string slice (&str) in Rust.
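
As a rough sketch of such an implicit mutation, both sides could reduce the argument to the same byte sequence before computing the value to compare. The function names below are hypothetical, and FNV-1a is used only as a stand-in hash; this page does not specify how check values are computed. The sketch uses the &[u8] form from the foo example, since a &str would additionally require the bytes to be valid UTF-8.

```rust
use std::slice;

/// FNV-1a over a byte sequence. The choice of hash is an assumption made
/// for illustration only.
pub fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Check value as seen from the C signature foo(const char *arr, size_t len):
/// the (pointer, length) pair is treated as one logical argument.
///
/// # Safety
/// `arr` must point to `len` readable, initialized bytes.
pub unsafe fn check_value_ptr_len(arr: *const u8, len: usize) -> u64 {
    let bytes = unsafe { slice::from_raw_parts(arr, len) };
    hash_bytes(bytes)
}

/// Check value as seen from the Rust signature fn foo(arr: &[u8]).
pub fn check_value_slice(arr: &[u8]) -> u64 {
    hash_bytes(arr)
}
```

Because both functions hash the same bytes, a call made through the C-style pair and one made through the slice produce the same check value, even though the two signatures have a different number of arguments.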

Initial cross-check ideas

Initially, we want to add a cross-check on each function entry and exit. As a starting point, we will only check which functions get called, and later add checks for the values of the arguments. Checking argument values could prove very tricky, particularly for pointer, array and structure arguments.
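
A minimal sketch of the entry/exit idea on the Rust side, written by hand here although the intent above is for the backend to insert such checks automatically. The names FunctionCheck, Event and take_trace are hypothetical, and recording events in a thread-local list is an assumption; how the two executions are actually compared is not covered on this page.

```rust
use std::cell::RefCell;

/// One recorded cross-check event (hypothetical representation).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    Enter(&'static str),
    Exit(&'static str),
}

thread_local! {
    /// Per-thread trace of cross-check events, in the order they occurred.
    static TRACE: RefCell<Vec<Event>> = RefCell::new(Vec::new());
}

/// Guard that records function entry when created and exit when dropped,
/// so the exit is recorded on every return path.
pub struct FunctionCheck(&'static str);

impl FunctionCheck {
    pub fn enter(name: &'static str) -> Self {
        TRACE.with(|trace| trace.borrow_mut().push(Event::Enter(name)));
        FunctionCheck(name)
    }
}

impl Drop for FunctionCheck {
    fn drop(&mut self) {
        TRACE.with(|trace| trace.borrow_mut().push(Event::Exit(self.0)));
    }
}

/// Returns the events recorded so far on this thread and clears the trace.
pub fn take_trace() -> Vec<Event> {
    TRACE.with(|trace| std::mem::take(&mut *trace.borrow_mut()))
}

pub fn helper() {
    let _check = FunctionCheck::enter("helper");
}

pub fn foo() {
    let _check = FunctionCheck::enter("foo");
    helper();
}
```

Calling foo() and then take_trace() gives the sequence enter foo, enter helper, exit helper, exit foo. A C build instrumented in the same way would be expected to produce the same sequence, and the first point where the two sequences differ marks a divergence.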