Skip to content

Status of ThreadSanitizer for OCaml

Olivier Nicole edited this page Dec 8, 2022 · 4 revisions

Google is behind LLVM's ThreadSanitizer, an effective approach to locate data races in concurrent code bases. The goal of this work is

  • to extend the approach to full OCaml 5.0 and
  • to keep the run-time and instrumentation overhead as low as possible

Background

There are two components to ThreadSanitizer (TSan):

  1. A run-time library to track accesses to shared data and report races
  2. Compiler instrumentation that emits calls to the run-time library

Internally the run-time library associates with each word of application memory at least 2 “shadow words”. Each shadow word contains information about a recent memory access to that word, including a “scalar clock”. Those clocks serve to establish a happens-before (HB) relation, i.e. an event orderings that we are certain of.

This information is maintained as a “shadow state” in a separate memory region, and updated at every (instrumented) memory access. A data race is reported every time two memory accesses are made to overlapping memory regions, and:

  • one of them is a write, and
  • there is no established happens-before relation between them. More information about TSan's algorithm on their wiki.

When a potential data race is detected, TSan prints a data race reports with the affected memory location, and two backtraces (one for each memory access). While the backtrace for the latest access is easy to obtain by the usual unwinding (like the debugger does), the backtrace of the previous access is not readily available. What TSan does is keeping a history of function entry and exit events for each thread, which allows to reconstruct past backtraces. This history is implemented as a large ring buffer, so stacktraces for events that are too far in the past are lost.

The run-time library is reusable across different programming languages (C,C++,Go, ...).

Status

For OCaml there have been two supplementary efforts - ideally these should be combined:

  • Anmol's initial prototype extending the OCaml compiler to emit TSan instrumentation
  • Sadiq has been running TSan on the OCaml runtime, investigating reported data races, and annotating the C run-time to reduce false positives

Briefly, Anmol's prototype

  • add's a new cmm-based compiler pass adding instrumentation
  • the new pass instruments
    • functions with __tsan_func_entry and __tsan_func_exit
    • loads and stores with __tsan_readN and __tsan_writeN
  • as an optimization the pass currently doesn’t instrument
    • immutable values (since they won't have races) or
    • initializing writes (since they are already protected by a barrier making sure they are visible to all threads)

Anmol's work has been taken up by Olivier (@OlivierNicole) and Fabrice (@fabbing). The added work consisted, among other tasks, in:

  • Report backtrace correctly. This was tricky to implement as it relies on the compiler inserting calls to __tsan_func_entry() and __tsan_func_exit() at the right places, which is hard to do in the presence of tail calls, exceptions and effects. We have implemented ad hoc calls to __tsan_func_entry() and __tsan_func_exit() upon exception raising/catching and effect performing/handling/resuming, so the backtraces should be correct. This stack trace reconstruction is performed for both OCaml stack frames and C frames. For OCaml frames
  • Some data race reports concern the runtime and are not the responsability of the programmer. It is very hard to eliminate all the reports in the runtime (see ocaml#11040). For this reason, in the future we may want to silence these reports by default.

As of now, memory accesses are properly instrumented and running with ThreadSanitizer already allows to detect data races. The observed slowdown is about 7-13x.

Status of the work items

  • Get TSan running smoothly for pure OCaml code ✔️
  • Instrumentation of atomic variables ✔️
  • Address and/or decide on the below challenges
  • Measure overhead of approach. We expect the overhead to be significantly lower than, e.g., Go or C/C++ where mutability is the default. ✔️ (Overhead has been evaluated to roughly 7-13x.)
  • Test approach on larger code bases ✔️
  • Extend to support TSan-instrumented C-code called through the FFI ✔️ (Works pretty much out of the box)
  • Extend with a suppression mechanism that disables tracking or silences warnings for selected parts of the code (similar to the C TSan)
  • Extend to bytecode interpreter (not a priority ATM)

Challenges

  • Anmol and Francois: In the OCaml run-time system values can change address which means that the run-time library looses track of any relations. If there is a race on such a value with two accesses happening before/after such an address change, this could mean that the race is not detected. This can be caused by a GC promotion from the minor to the major heap.
    • Sadiq: Since a global lock is taken during garbage collections, there should be no data race in this situation.
  • Jan: What happens when a heap memory location is reclaimed and reused for a new value? Does this potentially cause false alarms?
    • Olivier: Not as long as TSan can establish a happens-before relation between accesses to the old value and accesses to the new value. For this we need to make sure that TSan considers collections as a synchronization event.
  • Instrumentation is based on __tsan_func_entry and __tsan_func_exit. Do tail-calls require special TSan treatment?
    • Yes, definitely. In fact, that was where the majority of our effort was spent.
  • Lazy values gets special treatment in the compiler: they have internal state and are short-circuited once they have been forced. Do they need special TSan treatment?
    • The fact that the C runtime is instrumented should suffice to handle those.

Caveats

  • TSan investigates a particular execution and therefore will not detect races in code paths that are not visited.

Resources