Update the whamm documentation (#174)
* update the README

* high level pass at updating Rust mdbook

* Flesh out the documentation

* mark unfinished pages with TODO
ejrgilbert authored Nov 11, 2024
1 parent 8f1d32a commit 462e14f
Showing 33 changed files with 500 additions and 427 deletions.
22 changes: 5 additions & 17 deletions README.md
Expand Up @@ -17,17 +17,6 @@ Take a look at the official [`whamm!` book](https://ejrgilbert.github.io/whamm/i

### Build ###

Get [`orca`](https://github.com/thesuhas/orca), should be in parent directory at `../orca` (see Cargo.toml):
```shell
# Inside base directory of this project
cd ..
git clone git@github.com:thesuhas/orca.git
cd orca
git checkout DO_NOT_DELETE/whamm_dependency
cd ../whamm
```

To run basic build:
```shell
cargo build
```
Expand All @@ -37,7 +26,6 @@ cargo build
In order to run the tests, a WebAssembly interpreter must be configured.
The supported interpreters are:
1. the Wizard engine interpreter. https://github.com/titzer/wizard-engine/tree/master
- Note that the Wizard interpreter does not run on Macs (yet...), so the Wasm reference interpreter will need to be configured in this context.
2. the Wasm reference interpreter. https://github.com/WebAssembly/spec/tree/main/interpreter

**How to build the [Wizard GH project]() to acquire these binaries:**
Expand Down Expand Up @@ -83,9 +71,9 @@ To specify log level:
RUST_LOG={ error | warn | info | debug | trace | off } cargo run -- --app <path_to_app_wasm> --script <path_to_script> <path_for_compiled_output>
```

To visually debug the decision tree used during Wasm bytecode emission:
To list the globals and functions that a match rule exposes to a probe's logic and predicate:
```shell
cargo run -- vis-script --script <path_to_script>
cargo run -- info --rule "<match_rule_glob>" # e.g. "wasm:opcode:br*"
```

To run a script that uses special variable types (currently `map` and `report` variables), do the following:
Expand Down Expand Up @@ -126,9 +114,9 @@ To be added:
- `exception` throw/rethrow/catch events

Example:
`wasi:http:send_req:alt`
`wasm:bytecode:call:alt`
`wasm:fn:enter:before`
- `wasi:http:send_req:alt`
- `wasm:opcode:call:alt`
- `wasm:fn:enter:before`

# The book #

20 changes: 13 additions & 7 deletions docs/src/SUMMARY.md
Expand Up @@ -13,23 +13,29 @@
- [WIP - Tuples](intro/syntax/tuples.md)
- [WIP - Maps](intro/syntax/maps.md)
- [WIP - Functions](intro/syntax/functions.md)
- [Report Variables](intro/syntax/report_vars.md)
- [Unshared Variables](intro/syntax/unshared_vars.md)
- [Shared Variables](intro/syntax/shared_vars.md)
- [Probes](intro/syntax/probes.md)
- [Scripts](intro/syntax/scripts.md)
- [Events](intro/events.md)
- [WIP - Libraries](intro/libraries.md)
- [WIP - Testing](intro/testing.md)
- [Injection Strategies](intro/injection_strategies.md)

- [WIP - Example Use Cases](examples/intro.md)
- [WIP - Branch Monitor](examples/branch_monitor.md)
- [Example Use Cases](examples/intro.md)
- [Branch Monitor](examples/branch_monitor.md)

- [Developers](devs/intro.md)
- [Phases of Compilation](devs/compiler_phases.md)
- [Parse](devs/parsing.md)
- [TODO - Core Library](devs/core_lib.md)
- [Verify](devs/verifying.md)
- [Encode as a `BehaviorTree`](devs/behavior_tree.md)
- [Emit](devs/emitting.md)
- [TODO - CLI](devs/cli.md)
- [TODO - Testing](devs/testing.md)
- [TODO - Error Handling](devs/error_handling.md)
- [TODO - Translate](devs/translate.md)
- [Emit](devs/emit/emitting.md)
- [Rewriting](devs/emit/rewriting_target.md)
- [TODO - Engine](devs/emit/engine_target.md)
- [CLI](devs/cli.md)
- [Testing](devs/testing.md)
- [Error Handling](devs/error_handling.md)
- [Contributors to `whamm!`](devs/contributors.md)
55 changes: 0 additions & 55 deletions docs/src/devs/behavior_tree.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/src/devs/cli.md
@@ -1,3 +1,3 @@
# The `whamm!` CLI #

TODO
The CLI is defined using the `clap` Rust crate.
19 changes: 10 additions & 9 deletions docs/src/devs/compiler_phases.md
Expand Up @@ -2,18 +2,19 @@

First, what is meant by the term "compilation" depends on the selected injection strategy.

For **bytecode rewriting**, compilation means generating a new _instrumented_ variation of the passed program.
For **bytecode rewriting**, compilation means generating a new _instrumented_ variation of the application's bytecode.

For **direct engine support**, compilation means compiling the `.mm` script to a `.v3` program that interfaces with an engine to instrument the program dynamically.
The original program _is not touched_ when using this strategy.
For **direct engine support**, compilation means compiling the `.mm` script to a new Wasm module that interfaces with an engine to instrument the program dynamically.
The original program is _not touched_ **and** _not provided_ when using this strategy.

The first three phases of `whamm!` compilation are identical for both strategies.
The final `emit` phase is where the variation lies.
This is because "emitting" for **bytecode rewriting** means using the `walrus` library to insert new instructions into the program.
Whereas "emitting" for **direct engine support** means emitting `Virgil` code to specify the instrumentation probes in a new format that leverages the target engine's instrumentation API.
The `translate` and `emit` phases vary between injection strategies.
This is because "emitting" for **bytecode rewriting** means using the `orca` library to insert new instructions into the program.
Whereas "emitting" for **direct engine support** means emitting a Wasm module encoding _where to instrument_ and the callbacks to attach at the probed sites by interfacing with the engine at application runtime.

These are the four phases of compilation:
1. [Parse](parsing.md)
2. [Verify](verifying.md)
3. [Encode as a `BehaviorTree`](behavior_tree.md)
4. [Emit](emitting.md)
2. Configure the `Whamm!` [Core Library](./core_lib.md) (if needed)
3. [Verify](verifying.md)
4. [Translate](translate.md) AST into the injection strategy's representation
5. [Emit](emit/emitting.md)
2 changes: 2 additions & 0 deletions docs/src/devs/contributors.md
Expand Up @@ -5,3 +5,5 @@ These are the people to thank when you're using `whamm!`...either genuinely or s
[**Elizabeth Gilbert**](https://se-phd.s3d.cmu.edu/People/students/student-bios/gilbert-elizabeth.html), PhD student at **Carnegie Mellon University (CMU)**.

[**Alex Bai**](https://www.eecs.tufts.edu/~abai02/), undergrad student at **Tufts University**.

[**Wavid Bowman**](), undergrad student at **University of Florida**.
3 changes: 3 additions & 0 deletions docs/src/devs/core_lib.md
@@ -0,0 +1,3 @@
# Phase 2: Configure The `Whamm!` Core Library #

TODO
21 changes: 21 additions & 0 deletions docs/src/devs/emit/emitting.md
@@ -0,0 +1,21 @@
# Phase 5: Emit #

Here is documentation describing how we _emit_ `.mm` scripts.

## Some Helpful Concepts ##

**What is a `generator`?**
A `generator` is used to traverse some representation of logic in an abstract way.
It then calls the `emitter` when appropriate to actually emit the code in the target representation.

**What is an `emitter`?**
The `emitter` exposes an API that can be called to emit code in the target representation.
There will be as many emitters as there are target representations supported by the language.
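
The split between the two roles can be sketched as follows. This is a minimal illustration only: `Stmt`, `Emitter`, `TextEmitter`, `generate`, and `demo` are hypothetical names invented for this sketch, not `whamm!`'s actual types, and the "target representation" here is plain text rather than Wasm.

```rust
/// Hypothetical, simplified script AST (not whamm!'s real AST).
enum Stmt {
    DeclGlobal(String),
    Probe { rule: String, body: Vec<String> },
}

/// An emitter exposes an API for producing code in ONE target representation.
trait Emitter {
    fn emit_global(&mut self, name: &str);
    fn emit_probe(&mut self, rule: &str, body: &[String]);
}

/// Toy target: records textual pseudo-instructions instead of real Wasm.
#[derive(Default)]
struct TextEmitter {
    out: Vec<String>,
}

impl Emitter for TextEmitter {
    fn emit_global(&mut self, name: &str) {
        self.out.push(format!("global {name}"));
    }

    fn emit_probe(&mut self, rule: &str, body: &[String]) {
        self.out.push(format!("probe {rule} {{ {} }}", body.join("; ")));
    }
}

/// The generator walks the AST and decides *when* to call the emitter.
/// It never needs to know what the target representation looks like.
fn generate(script: &[Stmt], emitter: &mut dyn Emitter) {
    for stmt in script {
        match stmt {
            Stmt::DeclGlobal(name) => emitter.emit_global(name),
            Stmt::Probe { rule, body } => emitter.emit_probe(rule, body),
        }
    }
}

/// Runs the generator over a tiny script and returns what was emitted.
fn demo() -> Vec<String> {
    let script = vec![
        Stmt::DeclGlobal("count".to_string()),
        Stmt::Probe {
            rule: "wasm:opcode:call:alt".to_string(),
            body: vec!["count = count + 1".to_string()],
        },
    ];
    let mut emitter = TextEmitter::default();
    generate(&script, &mut emitter);
    emitter.out
}
```

Supporting another target in this sketch would mean adding another `Emitter` implementation while leaving `generate` unchanged, which is the reason for keeping the two roles separate.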

## The Injection Strategies ##

The code that is emitted, and the way emitting happens, depends on the injection strategy specified by the user.

There are currently two supported injection strategies:
1. [Bytecode Rewriting](./rewriting_target.md)
2. Interfacing with an [engine](./engine_target.md)
3 changes: 3 additions & 0 deletions docs/src/devs/emit/engine_target.md
@@ -0,0 +1,3 @@
# Interfacing with an Engine #

TODO
@@ -1,48 +1,34 @@
# Phase 4: Emit #
# Bytecode Rewriting #

Here is documentation describing how we _emit_ `.mm` scripts.

## Some Helpful Concepts ##

**What is a `generator`?**
A `generator` is used to traverse some representation of logic in an abstract way.
It then calls the `emitter` when appropriate to actually emit the code in the target representation.

**What is an `emitter`?**
The `emitter` exposes an API that can be called to emit code in the target representation.
There will be as many emitters as there are target representations supported by the language.

In the context of `whamm!`, there are two `generator`s.
For bytecode rewriting, there are two `generator`s.
Each of these generators is used for a specific reason while emitting instrumentation.
The `InitGenerator` is run first to emit the parts of the `.mm` script that need to exist _before_ any probe actions are emitted, such as functions and global state.
The `InstrGenerator` is run second to emit the probes.
The `InstrGenerator` is run second to emit the probes while visiting the `app.wasm` bytecode (represented as an in-memory IR).

Both of these generators use the `emitter` that emits Wasm code.
The `emitter` uses utilities that centralize the Wasm emitting logic, found in [`utils.rs`].

Both of these generators use the `emitter` that emits instrumentation as configured by the end-user (either via _bytecode rewriting_ or emitting a `.v3` file that interfaces with an engine with direct support for instrumentation).
[`utils.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/emitter/utils.rs

## 4.1 `InitGenerator` ##
## 1. `InitGenerator` ##

The [`init_generator.rs`] traverses the AST to emit functions and globals that need to exist before emitting probes.
The `run` function is the entrypoint for this generator.
This follows the visitor software design pattern.
There are great resources online that teach about the visitor pattern if that is helpful for any readers.

Consider _bytecode rewriting_.
This generator emits new Wasm functions and globals into the program with associated Wasm IDs.
These IDs are stored in the `SymbolTable` for use while running the `InstrGenerator`.
When emitting an instruction that either calls an emitted function or does some operation with an emitted global, the name of that symbol is looked up in the `SymbolTable` to then use the saved ID in the emitted instruction.
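
The define-then-look-up flow described above can be sketched like this. It is a simplified illustration under stated assumptions: this `SymbolTable` and its methods `define_global` and `emit_global_get` are hypothetical stand-ins, and the real table stores considerably more than a name-to-ID map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a symbol table that maps names to emitted Wasm IDs.
#[derive(Default)]
struct SymbolTable {
    global_ids: HashMap<String, u32>,
    next_global_id: u32,
}

impl SymbolTable {
    /// Init pass: "emit" a global and remember the ID it was assigned.
    fn define_global(&mut self, name: &str) -> u32 {
        let id = self.next_global_id;
        self.next_global_id += 1;
        self.global_ids.insert(name.to_string(), id);
        id
    }

    /// Instrumentation pass: resolve a name to its saved ID and build the
    /// instruction text. Returns `None` if the global was never defined.
    fn emit_global_get(&self, name: &str) -> Option<String> {
        self.global_ids
            .get(name)
            .map(|id| format!("global.get {id}"))
    }
}

/// Defines two globals, then looks up one known and one unknown name.
fn demo_lookup() -> (Option<String>, Option<String>) {
    let mut table = SymbolTable::default();
    table.define_global("count"); // assigned ID 0
    table.define_global("total"); // assigned ID 1
    (
        table.emit_global_get("total"),
        table.emit_global_get("missing"),
    )
}
```

Running the init pass first is what guarantees that every lookup in the second pass can succeed for symbols the script actually declares.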

[`init_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/init_generator.rs
[`init_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/rewriting/init_generator.rs

## 4.2 `InstrGenerator` ##

The [`instr_generator.rs`] traverses the `BehaviorTree` which encodes the logic of the instrumentation to emit.
The `run` function is the entrypoint for this generator.
This follows the visitor software design pattern.
There are great resources online that teach about the visitor pattern if that is helpful for any readers.
## 2. `InstrGenerator` ##

This `generator` calls into the `emitter` to gradually traverse the program in search for the locations corresponding to each probe.
The [`instr_generator.rs`] calls into the `emitter` to gradually traverse the application in search of the locations that correspond to probe events in the `.mm` script's AST.
When a probed location is found, the `generator` emits Wasm code into the application at that point through `emitter` utilities.

[`instr_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/instr_generator.rs
[`instr_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/rewriting/instr_generator.rs

### Constant Propagation and Folding!! ###

Expand All @@ -51,14 +37,14 @@ There are lots of resources online explaining these concepts if that would be us

The `whamm info` command helps users see various globals that are in scope when using various probe match rules.
All of these global variables are defined by `whamm!`'s compiler and _should only be emitted as constant literals_.
If the variable were ever emitted into an instrumented program or `.v3` monitor, the program would fail to execute since the variable _would not be defined_.
If the variable were ever directly emitted into an instrumented program, with no compiler-provided definition, the program would fail to execute since the variable _would not be defined_.

`whamm!` uses constant propagation and folding to remedy this situation!

The `define_*` functions in [`emitters.rs`] are examples of **how compiler constants are defined**.
The `define` function in [`visiting_emitter.rs`] is **how compiler constants are defined** while traversing the application bytecode.
These specific globals are defined in the emitter since their definitions are tied to locations in the Wasm program being instrumented.

The `ExprFolder` in [`types.rs`] performs constant propagation and folding on expressions.
The `ExprFolder` in [`folding.rs`] performs constant propagation and folding on expressions.

When considering a _predicated probe_, this behavior can be quite interesting.
Take the following probe definition for example:
Expand Down Expand Up @@ -96,5 +82,5 @@ However, this time the actions emitted _will retain a conditional_, but it will

Pretty cool, right??
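
To make the folding behavior concrete, here is a minimal sketch of constant propagation and folding over a toy predicate language. The `Expr` type, the `fold` function, and the variable names `site_id` and `count` are assumptions made for illustration; this is not the actual `ExprFolder` implementation.

```rust
use std::collections::HashMap;

/// Toy expression language (hypothetical, far smaller than the real AST).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Bool(bool),
    Var(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Replaces compiler-known variables with their constant values (propagation)
/// and evaluates any sub-expression whose operands are all constants (folding).
/// Assumes expressions have no side effects, so `false && x` may drop `x`.
fn fold(expr: &Expr, consts: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Int(_) | Expr::Bool(_) => expr.clone(),
        Expr::Var(name) => consts.get(name).cloned().unwrap_or_else(|| expr.clone()),
        Expr::Eq(l, r) => match (fold(l, consts), fold(r, consts)) {
            (Expr::Int(a), Expr::Int(b)) => Expr::Bool(a == b),
            (Expr::Bool(a), Expr::Bool(b)) => Expr::Bool(a == b),
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        Expr::And(l, r) => match (fold(l, consts), fold(r, consts)) {
            (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
            (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
            (l, r) => Expr::And(Box::new(l), Box::new(r)),
        },
    }
}

/// Folds the predicate `site_id == 3 && count == 0`, where `site_id` is known
/// at the current instrumentation site and `count` is only known at runtime.
fn demo_fold(site_id: i64) -> Expr {
    let mut consts = HashMap::new();
    consts.insert("site_id".to_string(), Expr::Int(site_id));
    let predicate = Expr::And(
        Box::new(Expr::Eq(
            Box::new(Expr::Var("site_id".to_string())),
            Box::new(Expr::Int(3)),
        )),
        Box::new(Expr::Eq(
            Box::new(Expr::Var("count".to_string())),
            Box::new(Expr::Int(0)),
        )),
    );
    fold(&predicate, &consts)
}
```

In this sketch, a site where `site_id` is 7 folds the whole predicate to `false`, so nothing would need to be emitted there. A site where `site_id` is 3 leaves only the dynamic part, `count == 0`, as the residual conditional.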

[`emitters.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/emitters.rs
[`types.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/types.rs
[`visiting_emitter.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/emitter/rewriting/visiting_emitter.rs
[`folding.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/folding.rs
5 changes: 4 additions & 1 deletion docs/src/devs/error_handling.md
@@ -1,3 +1,6 @@
# Error Handling #

TODO
Errors are added to `ErrorGen`, defined in [`error.rs`], and reported between compiler phases.
If an error occurs during a compilation phase, the errors are reported and the compilation is aborted.
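
The collect-then-report pattern can be sketched as follows. `ErrorCollector`, its methods, and the toy `compile` pipeline are hypothetical and only illustrate the idea of checking for accumulated errors between phases; they do not mirror the real `ErrorGen` API.

```rust
/// Hypothetical error accumulator, loosely modeled on the idea behind `ErrorGen`.
#[derive(Default)]
struct ErrorCollector {
    errors: Vec<String>,
}

impl ErrorCollector {
    /// Records an error without stopping the current phase.
    fn add(&mut self, msg: impl Into<String>) {
        self.errors.push(msg.into());
    }

    /// Called between phases: reports everything collected so far and
    /// signals that compilation should stop if there was any error.
    fn check(&self, phase: &str) -> Result<(), String> {
        if self.errors.is_empty() {
            Ok(())
        } else {
            Err(format!(
                "{} error(s) after {phase}: {}",
                self.errors.len(),
                self.errors.join("; ")
            ))
        }
    }
}

/// Toy two-phase pipeline: "parse" requires each non-empty line to end in `;`,
/// and "emit" only runs if the parse phase collected no errors.
fn compile(script: &str) -> Result<String, String> {
    let mut errs = ErrorCollector::default();
    for (i, line) in script.lines().enumerate() {
        let line = line.trim();
        if !line.is_empty() && !line.ends_with(';') {
            errs.add(format!("line {}: missing ';'", i + 1));
        }
    }
    errs.check("parse")?; // abort here instead of emitting from a bad script

    let count = script.lines().filter(|l| !l.trim().is_empty()).count();
    Ok(format!("emitted {count} statement(s)"))
}
```

Collecting errors for a whole phase before aborting lets the user see every problem found in that phase at once, rather than fixing them one at a time.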

[`error.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/common/error.rs
8 changes: 5 additions & 3 deletions docs/src/devs/intro.md
Expand Up @@ -9,11 +9,13 @@ Parsing:
- The [Pest book](https://pest.rs/book/)

## `whamm!` Implementation Concepts ##

The [_four phases_ of compilation](compiler_phases.md):
1. [Parse](parsing.md)
2. [Verify](verifying.md)
3. [Encode as a `BehaviorTree`](behavior_tree.md)
4. [Emit](emitting.md)
2. Configure the `Whamm!` [Core Library](./core_lib.md) (if needed)
3. [Verify](verifying.md)
4. [Translate](translate.md) AST into the injection strategy's representation
5. [Emit](emit/emitting.md)

Other helpful concepts:
- The `whamm!` [CLI](cli.md)
7 changes: 3 additions & 4 deletions docs/src/devs/parsing.md
Expand Up @@ -25,8 +25,7 @@ This AST is leveraged in different ways for each of the subsequent compiler phas

During [**verification**](verifying.md), the AST is used to build the `SymbolTable` and perform type checking.

While [**building the behavior tree**](behavior_tree.md), the AST is used to inform what the behavior should be as instrumentation is being injected into the target program (for bytecode rewriting).
Since the AST encodes the events utilized by the instrumentation and the predicates that must be partially evaluated during injection, the built behavior tree encodes a flow of actions customized to the instrumentation to be emitted.
While building the behavior tree, a _simpler variation of the AST_ is created to optimize the lookup of information that is relevant during the emit phase.
While [**translating the AST**](translate.md) into the injection strategy's representation, the AST is visited and restructured in a way that is simpler to compile for each strategy.
Each node contains new data unique to each strategy that is helpful while emitting.

While [**emitting**](emitting.md), the _simpler AST variation_ mentioned above is used to lookup global statements and iterate over probe definitions to inject them into locations-of-interest in the Wasm program.
While [**emitting**](emit/emitting.md), the _simpler AST variation_ mentioned above is used to look up global statements and iterate over probe definitions to inject them into locations-of-interest in the Wasm program.
3 changes: 3 additions & 0 deletions docs/src/devs/translate.md
@@ -0,0 +1,3 @@
# Phase 4: AST Translation #

TODO
7 changes: 4 additions & 3 deletions docs/src/devs/verifying.md
@@ -1,4 +1,4 @@
# Phase 2: Verify #
# Phase 3: Verify #

Here is documentation describing how we _verify_ `.mm` scripts.

Expand Down Expand Up @@ -32,7 +32,7 @@ See the [probes syntax documentation] for a helpful CLI tool that enables the us


[`verifier/types.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/verifier/types.rs
[`InitGenerator` documentation]: emitting.md#parta-initgenerator
[`InitGenerator` documentation]: emit/emitting.md#parta-initgenerator
[probes syntax documentation]: ../intro/syntax/probes.md#helpful-info-in-cli
[`whamm_parser.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/parser/whamm_parser.rs

Expand All @@ -56,4 +56,5 @@ There are great resources online that teach about the visitor pattern if that is

## The `TypeChecker` ##

NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!
The type checker then visits the AST and uses the `SymbolTable` to verify that variable usage is appropriate.
It can find out-of-scope usages, invalid method invocations, misused types, etc.
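
A minimal sketch of this kind of check is shown below. The `Ty` and `Expr` types, the `check` function, and the use of a plain `HashMap` as the symbol table are assumptions made for illustration; the real `TypeChecker` and `SymbolTable` are richer.

```rust
use std::collections::HashMap;

/// Hypothetical types for a toy expression language.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    I32,
    Bool,
}

enum Expr {
    Int(i32),
    Bool(bool),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Visits the expression and uses the symbol table (name -> type) to verify
/// that variables are in scope and that operand types are used correctly.
fn check(expr: &Expr, symbols: &HashMap<String, Ty>) -> Result<Ty, String> {
    match expr {
        Expr::Int(_) => Ok(Ty::I32),
        Expr::Bool(_) => Ok(Ty::Bool),
        Expr::Var(name) => symbols
            .get(name)
            .copied()
            .ok_or_else(|| format!("`{name}` is not in scope")),
        Expr::Add(l, r) => match (check(l, symbols)?, check(r, symbols)?) {
            (Ty::I32, Ty::I32) => Ok(Ty::I32),
            (l, r) => Err(format!("cannot add {l:?} and {r:?}")),
        },
    }
}

/// Type-checks `<var> + 1` against a table containing `count: I32` and `flag: Bool`.
fn demo_check(var: &str) -> Result<Ty, String> {
    let mut symbols = HashMap::new();
    symbols.insert("count".to_string(), Ty::I32);
    symbols.insert("flag".to_string(), Ty::Bool);
    let expr = Expr::Add(
        Box::new(Expr::Var(var.to_string())),
        Box::new(Expr::Int(1)),
    );
    check(&expr, &symbols)
}
```

Here `demo_check("count")` type-checks, `demo_check("flag")` is rejected as a misused type, and `demo_check("nope")` is rejected as an out-of-scope usage.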