Update the whamm documentation (#174)

* update the README
* high level pass at updating Rust mdbook
* Flesh out the documentation
* mark unfinished pages with TODO
ejrgilbert · Nov 11, 2024 · 462e14f · 462e14f
1 parent 8f1d32a
commit 462e14f
Showing 33 changed files with 500 additions and 427 deletions.
diff --git a/README.md b/README.md
@@ -17,17 +17,6 @@ Take a look at the official [`whamm!` book](https://ejrgilbert.github.io/whamm/i
 
 ### Build ###
 
-Get [`orca`](https://github.com/thesuhas/orca), should be in parent directory at `../orca` (see Cargo.toml):
-```shell
-# Inside base directory of this project
-cd ..
-git clone git@github.com:thesuhas/orca.git
-cd orca
-git checkout DO_NOT_DELETE/whamm_dependency
-cd ../whamm
-```
-
-To run basic build:
 ```shell
 cargo build
 ```
@@ -37,7 +26,6 @@ cargo build
 In order to run the tests, a WebAssembly interpreter must be configured.
 The supported interpreters are:
 1. the Wizard engine interpreter. https://github.com/titzer/wizard-engine/tree/master
-   - Note that the Wizard interpreter does not run on Macs (yet...), so the Wasm reference interpreter will need to be configured in this context.
 2. the Wasm reference interpreter. https://github.com/WebAssembly/spec/tree/main/interpreter
 
 **How to build the [Wizard GH project]() to acquire these binaries:**
@@ -83,9 +71,9 @@ To specify log level:
 RUST_LOG={ error | warn | info | debug | trace | off } cargo run -- --app <path_to_app_wasm> --script <path_to_script> <path_for_compiled_output>
 ```
 
-To visually debug the decision tree used during Wasm bytecode emission:
+To print the globals and functions that a match rule exposes for use in a probe's logic or predicate:
 ```shell
-cargo run -- vis-script --script <path_to_script>
+cargo run -- info --rule "<match_rule_glob>" # e.g. "wasm:opcode:br*"
 ```
 
 To run a script that uses special var types, at the moment this is `map` and `report` variables, do the following:
@@ -126,9 +114,9 @@ To be added:
 - `exception` throw/rethrow/catch events
 
 Example:
-`wasi:http:send_req:alt`
-`wasm:bytecode:call:alt`
-`wasm:fn:enter:before`
+- `wasi:http:send_req:alt`
+- `wasm:opcode:call:alt`
+- `wasm:fn:enter:before`
 
 # The book #
 

diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
@@ -13,23 +13,29 @@
     - [WIP - Tuples](intro/syntax/tuples.md)
     - [WIP - Maps](intro/syntax/maps.md)
     - [WIP - Functions](intro/syntax/functions.md)
+    - [Report Variables](intro/syntax/report_vars.md)
+    - [Unshared Variables](intro/syntax/unshared_vars.md)
+    - [Shared Variables](intro/syntax/shared_vars.md)
     - [Probes](intro/syntax/probes.md)
     - [Scripts](intro/syntax/scripts.md)
   - [Events](intro/events.md)
   - [WIP - Libraries](intro/libraries.md)
   - [WIP - Testing](intro/testing.md)
   - [Injection Strategies](intro/injection_strategies.md)
 
-- [WIP - Example Use Cases](examples/intro.md)
-  - [WIP - Branch Monitor](examples/branch_monitor.md)
+- [Example Use Cases](examples/intro.md)
+  - [Branch Monitor](examples/branch_monitor.md)
 
 - [Developers](devs/intro.md)
   - [Phases of Compilation](devs/compiler_phases.md)
     - [Parse](devs/parsing.md)
+    - [TODO - Core Library](devs/core_lib.md)
     - [Verify](devs/verifying.md)
-    - [Encode as a `BehaviorTree`](devs/behavior_tree.md)
-    - [Emit](devs/emitting.md)
-  - [TODO - CLI](devs/cli.md)
-  - [TODO - Testing](devs/testing.md)
-  - [TODO - Error Handling](devs/error_handling.md)
+    - [TODO - Translate](devs/translate.md)
+    - [Emit](devs/emit/emitting.md)
+      - [Rewriting](devs/emit/rewriting_target.md)
+      - [TODO - Engine](devs/emit/engine_target.md)
+  - [CLI](devs/cli.md)
+  - [Testing](devs/testing.md)
+  - [Error Handling](devs/error_handling.md)
   - [Contributors to `whamm!`](devs/contributors.md)
diff --git a/docs/src/devs/behavior_tree.md b/docs/src/devs/behavior_tree.md
diff --git a/docs/src/devs/cli.md b/docs/src/devs/cli.md
@@ -1,3 +1,3 @@
 # The `whamm!` CLI #
 
-TODO
+The CLI is defined using the `clap` Rust crate.
diff --git a/docs/src/devs/compiler_phases.md b/docs/src/devs/compiler_phases.md
@@ -2,18 +2,19 @@
 
 First, what is meant by the term "compilation" depends on the selected injection strategy.
 
-For **bytecode rewriting**, compilation means generating a new _instrumented_ variation of the passed program.
+For **bytecode rewriting**, compilation means generating a new _instrumented_ variation of the application's bytecode.
 
-For **direct engine support**, compilation means compiling the `.mm` script to a `.v3` program that interfaces with an engine to instrument the program dynamically.
-The original program _is not touched_ when using this strategy.
+For **direct engine support**, compilation means compiling the `.mm` script to a new Wasm module that interfaces with an engine to instrument the program dynamically.
+The original program is _not touched_ **and** _not provided_ when using this strategy.
 
 The first three phases of `whamm!` compilation are identical for both strategies.
-The final `emit` phase is where the variation lies.
-This is because "emitting" for **bytecode rewriting** means using the `walrus` library to insert new instructions into the program.
-Whereas "emitting" for **direct engine support** means emitting `Virgil` code to specify the instrumentation probes in a new format that leverages the target engine's instrumentation API.
+The `translate` and `emit` phases vary between injection strategies.
+This is because "emitting" for **bytecode rewriting** means using the `orca` library to insert new instructions into the program.
+Whereas "emitting" for **direct engine support** means emitting a Wasm module that encodes _where to instrument_ and which callbacks to attach at the probed sites by interfacing with the engine at application runtime.
 
 These are the four phases of compilation:
 1. [Parse](parsing.md)
-2. [Verify](verifying.md)
-3. [Encode as a `BehaviorTree`](behavior_tree.md)
-4. [Emit](emitting.md)
+2. Configure the `Whamm!` [Core Library](./core_lib.md) (if needed)
+3. [Verify](verifying.md)
+4. [Translate](translate.md) AST into the injection strategy's representation
+5. [Emit](emit/emitting.md)
diff --git a/docs/src/devs/contributors.md b/docs/src/devs/contributors.md
@@ -5,3 +5,5 @@ These are the people to thank when you're using `whamm!`...either genuinely or s
 [**Elizabeth Gilbert**](https://se-phd.s3d.cmu.edu/People/students/student-bios/gilbert-elizabeth.html), PhD student at **Carnegie Mellon University (CMU)**.
 
 [**Alex Bai**](https://www.eecs.tufts.edu/~abai02/), undergrad student at **Tufts University**.
+
+[**Wavid Bowman**](), undergrad student at **University of Florida**.
diff --git a/docs/src/devs/core_lib.md b/docs/src/devs/core_lib.md
@@ -0,0 +1,3 @@
+# Phase 2: Configure The `Whamm!` Core Library #
+
+TODO
diff --git a/docs/src/devs/emit/emitting.md b/docs/src/devs/emit/emitting.md
@@ -0,0 +1,21 @@
+# Phase 5: Emit #
+
+Here is documentation describing how we _emit_ `.mm` scripts.
+
+## Some Helpful Concepts ##
+
+**What is a `generator`?**
+A `generator` is used to traverse some representation of logic in an abstract way.
+It then calls the `emitter` when appropriate to actually emit the code in the target representation.
+
+**What is an `emitter`?**
+The `emitter` exposes an API that can be called to emit code in the target representation.
+There will be as many emitters as there are target representations supported by the language.
+
+## The Injection Strategies ##
+
+The code that is emitted, and how it is emitted, depend on the injection strategy specified by the user.
+
+There are currently two supported injection strategies:
+1. [Bytecode Rewriting](./rewriting_target.md)
+2. Interfacing with an [engine](./engine_target.md)
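The `generator`/`emitter` split described in `emitting.md` can be illustrated with a minimal Rust sketch. Every name here (`Node`, `Emitter`, `TextEmitter`, `generate`, `render`) is hypothetical and chosen for illustration; none of them is taken from the `whamm!` source. The sketch assumes a toy representation of globals and probes and a single text-producing target.

```rust
// Hypothetical sketch of the generator/emitter split; not whamm!'s real types.

/// A tiny stand-in for "some representation of logic".
enum Node {
    Global(String),
    Probe { rule: String, body: Vec<String> },
}

/// The emitter exposes an API for producing code in one target representation.
trait Emitter {
    fn emit_global(&mut self, name: &str);
    fn emit_probe(&mut self, rule: &str, body: &[String]);
}

/// One emitter per target representation; this one just renders text lines.
#[derive(Default)]
struct TextEmitter {
    out: Vec<String>,
}

impl Emitter for TextEmitter {
    fn emit_global(&mut self, name: &str) {
        self.out.push(format!("global {name}"));
    }

    fn emit_probe(&mut self, rule: &str, body: &[String]) {
        self.out.push(format!("probe {rule} {{ {} }}", body.join("; ")));
    }
}

/// The generator traverses the representation and calls the emitter
/// when there is something to emit. It never builds target code itself.
fn generate(nodes: &[Node], emitter: &mut dyn Emitter) {
    for node in nodes {
        match node {
            Node::Global(name) => emitter.emit_global(name),
            Node::Probe { rule, body } => emitter.emit_probe(rule, body),
        }
    }
}

/// Convenience wrapper: run the generator against the text emitter.
fn render(nodes: &[Node]) -> Vec<String> {
    let mut emitter = TextEmitter::default();
    generate(nodes, &mut emitter);
    emitter.out
}

fn main() {
    let nodes = vec![
        Node::Global("count".to_string()),
        Node::Probe {
            rule: "wasm:opcode:call:alt".to_string(),
            body: vec!["count = count + 1".to_string()],
        },
    ];
    for line in render(&nodes) {
        println!("{line}");
    }
}
```

Supporting a second target in this sketch would mean adding another `Emitter` implementation while leaving `generate` unchanged, which is the point of the separation.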
diff --git a/docs/src/devs/emit/engine_target.md b/docs/src/devs/emit/engine_target.md
@@ -0,0 +1,3 @@
+# Interfacing with an Engine #
+
+TODO
diff --git a/docs/src/devs/emitting.md → docs/src/devs/emit/rewriting_target.md b/docs/src/devs/emitting.md → docs/src/devs/emit/rewriting_target.md
@@ -1,48 +1,34 @@
-# Phase 4: Emit #
+# Bytecode Rewriting #
 
-Here is documentation describing how we _emit_ `.mm` scripts.
-
-## Some Helpful Concepts ##
-
-**What is a `generator`?**
-A `generator` is used to traverse some representation of logic in an abstract way.
-It then calls the `emitter` when appropriate to actually emit the code in the target representation.
-
-**What is an `emitter`?**
-The `emitter` exposes an API that can be called to emit code in the target representation.
-There will be as many emitters as there are target representations supported by the language.
-
-In the context of `whamm!`, there are two `generator`s.
+For bytecode rewriting, there are two `generator`s.
 Each of these generators are used for a specific reason while emitting instrumentation.
 The `InitGenerator` is run first to emit the parts of the `.mm` script that need to exist _before_ any probe actions are emitted, such as functions and global state.
-The `InstrGenerator` is run second to emit the probes. 
+The `InstrGenerator` is run second to emit the probes while visiting the `app.wasm` bytecode (represented as an in-memory IR).
+
+Both of these generators use the `emitter` that emits Wasm code.
+The `emitter` uses utilities that centralize the Wasm emitting logic, found at [`utils.rs`].
 
-Both of these generators use the `emitter` that emits instrumentation as configured by the end-user (either via _bytecode rewriting_ or emitting a `.v3` file that interfaces with an engine with direct support for instrumentation).
+[`utils.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/emitter/utils.rs
 
-## 4.1 `InitGenerator` ##
+## 1. `InitGenerator` ##
 
 The [`init_generator.rs`] traverses the AST to emit functions and globals that need to exist before emitting probes.
 The `run` function is the entrypoint for this generator.
 This follows the visitor software design pattern.
 There are great resources online that teach about the visitor pattern if that is helpful for any readers.
 
-Consider _bytecode rewriting_.
 This generator emits new Wasm functions and globals into the program with associated Wasm IDs.
 These IDs are stored in the `SymbolTable` for use while running the `InstrGenerator`.
 When emitting an instruction that either calls an emitted function or does some operation with an emitted global, the name of that symbol is looked up in the `SymbolTable` to then use the saved ID in the emitted instruction.
 
-[`init_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/init_generator.rs
+[`init_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/rewriting/init_generator.rs
 
-## 4.2 `InstrGenerator` ##
-
-The [`instr_generator.rs`] traverses the `BehaviorTree` which encodes the logic of the instrumentation to emit.
-The `run` function is the entrypoint for this generator.
-This follows the visitor software design pattern.
-There are great resources online that teach about the visitor pattern if that is helpful for any readers.
+## 2. `InstrGenerator` ##
 
-This `generator` calls into the `emitter` to gradually traverse the program in search for the locations corresponding to each probe.
+The [`instr_generator.rs`] calls into the `emitter` to gradually traverse the application in search of the locations that correspond to probe events in the `.mm` script's AST.
+When a probed location is found, the `generator` emits Wasm code into the application at that point through `emitter` utilities.
 
-[`instr_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/instr_generator.rs
+[`instr_generator.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/rewriting/instr_generator.rs
 
 ### Constant Propagation and Folding!! ###
 
@@ -51,14 +37,14 @@ There are lots of resources online explaining these concepts if that would be us
 
 The `whamm info` command helps users see various globals that are in scope when using various probe match rules.
 All of these global variables are defined by `whamm!`'s compiler and _should only be emitted as constant literals_.
-If the variable were ever emitted into an instrumented program or `.v3` monitor, the program would fail to execute since the variable _would not be defined_.
+If the variable were ever directly emitted into an instrumented program, with no compiler-provided definition, the program would fail to execute since the variable _would not be defined_.
 
 `whamm!` uses constant propagation and folding to remedy this situation!
 
-The `define_*` functions in [`emitters.rs`] are examples of **how compiler constants are defined**.
+The `define` function in [`visiting_emitter.rs`] is **how compiler constants are defined** while traversing the application bytecode.
 These specific globals are defined in the emitter since their definitions are tied to locations in the Wasm program being instrumented.
 
-The `ExprFolder` in [`types.rs`] performs constant propagation and folding on expressions.
+The `ExprFolder` in [`folding.rs`] performs constant propagation and folding on expressions.
 
 When considering a _predicated probe_, this behavior can be quite interesting.
 Take the following probe definition for example:
@@ -96,5 +82,5 @@ However, this time the actions emitted _will retain a conditional_, but it will
 
 Pretty cool, right??
 
-[`emitters.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/emitters.rs
-[`types.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/types.rs
+[`visiting_emitter.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/emitter/rewriting/*.rs
+[`folding.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/generator/types.rs
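The constant propagation and folding behavior described in `rewriting_target.md` can be sketched in a few lines of Rust. This is not the `ExprFolder` implementation: the `Expr` type, the `fold` function, and the variable names `static_val` and `dyn_val` are all hypothetical. The sketch assumes pure expressions, a map of compiler-known constants, and only the `==` and `&&` operators.

```rust
use std::collections::HashMap;

// Hypothetical expression type; whamm!'s real AST is richer than this.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Int(i64),
    Bool(bool),
    Var(String),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Substitute known constants (propagation), then simplify (folding).
/// Variables missing from `consts` are treated as dynamic and left in place.
fn fold(expr: &Expr, consts: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Int(_) | Expr::Bool(_) => expr.clone(),
        Expr::Var(name) => consts.get(name).cloned().unwrap_or_else(|| expr.clone()),
        Expr::Eq(l, r) => match (fold(l, consts), fold(r, consts)) {
            (Expr::Int(a), Expr::Int(b)) => Expr::Bool(a == b),
            (Expr::Bool(a), Expr::Bool(b)) => Expr::Bool(a == b),
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        Expr::And(l, r) => match (fold(l, consts), fold(r, consts)) {
            // A statically false side makes the whole predicate false.
            (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
            // A statically true side disappears, leaving only the other side.
            (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
            (l, r) => Expr::And(Box::new(l), Box::new(r)),
        },
    }
}

/// Builds the predicate `static_val == 1 && dyn_val == 5`.
fn sample_predicate() -> Expr {
    let var = |n: &str| Box::new(Expr::Var(n.to_string()));
    Expr::And(
        Box::new(Expr::Eq(var("static_val"), Box::new(Expr::Int(1)))),
        Box::new(Expr::Eq(var("dyn_val"), Box::new(Expr::Int(5)))),
    )
}

/// Folds the sample predicate with `static_val` bound to `value`.
fn fold_with_static(value: i64) -> Expr {
    let mut consts = HashMap::new();
    consts.insert("static_val".to_string(), Expr::Int(value));
    fold(&sample_predicate(), &consts)
}

fn main() {
    // Static half is true: only the dynamic check remains to be emitted.
    println!("{:?}", fold_with_static(1));
    // Static half is false: the probe can be skipped at this location.
    println!("{:?}", fold_with_static(2));
}
```

In this sketch, a result of `Bool(false)` corresponds to a location where nothing needs to be emitted, while a residual `Eq` on the dynamic variable corresponds to a retained runtime conditional.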
diff --git a/docs/src/devs/error_handling.md b/docs/src/devs/error_handling.md
@@ -1,3 +1,6 @@
 # Error Handling #
 
-TODO
+Errors are added to `ErrorGen`, defined in [`error.rs`], and reported between compiler phases.
+If an error occurs during a compilation phase, the errors are reported and the compilation is aborted.
+
+[`error.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/common/error.rs
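The "collect errors during a phase, report between phases, abort on failure" flow described in `error_handling.md` might look roughly like the following. This is a sketch under assumptions, not the `ErrorGen` API: `ErrorLog`, `add`, `check`, `compile`, and the two toy phase checks are all hypothetical names and behavior.

```rust
// Hypothetical error accumulator; not the real `ErrorGen` from error.rs.
#[derive(Default)]
struct ErrorLog {
    errors: Vec<String>,
}

impl ErrorLog {
    /// Record an error without stopping the current phase.
    fn add(&mut self, msg: &str) {
        self.errors.push(msg.to_string());
    }

    /// Called between phases: report everything collected so far and
    /// signal that compilation should abort if anything went wrong.
    fn check(&mut self, phase: &str) -> Result<(), String> {
        if self.errors.is_empty() {
            return Ok(());
        }
        let report = format!(
            "{phase} failed with {} error(s): {}",
            self.errors.len(),
            self.errors.join("; ")
        );
        self.errors.clear();
        Err(report)
    }
}

/// Toy two-phase pipeline. The checks below are placeholders for real work.
fn compile(script: &str) -> Result<(), String> {
    let mut log = ErrorLog::default();

    // Phase: parse (placeholder check).
    if script.trim().is_empty() {
        log.add("script is empty");
    }
    log.check("parse")?;

    // Phase: verify (placeholder check); only runs if parsing succeeded.
    if script.contains("undefined_var") {
        log.add("use of undefined variable `undefined_var`");
    }
    log.check("verify")?;

    Ok(())
}

fn main() {
    for script in ["", "x = undefined_var;", "x = 1;"] {
        match compile(script) {
            Ok(()) => println!("ok"),
            Err(report) => println!("{report}"),
        }
    }
}
```

Accumulating errors within a phase lets a user see several problems in one run, while checking between phases keeps later phases from running on invalid input.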
diff --git a/docs/src/devs/intro.md b/docs/src/devs/intro.md
@@ -9,11 +9,13 @@ Parsing:
 - The [Pest book](https://pest.rs/book/)
 
 ## `whamm!` Implementation Concepts ##
+
 The [_four phases_ of compilation](compiler_phases.md):
 1. [Parse](parsing.md)
-2. [Verify](verifying.md)
-3. [Encode as a `BehaviorTree`](behavior_tree.md)
-4. [Emit](emitting.md)
+2. Configure the `Whamm!` [Core Library](./core_lib.md) (if needed)
+3. [Verify](verifying.md)
+4. [Translate](translate.md) AST into the injection strategy's representation
+5. [Emit](emit/emitting.md)
 
 Other helpful concepts:
 - The `whamm!` [CLI](cli.md)

diff --git a/docs/src/devs/parsing.md b/docs/src/devs/parsing.md
@@ -25,8 +25,7 @@ This AST is leveraged in different ways for each of the subsequent compiler phas
 
 During [**verification**](verifying.md), the AST is used to build the `SymbolTable` and perform type checking.
 
-While [**building the behavior tree**](behavior_tree.md), the AST is used to inform what the behavior should be as instrumentation is being injected into the target program (for bytecode rewriting).
-Since the AST encodes the events utilized by the instrumentation and the predicates that must be partially evaluated during injection, the built behavior tree encodes a flow of actions customized to the instrumentation to be emitted.
-While building the behavior tree, a _simpler variation of the AST_ is created to optimize the lookup of information that is relevant during the emit phase.
+While [**translating the AST**](translate.md) into the injection strategy's representation, the AST is visited and restructured in a way that is simpler to compile for each strategy.
+Each node contains new data unique to each strategy that is helpful while emitting.
 
-While [**emitting**](emitting.md), the _simpler AST variation_ mentioned above is used to lookup global statements and iterate over probe definitions to inject them into locations-of-interest in the Wasm program.
+While [**emitting**](emit/emitting.md), the _simpler AST variation_ mentioned above is used to look up global statements and iterate over probe definitions to inject them into locations-of-interest in the Wasm program.
diff --git a/docs/src/devs/translate.md b/docs/src/devs/translate.md
@@ -0,0 +1,3 @@
+# Phase 4: AST Translation #
+
+TODO
diff --git a/docs/src/devs/verifying.md b/docs/src/devs/verifying.md
@@ -1,4 +1,4 @@
-# Phase 2: Verify #
+# Phase 3: Verify #
 
 Here is documentation describing how we _verify_ `.mm` scripts.
 
@@ -32,7 +32,7 @@ See the [probes syntax documentation] for a helpful CLI tool that enables the us
 
 
 [`verifier/types.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/verifier/types.rs
-[`InitGenerator` documentation]: emitting.md#parta-initgenerator
+[`InitGenerator` documentation]: emit/rewriting_target.md#1-initgenerator
 [probes syntax documentation]: ../intro/syntax/probes.md#helpful-info-in-cli
 [`whamm_parser.rs`]: https://github.com/ejrgilbert/whamm/blob/master/src/parser/whamm_parser.rs
 
@@ -56,4 +56,5 @@ There are great resources online that teach about the visitor pattern if that is
 
 ## The `TypeChecker` ##
 
-NOTE: This functionality hasn't been fully implemented! More docs to come post-implementation!
+The type checker then visits the AST and uses the `SymbolTable` to verify that variable usage is appropriate.
+It can find out-of-scope usages, invalid method invocations, misused types, etc.