docs: infra #183
# General Overview

Firstly, we aim to implement the following types of tests and benchmarks:
- Wasm spectest: programs generated from the Wasm spectest suite.
- Benchmarks: programs we write manually in Wasm or Rust and compile to Wasm.

Both types of programs will go through the same pipeline:
> **Review comment:** It's true that so far these two types of programs have been handled similarly, but going forward this is likely to change to give a better experience with both of these tools. So in this document, I would focus specifically on our goals for the test suite and leave benchmarks out of scope.
- Compile to zkasm.
- Run the generated zkasm and measure performance.
- Track changes over time.
- Alert in GitHub PRs in case of degradation.

We use the snapshot-testing pattern in many places (link).
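As an illustration of the snapshot-testing pattern, here is a minimal sketch of a helper that compares freshly generated output against a committed snapshot file and rewrites the snapshot on demand. The function name and the `UPDATE_SNAPSHOTS` environment variable are assumptions for illustration, not the repository's actual API.

```rust
use std::fs;
use std::path::Path;

/// Compare `generated` against the committed snapshot at `path`.
/// If the (hypothetical) `UPDATE_SNAPSHOTS` env var is set, overwrite
/// the snapshot instead of comparing.
fn check_snapshot(path: &Path, generated: &str) {
    if std::env::var_os("UPDATE_SNAPSHOTS").is_some() {
        fs::write(path, generated).expect("failed to update snapshot");
        return;
    }
    let committed = fs::read_to_string(path).unwrap_or_default();
    assert_eq!(
        committed, generated,
        "snapshot {} is out of date; rerun with UPDATE_SNAPSHOTS=1",
        path.display()
    );
}
```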
Performance data is collected from both tests and benchmarks.
Benchmarks also serve as larger, end-to-end tests.

## Test Creation and Generation
We use a snapshot-based pattern for generating Wasm from spectest wast files. Benchmarks are added manually.

> **Review comment:** I think we should re-consider this design decision and explore what our other options are. We need to understand the pros and cons of each approach and see which one is the most appropriate for our goals.

> **Review comment:** I created an issue where I suggest discussing whether we should get rid of snapshot testing: #188
Our goal is to simplify adding benchmarks (simply add a `.wat` file to the benchmarks directory and it automatically goes through the pipeline) and to automatically pull all updates from the Wasm spectests.
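As a rough sketch of what "drop a `.wat` file in and it is picked up automatically" could look like, the snippet below walks an assumed `benchmarks/` directory and collects every `.wat` file for the pipeline. The directory name and the processing hook are illustrative assumptions.

```rust
use std::fs;
use std::path::PathBuf;

/// Collect every `.wat` file under the (assumed) benchmarks directory,
/// so newly added files are picked up without any registration step.
fn discover_wat_files(dir: &str) -> Vec<PathBuf> {
    let mut files: Vec<PathBuf> = fs::read_dir(dir)
        .expect("benchmarks directory should exist")
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|path| path.extension().map_or(false, |ext| ext == "wat"))
        .collect();
    files.sort(); // deterministic order keeps snapshot outputs stable
    files
}

fn main() {
    for path in discover_wat_files("benchmarks") {
        // Hypothetical hook: compile / run / record results for this file.
        println!("would process {}", path.display());
    }
}
```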
## Wasm Compiling
We aim to have a single test that compiles all `.wat` files to zkasm and writes the results to a CSV file in a snapshot-based pattern. This script should be capable of:

> **Review comment:** What do you mean by "single test" here? A single Rust test case? Or a single shell script to run tests?

- Compiling a single file with verbose compiler messages.
- Recompiling all files and saving the results.
- Compiling all files and checking that the results are the same as the committed files (used in CI).
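To make the three modes concrete, here is a hedged sketch of what such a test driver could look like. The `Mode` enum, `compile_wat_to_zkasm`, and the output layout are assumptions for illustration; the real compiler entry point is not shown here.

```rust
use std::path::Path;

/// The three ways we want to invoke compilation (see the list above).
enum Mode {
    /// Compile one file and print verbose compiler messages.
    Single { path: String },
    /// Recompile every file and overwrite the committed zkasm/CSV outputs.
    UpdateAll,
    /// Compile every file and fail if outputs differ from what is committed (CI).
    CheckAll,
}

/// Hypothetical compiler entry point: turns a `.wat` file into zkasm text.
fn compile_wat_to_zkasm(path: &Path, verbose: bool) -> Result<String, String> {
    // Placeholder for the actual Wasm -> zkasm compilation.
    let _ = (path, verbose);
    Ok(String::from("; zkasm output goes here"))
}

fn run(mode: Mode) -> Result<(), String> {
    match mode {
        Mode::Single { path } => {
            let zkasm = compile_wat_to_zkasm(Path::new(&path), true)?;
            println!("{zkasm}");
        }
        Mode::UpdateAll => {
            // For each discovered `.wat`, write `<name>.zkasm` and a CSV row.
        }
        Mode::CheckAll => {
            // For each discovered `.wat`, compare fresh output with the
            // committed file and return an error on any mismatch.
        }
    }
    Ok(())
}

fn main() {
    // Example: check mode, as CI would run it.
    run(Mode::CheckAll).expect("compilation outputs are out of date");
}
```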
## Running Compiled zkasm
We intend to implement a single command that runs all zkasm files and updates the state table in a snapshot-based pattern. This script should be capable of:

> **Review comment:** Again, let's focus on tests here. We want a command that runs all tests, but not necessarily the (slow) benchmarks.

- Running a single file with verbose zkasm processor messages.
- Rerunning all files and updating the state files.
- Running all files and checking that the state is the same as in the state files (used in CI).
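A possible shape for the state-table update, as a hedged sketch: the snippet assumes a `run_zkasm` entry point that reports a pass/fail status plus a step count, and writes one CSV row per program. Both the function and the exact columns are illustrative assumptions, not the current format.

```rust
use std::fmt::Write as _;
use std::path::Path;

/// Hypothetical runner result for one zkasm program.
struct RunOutcome {
    status: &'static str, // e.g. "pass" or "fail"
    steps: u64,           // how many processor steps the program took
}

fn run_zkasm(path: &Path) -> RunOutcome {
    // Placeholder for invoking the actual zkasm processor.
    let _ = path;
    RunOutcome { status: "pass", steps: 0 }
}

/// Build the state table as CSV text; the caller diffs it against or writes it
/// to the committed state file, following the same snapshot pattern as compilation.
fn build_state_csv(programs: &[&Path]) -> String {
    let mut csv = String::from("test,status,steps\n");
    for path in programs {
        let outcome = run_zkasm(path);
        writeln!(csv, "{},{},{}", path.display(), outcome.status, outcome.steps)
            .expect("writing to a String cannot fail");
    }
    csv
}

fn main() {
    let programs = [Path::new("tests/add.zkasm")];
    print!("{}", build_state_csv(&programs));
}
```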
## Developer Tooling
We aim for easy, intuitive, and stable commands for the following tasks (a possible CLI shape is sketched after the list):
- Recompile a single `.wat` file.
- Recompile all `.wat` files and update the zkasm files.
- Compile all `.wat` files and check that the results match the committed zkasm files.
- Run a single zkasm file.
- Run all zkasm files and check the diff against the state files.
- Run all zkasm files and update the state files.
- Run the entire CI pipeline locally.
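As a sketch only, assuming the tooling becomes a small Rust binary with subcommands (none of these command names exist in the repository today), the tasks above could map onto something like this:

```rust
use std::env;

// Hypothetical subcommand names; each would call into the compile/run
// drivers described in the previous sections.
const USAGE: &str = "\
usage: infra <command>
  compile <file.wat>   recompile a single .wat file (verbose)
  compile-all          recompile all .wat files and update zkasm outputs
  check-compile        compile all .wat files and diff against committed zkasm
  run <file.zkasm>     run a single zkasm file (verbose)
  check-run            run all zkasm files and diff against the state files
  update-state         run all zkasm files and update the state files
  ci                   run the whole CI pipeline locally";

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    match args.first().map(String::as_str) {
        Some("compile") | Some("compile-all") | Some("check-compile")
        | Some("run") | Some("check-run") | Some("update-state") | Some("ci") => {
            // Dispatch to the corresponding driver (not shown in this sketch).
            println!("would run subcommand: {}", args.join(" "));
        }
        _ => eprintln!("{USAGE}"),
    }
}
```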
## Testing Wasm
We have a script that checks all tests and benchmarks to ensure they are correct programs: it sets up the environment, runs the Wasm, and verifies that every program exits with code 0.
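For illustration, a minimal version of such a check could shell out to a Wasm runtime and assert on the exit code. The `wasmtime` command and the example file path here are assumptions; any runner that propagates process exit codes would do.

```rust
use std::path::Path;
use std::process::Command;

/// Run one Wasm program with an external runtime (assumed to be `wasmtime`
/// on PATH) and return whether it exited with code 0.
fn wasm_exits_cleanly(path: &Path) -> bool {
    Command::new("wasmtime")
        .arg("run")
        .arg(path)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    let program = Path::new("benchmarks/fibonacci.wat"); // illustrative path
    assert!(
        wasm_exits_cleanly(program),
        "{} did not exit with code 0",
        program.display()
    );
}
```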
## Accepting PRs
Our CI checks that tests and benchmarks are correct programs, that the committed zkasm files correspond to the files generated by the compiler, and that the state files contain up-to-date data.

We plan to implement a tool that automatically checks state files for degradations, such as performance slowdowns or tests that previously passed and now fail.
> **Review comment:** Let's ignore the performance aspect for now and focus on correctness. Right now, the review of changes to CSV snapshots is a manual process done by human reviewers. We want to move to a world where this part of the review is done automatically. Can we try to summarize here the rules that such a bot would use, to better understand which information we need to track? This bot should also handle a few exceptional situations.
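As one possible formalization of such rules (purely a sketch: the column names and pass/fail semantics are assumptions, and the reviewer's exceptional cases are not enumerated in this document), a check over the base-branch and PR state tables could look like this:

```rust
use std::collections::HashMap;

/// Per-test status as read from a state CSV (assumed "test,status" columns).
type StateTable = HashMap<String, String>;

/// Return the regressions a review bot could flag:
/// - a test that passed on the base branch and fails in the PR;
/// - a test that existed on the base branch but disappeared in the PR.
/// New tests and fail -> pass transitions are treated as improvements.
fn find_regressions(base: &StateTable, pr: &StateTable) -> Vec<String> {
    let mut regressions = Vec::new();
    for (test, old_status) in base {
        match pr.get(test) {
            Some(new_status) if old_status == "pass" && new_status == "fail" => {
                regressions.push(format!("{test}: pass -> fail"));
            }
            None => regressions.push(format!("{test}: removed from the state table")),
            _ => {}
        }
    }
    regressions
}

fn main() {
    let base = StateTable::from([("add".to_string(), "pass".to_string())]);
    let pr = StateTable::from([("add".to_string(), "fail".to_string())]);
    assert_eq!(find_regressions(&base, &pr), vec!["add: pass -> fail".to_string()]);
}
```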
# Steps to Achieve "Finalized" Infrastructure
- [ ] Rewrite wast parsing to a generic approach.
- [ ] Reorganize the `.wat` and zkasm files: move all state.csv files into one directory, consider removing hand-written tests, and possibly merge some tests into a single file.
- [ ] Introduce asserts as host functions in zkasm.
- [ ] Separate the different parts of the infrastructure into different files.
- [ ] Add npm caching in CI.
- [ ] Disable unused tests in CI.
- [ ] Create a developer CLI tool and aim to stabilize it.
- [ ] Implement a GitHub bot that alerts if any of our state.csv files degrade.

# Technical Details
Here we will describe the implemented infrastructure and give instructions on how to add new hand-written tests or benchmarks. TODO: this section will be updated once the major parts of the infrastructure have been rewritten.
> **Review comment:** We can keep benchmarks out of the scope of this proposal and delegate them to #182.