docs: infra #183

New file: `docs/zkasm/infra.md` (64 additions)

# General Overview

First, we aim to implement the following types of tests and benchmarks:
- Wasm spectests: programs generated from the Wasm spec test suite.
- Benchmarks: programs we write by hand in Wasm or Rust and compile to Wasm.
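
For illustration, a hand-written benchmark could be as small as the Rust function below, built for the `wasm32-unknown-unknown` target; the function name, the crate setup, and the build command are examples rather than an agreed convention.

```rust
// Hypothetical benchmark crate (lib.rs), built to Wasm with e.g.
//   cargo build --release --target wasm32-unknown-unknown
// assuming `crate-type = ["cdylib"]` in Cargo.toml.

#[no_mangle]
pub extern "C" fn fibonacci(n: u32) -> u64 {
    // Iterative Fibonacci: small, loop-heavy code, which is the kind of
    // workload we want the zkasm pipeline to measure.
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    a
}
```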

> **Review comment:** We can keep benchmarks out of the scope of this proposal and delegate it to #182.

Both types of programs will undergo the same pipeline:

- Compile to zkasm.
- Run the generated zkasm and measure performance.
- Track changes over time.
- Alert in GitHub PRs in case of degradation.

> **Review comment:** It's true that, so far, these two types of programs have been handled similarly, but going forward this is likely to change in order to give a better experience with both tools. For example, tests must be easy and fast to run and must present aggregated reports across many tests. Benchmarks, on the other hand, can take a long time to run, will be compiled with different optimization flags, and will likely become parameterized by the size of the input (e.g. the N-th Fibonacci number with N = 1..10^6). So in this document, I would focus specifically on our goals for the test suite and leave benchmarks out of scope.

We use the snapshot-testing pattern in many places (link).
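
In concrete terms, each snapshot check boils down to something like the sketch below; the `UPDATE_SNAPSHOTS` variable and the helper name are illustrative assumptions, not the existing implementation.

```rust
use std::{env, fs, path::Path};

/// Compare freshly generated output against a committed snapshot file.
/// With UPDATE_SNAPSHOTS=1 the snapshot is rewritten instead of checked.
fn check_snapshot(generated: &str, snapshot_path: &Path) {
    if env::var("UPDATE_SNAPSHOTS").as_deref() == Ok("1") {
        fs::write(snapshot_path, generated).expect("failed to update snapshot");
        return;
    }
    let committed = fs::read_to_string(snapshot_path).unwrap_or_default();
    assert_eq!(
        committed,
        generated,
        "snapshot {} is out of date; rerun with UPDATE_SNAPSHOTS=1",
        snapshot_path.display()
    );
}
```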

Performance data is collected from both tests and benchmarks, and benchmarks additionally serve as coarse-grained "global" tests.

## Test Creation and Generation
We use a snapshot-based pattern for generating Wasm from spectest wast files. Benchmarks are added manually.

> **Review comment:** I think we should reconsider this design decision and explore what our other options are:
>
> - [Current] Generate WAT files from WAST files and run them through the WAT -> ZKASM pipeline.
> - Generate ZKASM files directly from WAST files and don't commit WAT files to Git.
> - Don't commit ZKASM files to Git and instead pass the generated ZKASM code directly to the JS test runner.
>
> We need to understand the pros and cons of each approach and see which one is the most appropriate for our goals.

> **Author reply:** I created an issue where I suggest discussing whether we should get rid of snapshot testing: #188


Our goal is to make adding benchmarks trivial (drop a `.wat` file into the benchmarks directory and it automatically goes through the pipeline) and to automatically pull in all updates from the Wasm spectests.
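
The discovery step that makes this possible can be a simple directory walk, sketched below under the assumption that tests and benchmarks live as plain `.wat` files in a single directory (the layout is not final).

```rust
use std::{fs, path::PathBuf};

/// Collect every .wat file under a directory so that newly added
/// tests/benchmarks are picked up without registering them by hand.
fn collect_wat_files(dir: &str) -> Vec<PathBuf> {
    let mut files: Vec<PathBuf> = fs::read_dir(dir)
        .expect("cannot read test directory")
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|path| path.extension().map_or(false, |ext| ext == "wat"))
        .collect();
    files.sort(); // deterministic order keeps snapshot diffs stable
    files
}
```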

## Wasm Compiling
We aim to have a single test that compiles all `.wat` files to zkasm and writes the results to a CSV file in a snapshot-based pattern. This script should be capable of:

- Compiling a single file with verbose compiler messages.
- Recompiling all files and saving the results.
- Compiling all files and checking that the results match the committed files (used in CI).

> **Review comment:** What do you mean by "single test" here? A single Rust test case? Or a single shell script to run tests?
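
Putting the pieces together, the single test could look roughly like the sketch below, which reuses the hypothetical `collect_wat_files` and `check_snapshot` helpers from the previous sections; the CSV columns, file names, and the `compile` entry point are all placeholders.

```rust
use std::fmt::Write as _;
use std::path::Path;

/// Compile every `.wat` file and record the outcome in one CSV snapshot.
/// `compile` is a stand-in for the real WAT -> zkasm compiler entry point.
fn compile_all(wat_dir: &str, compile: impl Fn(&Path) -> Result<String, String>) {
    let mut csv = String::from("test,compilation_status\n");
    for wat in collect_wat_files(wat_dir) {
        let status = match compile(&wat) {
            Ok(zkasm) => {
                // Each generated program is itself a snapshot next to its source.
                check_snapshot(&zkasm, &wat.with_extension("zkasm"));
                "ok".to_string()
            }
            Err(err) => format!("compilation failed: {err}"),
        };
        writeln!(csv, "{},{}", wat.display(), status).unwrap();
    }
    // The aggregated per-file status is one more snapshot, checked the same way.
    check_snapshot(&csv, Path::new("state.csv"));
}
```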

## Running Compiled zkasm
We intend to implement a single command that runs all zkasm files and updates the state table in a snapshot-based pattern. This script should be capable of:

- Running a single file with verbose zkasm processor messages.
- Rerunning all files and updating the state files.
- Running all files and checking that the state matches the committed state files (used in CI).

> **Review comment:** Again, let's focus on tests here. We want a command that runs all tests, but not necessarily the (slow) benchmarks.
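
On the runner side, a thin wrapper around the external zkasm processor might look like the sketch below; the `npm run test-zkasm` invocation and its flags are made up, standing in for whatever interface the JS-based runner ends up exposing.

```rust
use std::path::Path;
use std::process::Command;

/// Run one compiled zkasm program through the (JS-based) zkasm processor.
/// The command name and flags below are placeholders, not a real interface.
fn run_zkasm(zkasm_file: &Path, verbose: bool) -> Result<(), String> {
    let mut cmd = Command::new("npm");
    cmd.args(["run", "--silent", "test-zkasm", "--"]).arg(zkasm_file);
    if verbose {
        cmd.arg("--verbose");
    }
    let output = cmd
        .output()
        .map_err(|e| format!("failed to spawn the zkasm runner: {e}"))?;
    if output.status.success() {
        Ok(())
    } else {
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}
```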

## Developer Tooling
We aim for simple, intuitive, and stable commands for the following tasks:
- Recompile a single `.wat` file.
- Recompile all `.wat` files and update the zkasm files.
- Compile all `.wat` files and check that the results match the committed zkasm files.
- Run a single zkasm file.
- Run all zkasm files and check the diff against the committed state.
- Run all zkasm files and update the state.
- Run the entire CI pipeline locally.
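
One way to keep these commands fixed and easy to discover is a small dedicated CLI. The sketch below uses the `clap` crate (with its `derive` feature); the tool name and subcommand names are illustrative, not final.

```rust
use clap::{Parser, Subcommand};
use std::path::PathBuf;

/// Developer entry point for the zkasm test infrastructure (sketch only).
#[derive(Parser)]
#[command(name = "zkasm-infra")]
struct Cli {
    #[command(subcommand)]
    command: Cmd,
}

#[derive(Subcommand)]
enum Cmd {
    /// Recompile a single .wat file with verbose compiler output.
    Compile { wat: PathBuf },
    /// Recompile all .wat files; with --check, only diff against committed files (CI mode).
    CompileAll {
        #[arg(long)]
        check: bool,
    },
    /// Run a single compiled zkasm file with verbose processor output.
    Run { zkasm: PathBuf },
    /// Run all zkasm files; with --check, only diff against the committed state.
    RunAll {
        #[arg(long)]
        check: bool,
    },
    /// Run the whole CI pipeline locally.
    Ci,
}

fn main() {
    let cli = Cli::parse();
    match cli.command {
        Cmd::Compile { wat } => println!("compile {}", wat.display()),
        Cmd::CompileAll { check } => println!("compile all (check = {check})"),
        Cmd::Run { zkasm } => println!("run {}", zkasm.display()),
        Cmd::RunAll { check } => println!("run all (check = {check})"),
        Cmd::Ci => println!("run the full CI pipeline"),
    }
}
```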

## Testing Wasm
We have a script that runs all tests and benchmarks to ensure they are valid programs: it sets up the environment, runs the Wasm, and checks that every program finishes with exit code 0.
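
In outline, this script only needs to run each module and assert a zero exit code; the sketch below shells out to the `wasmtime` CLI purely as an example host.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Sanity-check that every test/benchmark is a valid, terminating program by
/// executing it in an ordinary Wasm runtime (wasmtime here is just an example).
fn check_wasm_programs(wat_files: &[PathBuf]) {
    for wat in wat_files {
        let status = Command::new("wasmtime")
            .arg(wat)
            .status()
            .expect("failed to launch wasmtime");
        assert!(
            status.success(),
            "{} exited with {:?}",
            wat.display(),
            status.code()
        );
    }
}
```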

## Accepting PRs
Our CI checks that tests and benchmarks are correct, that the committed zkasm files correspond to the files generated by the compiler, and that the state files reflect the current results.

We plan to implement a tool that automatically checks the state files for any degradation, such as performance slowdowns or previously passing tests that now fail.

> **Review comment:** Let's ignore the performance aspect for now and focus on correctness.
>
> Right now, the review of changes to CSV snapshots is a manual process done by human reviewers. We want to move to a world where this part of the review is done automatically. Can we try to summarize here the rules that such a bot would use, to better understand which information we need to track?
>
> This bot should also handle a few exceptional situations:
>
> - Adding a new spec test or manual WAT test (possibly with broken tests in it).
> - Dealing with temporarily broken tests (e.g. the case we had during the i32 migration, or when we add a new spec test that has some sub-tests passing, but that subset changes as we implement the necessary features).
> - Removing a manual WAT test.
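
To make the question above concrete, here is one possible rule set for such a bot, sketched under the assumption that each `state.csv` maps a test name to a status string; the column layout and the status values are assumptions, not the current format.

```rust
use std::collections::HashMap;

/// One row of a state.csv file (assumed layout).
struct TestState {
    status: String, // e.g. "pass", "compilation failed", "runtime error"
}

/// Core rule for the (future) PR bot: a change is a degradation if a test that
/// previously passed no longer does. New and removed tests are reported
/// separately so a human can confirm they are intentional.
fn find_degradations(
    old: &HashMap<String, TestState>,
    new: &HashMap<String, TestState>,
) -> Vec<String> {
    let mut report = Vec::new();
    for (name, old_state) in old {
        match new.get(name) {
            Some(new_state) if old_state.status == "pass" && new_state.status != "pass" => {
                report.push(format!(
                    "regression: {name} ({} -> {})",
                    old_state.status, new_state.status
                ));
            }
            None => report.push(format!("removed test: {name} (needs explicit approval)")),
            _ => {}
        }
    }
    for name in new.keys().filter(|n| !old.contains_key(*n)) {
        report.push(format!("new test: {name} (may legitimately start out partially failing)"));
    }
    report
}
```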


# Steps to Achieve "Finalized" Infrastructure
- [ ] Rewrite wast parsing to a generic approach.
- [ ] Reorganize the `.wat` and zkasm files: move all `state.csv` files to one directory, consider removing hand-written tests, and possibly merge some tests into a single file.
- [ ] Introduce asserts as host functions in zkasm.
- [ ] Separate different parts of the infrastructure into different files.
- [ ] Add npm caching in CI.
- [ ] Disable unused tests in CI.
- [ ] Create a developer CLI tool and aim to stabilize it.
- [ ] Implement a GitHub bot that alerts if any of our state.csv files degrade.

# Technical Details
Here we will describe the implemented infrastructure and give instructions on how to add new hand-written tests or benchmarks. TODO: this section will be updated as the major parts of the infrastructure are rewritten.