-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce performance benchmarks for Stage 1 ZK WASM implementation #90
Comments
I took a simple program computing SHA256 of a constant string: https://github.com/akashin/bench/blob/sha256/src/main.rs And generated WASM for it https://gist.github.com/akashin/b038a9e9fcc9bb4ccf0dc5b990f5d9df with following commands: cargo build --target wasm32-unknown-unknown --release
# Before strip: 1.8M
# After strip: 37K
wasm-strip cranelift-bench.wasm The final program has ~16k WASM ops.
|
Another very simple program computing fibonacci: https://github.com/akashin/bench/blob/fibo/src/main.rs Here is generated WAT: https://gist.github.com/akashin/326d3273ec98436e80813ad83f24dfda It contains ~10k WASM ops, which is much more than I expected. UPD: The actual fibonacci code is relatively small, so we need to find a way to trim the rest functions that we don't need: (func (;4;) (type 0)
(local i32 i64 i64 i32)
global.get 0
i32.const 32
i32.sub
local.tee 0
global.set 0
i64.const 0
local.set 1
i64.const 1
local.set 2
i32.const 10000
local.set 3
loop ;; label = @1
local.get 2
local.get 1
local.get 2
i64.add
local.tee 1
i64.add
local.tee 2
local.get 1
local.get 2
i64.add
local.tee 2
i64.add
local.tee 1
local.get 2
local.get 1
i64.add
local.tee 2
i64.add
local.tee 1
local.get 2
local.get 1
i64.add
local.tee 2
i64.add
local.tee 1
local.get 2
local.get 1
i64.add
local.tee 1
i64.add
local.set 2
local.get 3
i32.const -10
i32.add
local.tee 3
br_if 0 (;@1;)
end
local.get 0
local.get 1
i64.store
block ;; label = @1
local.get 1
i64.const -2872092127636481573
i64.ne
br_if 0 (;@1;)
local.get 0
i32.const 32
i32.add
global.set 0
return
end
local.get 0
i32.const 0
i32.store offset=8
i32.const 0
local.get 0
i32.const 1048592
local.get 0
i32.const 8
i32.add
i32.const 1048612
call 3
unreachable) |
I've managed to convince Rust to generate the small program for fibonacci by using #![no_main]
#![no_std]
use panic_halt as _;
extern "C" {
fn assert_eq(actual: i64, expected: i64);
}
#[no_mangle]
pub fn main() {
let mut a = 0i64;
let mut b = 1i64;
for _ in 0..10000 {
let c = a.wrapping_add(b);
a = b;
b = c;
}
unsafe {
assert_eq(a, -2872092127636481573);
}
} The generated WAT file is 61 lines long and only contains the relevant Fibonacci implementation: (module
(type (;0;) (func (param i64 i64)))
(type (;1;) (func))
(import "env" "assert_eq" (func (;0;) (type 0)))
(func (;1;) (type 1)
(local i64 i64 i32)
i64.const 1
local.set 0
i64.const 0
local.set 1
i32.const 10000
local.set 2
loop ;; label = @1
local.get 0
local.get 1
i64.add
local.tee 1
local.get 0
i64.add
local.tee 0
local.get 1
i64.add
local.tee 1
local.get 0
i64.add
local.tee 0
local.get 1
i64.add
local.tee 1
local.get 0
i64.add
local.tee 0
local.get 1
i64.add
local.tee 1
local.get 0
i64.add
local.tee 0
local.get 1
i64.add
local.tee 1
local.get 0
i64.add
local.set 0
local.get 2
i32.const -10
i32.add
local.tee 2
br_if 0 (;@1;)
end
local.get 1
i64.const -2872092127636481573
call 0)
(memory (;0;) 16)
(global (;0;) (mut i32) (i32.const 1048576))
(global (;1;) i32 (i32.const 1048576))
(global (;2;) i32 (i32.const 1048576))
(export "memory" (memory 0))
(export "main" (func 1))
(export "__data_end" (global 1))
(export "__heap_base" (global 2))) I think this would be good enough for our first benchmark and I believe we already support all necessary opcodes. As a next step, I will see if I can reuse the same techniques to build SHA2 benchmark and see which instructions it uses. Related to Fibonacci benchmark there is one outstanding questions: How to generate a I've seen |
There’s a What you will need to do in this case is to look at the |
Makes sense, we can always do this as a part of |
I think for first benchmarks it will be easier to use only 64-bit operations, it will be much easier to implement. i32 now have a problem with jumps, and maybe will contain more. Maybe it is even worth it to write benchmark in |
Also, on which opcodes should we focus now? From list above shifts and rotations are still not implemented, should we start work on them or chose another benchmark? Should we better ship 32-bit computations ASAP? |
I think we should start with this Fibonacci benchmark in #90 (comment). It only uses basic opcodes with i32 and i64 and would be a good stepping stone for a more complex ECDSA benchmark. There is no arguing that it would be easier to use i64 for counter in this specific case, but we will very soon need basic i32 operations anyway for other benchmarks, so I think we should include this in Stage 1. To sum up, for the first Fibonacci benchmark generated from Rust we need:
|
Here is SHA256 benchmark with no_std approach + using existing crate sha2: https://github.com/akashin/bench/blob/9cf8fc4c25e685f8b842db5ad2e57f9df2b0b125/src/main.rs It results in 5.5k lines of WAT, with a few new instructions on top of the set for Fibonacci:
This seems like a sizable enough target to me to declare as a cutoff for the instruction set for Stage 1 as it touches most of the integer operations and also involves memory loads and stores. I'll repeat the same experiment with hand-written SHA256 to see if this helps trim the program size and look at a similar Keccac benchmark for which we already have dedicated instruction in ZK ASM. |
Handwritten SHA256 benchmark: https://github.com/akashin/bench/blob/21264dfaad1e0b4788e19bb25b5a0314bc879eb9/src/main.rs and associated WAT: https://gist.github.com/akashin/9804635dc299f7fdc727eaefeb94f1df Very similar results to using It would actually be really interesting how the actual number of ZK ASM counters differs between these two programs, given that SHA2 is supposed to be more "optimized", even though it has more lines in WAT. |
This PR changes how run-tests.zkasm.js communicates results to the zkasm-result.py. Before, we used to parse log lines, now we pass a json object with a test result for each of the tests. This will allow storing more detailed information like performance counters for #90 and error messages.
This PR changes how run-tests.zkasm.js communicates results to the zkasm-result.py. Before, we used to parse log lines, now we pass a json object with a test result for each of the tests. This will allow storing more detailed information like performance counters for #90 and error messages.
Here is the final version of sha256 that I'm importing: #![no_main]
#![no_std]
use panic_halt as _;
use sha2::{Sha256, Digest};
extern "C" {
fn assert_eq(actual: u64, expected: u64);
}
// fn assert_eq(actual: u64, expected: u64) {
// assert_eq!(actual, expected);
// }
fn subslice_to_u64(slice: &[u8], begin: usize, end: usize) -> u64 {
let mut result = 0u64;
for i in begin..end {
result <<= 8;
result += slice[i] as u64;
}
result
}
#[no_mangle]
pub fn main() {
let mut hasher = Sha256::new();
hasher.update(b"hello world");
let result = hasher.finalize();
unsafe {
assert_eq(subslice_to_u64(&result, 0, 8), 13352372148217134600);
assert_eq(subslice_to_u64(&result, 8, 16), 11902541952223915002);
assert_eq(subslice_to_u64(&result, 16, 24), 14160706888648589550);
assert_eq(subslice_to_u64(&result, 24, 32), 10414846460208074217);
}
} The hashes were generated with import hashlib
def subslice_to_u64(s, begin, end):
result = 0
for i in range(begin, end):
result <<= 8;
result += s[i]
return result
s = b"hello world"
h = hashlib.sha256(s)
d = h.digest()
print(subslice_to_u64(d, 0, 8))
print(subslice_to_u64(d, 8, 16))
print(subslice_to_u64(d, 16, 24))
print(subslice_to_u64(d, 24, 32)) |
@Akashin Is the |
Yes, you can find it here: https://github.com/akashin/bench/tree/sha256_final |
As a part of Stage 1 plan we aimed for at most 1000x slowdown compared to native execution.
We need to introduce a few representative benchmarks that allow to:
A few ideas:
The text was updated successfully, but these errors were encountered: