Fix Loads and Stores for SHA256 benchmark #185

Merged
merged 5 commits into main from sha_full on Feb 6, 2024

Conversation

aborg-dev commented on Jan 9, 2024

This PR adds the remaining features necessary to run SHA256 benchmark:

  • Misaligned loads and stores
  • Loads and stores of size < 64 bit

These instructions are generated by the rustc compiler for the wasm32 target and are quite common.

The current implementation is not verifiable as it uses external (free) inputs. It turned out to be quite challenging to implement these operations in a verifiable fashion because they involve many bitwise operations, which are quite expensive.

For now, there is value in having a non-verifiable implementation that makes sure we can run tests and benchmarks related to memory. I've linked a few TODOs to do a verifiable implementation, but first, we would need a proper design for that part. Most likely, we will need to modify the zkAsm processor to support these operations efficiently.

Implementation-wise, there are three steps (sketched below):

  • Conversion from address + offset to slot + offset
  • Read/write the value at the correct offset
  • Narrowing down the value to the desired type width
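A minimal sketch of these three steps in Rust (illustrative only, not the actual lowering; the 8-byte slot size, little-endian packing, and the function shape are assumptions):

// Illustrative sketch: memory modeled as 64-bit slots, `width` given in bytes.
fn load_narrow(mem: &[u64], addr: usize, width: usize) -> u64 {
    assert!(width >= 1 && width <= 8);
    // Step 1: convert the byte address into a slot index plus a byte offset.
    let slot = addr / 8;
    let offset = addr % 8;
    // Step 2: read the (up to) two slots the value may span and merge them.
    let lo = mem[slot] >> (8 * offset);
    let hi = if offset == 0 { 0 } else { mem[slot + 1] << (8 * (8 - offset)) };
    let merged = hi | lo;
    // Step 3: narrow the merged value down to the requested type width.
    if width == 8 { merged } else { merged & ((1u64 << (8 * width)) - 1) }
}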

aborg-dev mentioned this pull request on Jan 9, 2024
aborg-dev changed the base branch from main to stack_fixes on January 9, 2024 13:29
Base automatically changed from stack_fixes to main on January 9, 2024 16:43
aborg-dev force-pushed the sha_full branch 2 times, most recently from 988b67f to 776625c on January 10, 2024 12:04
aborg-dev force-pushed the sha_full branch 10 times, most recently from e481d12 to e232698 on January 29, 2024 14:54
aborg-dev changed the title from "SHA256 now fully runs" to "Fix Loads and Stores for SHA256 benchmark" on Jan 29, 2024
aborg-dev marked this pull request as ready for review on January 29, 2024 15:00
Review thread on cranelift/zkasm_data/benchmarks/sha256/from_rust.wat (outdated):
}

// Handle the case when read spans two slots.
if rem + width > 8 {
Collaborator

Can you really rely on this? What happens if wasm goes like i32.load offset=0 (i32.const 7)? I think there's no choice but to check alignment of the final computed (byte) address at runtime and/or unconditionally load two slots.

Author (aborg-dev)

Fair point, I'm not sure whether I can rely on the fact that the address in the load/store register will be word-aligned. So we at least need an assert to check this.

Given that this is a non-trivial amount of work that we might need to redo anyway soon, I'll start with an assert in the code to check that the address is word-aligned. If it is not triggered by the Rust benchmarks that we use, that would be interesting evidence to investigate how exactly these addresses are generated. If it is triggered, we'll need to address it when we have the test/benchmark that exercises it.
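A minimal sketch of such an interim check (illustrative Rust only; the helper name and the 8-byte word size are assumptions, not the actual Cranelift code):

// Illustrative only: trap early when the computed byte address is not
// aligned to the assumed 8-byte word size, instead of mis-reading memory.
fn assert_word_aligned(addr: u64) {
    assert!(addr % 8 == 0, "unaligned load/store address: {addr:#x}");
}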

Author (aborg-dev)

Well, this was very well spotted. I've added an assert and it triggered. I'll work on fixing it.

Collaborator

FWIW for a minimal reproducer something like read_unaligned would do. In architectures that require aligned loads the compiler backend would responsibly split the load into aligned loads, but WebAssembly specifically does not impose that sort of requirement, so the backend doesn't need to worry about unaligned addresses at all.
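For context, a minimal Rust reproducer along those lines could look like this (hypothetical; the thread does not show the exact reproducer):

use std::ptr;

/// Reads a u32 from a deliberately misaligned address. Targeting wasm32,
/// rustc emits the load as-is, since WebAssembly does not require alignment.
fn read_at_odd_offset(bytes: &[u8; 8]) -> u32 {
    unsafe { ptr::read_unaligned(bytes.as_ptr().add(1) as *const u32) }
}

fn main() {
    let bytes = [0u8, 1, 2, 3, 4, 5, 6, 7];
    println!("{:#010x}", read_at_odd_offset(&bytes));
}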

Author (aborg-dev)

It took some effort, but I rewrote the code without the assumption that the address is word-aligned.
I ended up unconditionally reading and writing to two slots, though I think in the future it would not be that hard to optimize the code to skip the second write when possible using a jump.

aborg-dev force-pushed the sha_full branch 2 times, most recently from ebed326 to 2484725 on January 30, 2024 14:38
aborg-dev changed the base branch from main to sha_code on January 30, 2024 14:38
Base automatically changed from sha_code to main on January 31, 2024 16:55
aborg-dev marked this pull request as draft on February 1, 2024 10:09
aborg-dev (Author)

Converted this back to draft while I figure out how to fix unaligned register addresses.

aborg-dev marked this pull request as ready for review on February 2, 2024 16:44
aborg-dev requested a review from nagisa on February 2, 2024 16:45
aborg-dev requested review from MCJOHN974 and mooori on February 6, 2024 11:08
The statement A + 3 > 8 seems to be parsed as A + (3 > 8) after the translation by the zkAsm interpreter, so parentheses were added to disambiguate this.
Comment on lines +22 to +28
${ (E) % 8 } => A
${ (E) / 8 } => E
$ => D :MLOAD(MEM:E + 1)
$ => B :MLOAD(MEM:E)
${ B >> (8 * A) } => B
${ (D << (128 - 8 * (A + 1))) | B } => B
${ B & ((1 << 8) - 1) } => B
Collaborator

This seems alright, although knowing zkasm, I think a quick improvement in step count in the future is going to involve a JNZ on the result of % 8 so that only a single load executes in the aligned case. Probably similar with stores as well.
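A rough Rust sketch of that suggestion, assuming the same 8-byte slots as in the snippet above (the concrete JNZ sequence would depend on the zkASM lowering):

// Sketch of the suggested fast path: a slot-aligned access needs only one
// load; only misaligned accesses pay for the second load and the merge.
fn load_with_fast_path(mem: &[u64], addr: usize, width: usize) -> u64 {
    let slot = addr / 8;
    let offset = addr % 8;
    let mask = if width == 8 { u64::MAX } else { (1u64 << (8 * width)) - 1 };
    if offset == 0 {
        // Aligned case (the % 8 result is zero): a single load suffices.
        mem[slot] & mask
    } else {
        // Misaligned case: the JNZ on the offset would route execution here,
        // which loads two adjacent slots and stitches the value together.
        let lo = mem[slot] >> (8 * offset);
        let hi = mem[slot + 1] << (8 * (8 - offset));
        (hi | lo) & mask
    }
}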

aborg-dev added this pull request to the merge queue on Feb 6, 2024
Merged via the queue into main with commit 3d0d0a7 on Feb 6, 2024
21 checks passed
aborg-dev deleted the sha_full branch on February 6, 2024 13:03