Fix Loads and Stores for SHA256 benchmark #185
}

// Handle the case when read spans two slots.
if rem + width > 8 {
Can you really rely on this? What happens if wasm emits something like `i32.load offset=0 (i32.const 7)`? I think there's no choice but to check the alignment of the final computed (byte) address at runtime and/or unconditionally load two slots.
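To make the concern concrete, here is a minimal Rust sketch of the runtime check on the effective byte address. The 8-byte slot size is inferred from the `% 8` and `/ 8` arithmetic in this PR; the function names are hypothetical and not taken from the codebase.

```rust
/// Bytes per memory slot (assumed from the `% 8` / `/ 8` arithmetic in this PR).
const SLOT_BYTES: u64 = 8;

/// Effective byte address of a wasm memory access: dynamic base plus static offset.
pub fn effective_address(base: u64, offset: u64) -> u64 {
    base + offset
}

/// Splits a byte address into (slot index, byte offset inside that slot).
pub fn slot_and_rem(addr: u64) -> (u64, u64) {
    (addr / SLOT_BYTES, addr % SLOT_BYTES)
}

/// Returns true when an access of `width` bytes at `addr` crosses a slot boundary.
pub fn spans_two_slots(addr: u64, width: u64) -> bool {
    let (_, rem) = slot_and_rem(addr);
    rem + width > SLOT_BYTES
}
```

For `i32.load offset=0 (i32.const 7)` the effective address is 7 and the width is 4, so `spans_two_slots(7, 4)` is true: the access starts in the last byte of slot 0 and continues into slot 1. The check has to run on the final address, since the base is only known at runtime.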
Fair point, I'm not sure whether I can rely on the fact that the address in the load/store register will be word-aligned. So we at least need an assert to check this.
Given that this is a non-trivial amount of work that we might need to redo anyway soon, I'll start with an assert in the code to check that the address is word-aligned. If it is not triggered by the Rust benchmarks that we use, that would be interesting evidence to investigate how exactly these addresses are generated. If it is triggered, we'll need to address it when we have the test/benchmark that exercises it.
Well, this was very well spotted. I've added an assert and it triggered. I'll work on fixing it.
FWIW for a minimal reproducer something like `read_unaligned` would do. In architectures that require aligned loads the compiler backend would responsibly split the load into aligned loads, but WebAssembly specifically does not impose that sort of requirement, so the backend doesn't need to worry about unaligned addresses at all.
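A reproducer along those lines could look like the sketch below. `std::ptr::read_unaligned` is a real standard-library function; the helper name `load_u32_at` is hypothetical. Whether rustc lowers this to a single `i32.load` at an unaligned address on wasm32 depends on optimization and inlining, so treat that as an expectation to verify rather than a guarantee.

```rust
use std::ptr;

/// Reads a little-endian u32 starting at byte `offset` of `buf`.
/// The source address carries no alignment requirement.
pub fn load_u32_at(buf: &[u8], offset: usize) -> u32 {
    assert!(offset + 4 <= buf.len(), "read out of bounds");
    // SAFETY: the range [offset, offset + 4) was bounds-checked above, and
    // `read_unaligned` is valid for any address alignment.
    let raw = unsafe { ptr::read_unaligned(buf.as_ptr().add(offset) as *const u32) };
    // Interpret the bytes as little-endian (a no-op on wasm32).
    u32::from_le(raw)
}
```

Calling `load_u32_at(&buf, 7)` on a 16-byte buffer reads bytes 7 through 10, which on an 8-byte slot layout touches two slots.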
It took some effort, but I rewrote the code without the assumption that the address is word-aligned.
I ended up unconditionally reading and writing to two slots, though I think in the future it would not be that hard to optimize the code to skip the second write when possible using a jump.
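As an illustration of the "always touch two slots" approach on the store side, here is a Rust model of the read-modify-write. It is a sketch, not the zkAsm emitted by this PR: memory is modeled as a slice of little-endian 8-byte slots, and the name `store` and the requirement of a trailing padding slot are assumptions.

```rust
/// Stores the low `width` bytes of `value` at byte address `addr`,
/// unconditionally reading and writing two adjacent 8-byte slots.
/// Assumes `mem` has at least one slot after the one containing `addr`.
pub fn store(mem: &mut [u64], addr: usize, value: u64, width: u32) {
    assert!(matches!(width, 1 | 2 | 4 | 8), "unsupported width");
    let slot = addr / 8;
    let rem = (addr % 8) as u32;
    // View the two slots as a single 128-bit little-endian window.
    let window = (mem[slot] as u128) | ((mem[slot + 1] as u128) << 64);
    // Mask covering the `width` bytes being overwritten, shifted to byte `rem`.
    let mask = ((1u128 << (8 * width)) - 1) << (8 * rem);
    let merged = (window & !mask) | (((value as u128) << (8 * rem)) & mask);
    // Write both slots back, even when the second one is unchanged.
    mem[slot] = merged as u64;
    mem[slot + 1] = (merged >> 64) as u64;
}
```

The second write is redundant whenever `rem + width <= 8`, which is the case a conditional jump could skip later.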
Converted this back to draft while I figure out the way to fix unaligned register addresses.
This PR adds the remaining features necessary to run the SHA256 benchmark:

- Misaligned loads and stores
- Loads and stores of size < 64 bits

These instructions are generated by the rustc compiler for the wasm32 target and are quite common.

The current implementation is not verifiable, as it uses external (free) inputs. Implementing these operations in a verifiable fashion turned out to be quite challenging because they involve many bitwise operations, which are quite expensive.

For now, there is value in having a non-verifiable implementation that makes sure we can run tests and benchmarks related to memory. I've linked a few TODOs to do a verifiable implementation, but first we would need a proper design for that part. Most likely, we will need to modify the zkAsm processor to support these operations efficiently.

Implementation-wise, there are three steps:

1. Conversion from address + offset to slot + offset
2. Read/write the value at the correct offset
3. Narrowing down the value to the desired type width
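The three steps can be modeled in Rust for the load case roughly as follows. This is a reference sketch under stated assumptions, not the zkAsm generated by the PR: memory is a slice of little-endian 8-byte slots with one padding slot at the end, the result is zero-extended only (sign-extending loads are not modeled), and all function names are hypothetical.

```rust
/// Step 1: convert a byte address into (slot index, byte offset in the slot).
fn to_slot(addr: usize) -> (usize, u32) {
    (addr / 8, (addr % 8) as u32)
}

/// Step 2: read 8 bytes starting at byte `rem` of `slot`,
/// pulling the spill-over bytes from the following slot.
fn read_at(mem: &[u64], slot: usize, rem: u32) -> u64 {
    let window = (mem[slot] as u128) | ((mem[slot + 1] as u128) << 64);
    (window >> (8 * rem)) as u64
}

/// Step 3: narrow the value to `width` bytes (zero-extended).
fn narrow(value: u64, width: u32) -> u64 {
    if width >= 8 {
        value
    } else {
        value & ((1u64 << (8 * width)) - 1)
    }
}

/// Loads `width` bytes from byte address `addr`.
pub fn load(mem: &[u64], addr: usize, width: u32) -> u64 {
    let (slot, rem) = to_slot(addr);
    narrow(read_at(mem, slot, rem), width)
}
```

A model like this can serve as an oracle when testing the generated code: for any address and width, the emitted instructions should produce the same value as `load`.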
The statement A + 3 > 8 seems to be parsed as A + (3 > 8) after translation by the zkAsm interpreter, so this adds parentheses to disambiguate.
${ (E) % 8 } => A
${ (E) / 8 } => E
$ => D :MLOAD(MEM:E + 1)
$ => B :MLOAD(MEM:E)
${ B >> (8 * A) } => B
${ (D << (128 - 8 * (A + 1))) | B } => B
${ B & ((1 << 8) - 1) } => B
This seems alright, although knowing zkasm, I think a quick improvement in steps in the future is going to involve `JNZ` on the result of `% 8` and otherwise execute a single load only. Probably similar with stores as well.