Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] - fix: issue #977 - debug build, panic symbol, failed run #985

Closed
wants to merge 1 commit into from

Conversation

reshmem
Copy link
Contributor

@reshmem reshmem commented Dec 6, 2023

Solves #977

@vivekvpandya
Copy link
Contributor

Does this also slove issues when running example? Also it feels odd to disable overflow checks in dev profile. Better we create new profile like dev_no_oc or something like that for this.
And I hope this check is disabled by default for release profile.

Also this does not slove 977 completely as we would want things to work on any combination of optimisation/code gen flags.

@eightfilms
Copy link
Contributor

eightfilms commented Dec 7, 2023

@vivekvpandya Yeah by default overflow checks are disabled in release. Source: https://doc.rust-lang.org/cargo/reference/profiles.html#release

Digging a bit further in the issue @dimdumon linked in the PR, is another related issue: rust-lang/compiler-builtins#347

Looking at the comments it seems like various people experiencing the same issue had to tweak their dev profiles in slightly different ways for the workaround.

In our case, actually, overflow-checks = false alone wasn't enough to fix the bug:

bash-5.2$ cat Cargo.toml 
[workspace]
members = ["fibonacci", "min-max", "memory-access", "panic", "poseidon2", "static-mem-access", "sha2", "rkyv-serialization", "stdin", "fibonacci-input", "merkleproof-trustedroot"]
resolver = "2"

[profile.release]
codegen-units = 1
lto = "fat" 
opt-level = "z"

[profile.dev]
overflow-checks = false
bash-5.2$ cd rkyv-serialization && cargo build && cd .. && ../target/debug/mozak-cli run -vv target/riscv32im-mozak-zkvm-elf/debug/rkyv-serialization iotape.txt iotape.txt
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
thread 'main' panicked at /Users/bing/code/mozak/mozak-vm/runner/src/vm.rs:170:9:
VM panicked with msg: panicked at <redacted>:0:0:
unsafe precondition(s) violated: ptr::read requires that the pointer argument is aligned and non-null
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

What fixed it actually was opt-level = 1, which means this comment in the above issue is relevant for our case:

The issue seems to be that when libcore/compiler-builtins is built with LTO, compiler-builtins still needs to reference some functions from libcore (like core::ptr::const_ptr::offset or core::panicking::panic). However, these functions get eliminated from the main binary due to LTO, so compiler-builtins ends up with undefined references. This doesn't happen with opt-level = 1 as the calls from compiler-builtins to libcore are inlined, removing any LTO related issues.

Here's the output for the working version:

bash-5.2$ cat Cargo.toml 
[workspace]
members = ["fibonacci", "min-max", "memory-access", "panic", "poseidon2", "static-mem-access", "sha2", "rkyv-serialization", "stdin", "fibonacci-input", "merkleproof-trustedroot"]
resolver = "2"

[profile.release]
codegen-units = 1
lto = "fat" 
opt-level = "z"

[profile.dev]
opt-level = 1 
bash-5.2$ cd rkyv-serialization && cargo build && cd .. && ../target/debug/mozak-cli run -vvv target/riscv32im-mozak-zkvm-elf/debug/rkyv-serialization iotape.txt iotape.txt
    Finished dev [optimized + debuginfo] target(s) in 0.05s
[2023-12-07T03:50:09Z DEBUG mozak_cli] Read 5413296 of ELF data.
[2023-12-07T03:50:09Z DEBUG mozak_cli] Read 2 of io_tape data.
[2023-12-07T03:50:09Z DEBUG mozak_cli] Read 2 of io_tape data.
[2023-12-07T03:50:09Z DEBUG mozak_cli] [0, 88916, 4294967295, 0, 0, 967978916, 89044, 4217682812, 0, 0, 0, 77, 4, 0, 123240, 56, 123176, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3813019162, 591793690, 47996, 620298240]

So it seems like it might be a linker issue rather than the overflow checks? We probably need to dig a bit more.

@eightfilms
Copy link
Contributor

I'm wondering if there's a requirement that I'm not thinking of that makes us have to support being able to run debug builds. A fix would be great but wondering if forcing release builds on users would be an untenable solution.

[profile.dev]
codegen-units = 1
lto = "fat"
opt-level = "z"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be in the dev build.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #985 (comment) for a more minimal version of how the dev profile could be altered.

opt-level = "z"
# this one needs to be turned off, please refer to this resource: https://github.com/Rahix/avr-hal/issues/131
# Yeah, this is a known behavior, unfortunately.
# The problem is that in debug builds, overflow-checks are enabled.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the root cause. Ideally, we want overflow checks to be enabled.

The problem is that overflow checks cause us trouble.

lto = "fat"
opt-level = "z"
# this one needs to be turned off, please refer to this resource: https://github.com/Rahix/avr-hal/issues/131
# Yeah, this is a known behavior, unfortunately.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it obvious that you are quoting here.


[profile.dev]
codegen-units = 1
lto = "fat"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with thin or even without lto?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, will check it too

Copy link
Contributor

@vivekvpandya vivekvpandya Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rkyv examples works fine with

+   6 [profile.dev]                                                                                                                                                                                                                                                                                                                                                                                                                 
+   3 opt-level = "z"

Or better as @bingcicle already noted that even opt-level=1 solves the original issue (not the issue at linking time).

With thin LTO we don't get linker error.

+   6 [profile.dev]                                                                                                                                                                                                         
+   5 codegen-units = 1                                                                                                                                                                                                     
+   4 lto = "thin"                                                                                                                                                                                                          
+   3 opt-level = "z" 

however as our examples are small so we should always prefer fat lto.

And based on rust-lang/wg-cargo-std-aware#62 following is minimal way to make linker error go away with fat LTO

+   6 [profile.dev]                                                                                                                                                                                                         
+   5 codegen-units = 1                                                                                                                                                                                                     
+   4 lto = "fat"                                                                                                                                                                                                           
+   3 opt-level = "z"                                                                                                                                                                                                       
+   2                                                                                                                                                                                                                       
+   1 [profile.dev.package.compiler_builtins]                                                                                                                                                                               
+ 29  overflow-checks = false 

@@ -6,3 +6,27 @@ resolver = "2"
codegen-units = 1
lto = "fat"
opt-level = "z"

[profile.dev]
codegen-units = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we go with the default, or how does it fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default here means nothing specified for dev profile, I checked its fails. So I will debug that separately.

Copy link
Collaborator

@matthiasgoergens matthiasgoergens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments.

@reshmem reshmem changed the title fix: issue #977 - debug build, panic symbol, failed run [WIP] - fix: issue #977 - debug build, panic symbol, failed run Dec 7, 2023
@reshmem
Copy link
Contributor Author

reshmem commented Dec 7, 2023

Looks like it stoped to solve this issue in by branch (linker-scripts)... investigating ...

@reshmem
Copy link
Contributor Author

reshmem commented Dec 7, 2023

I still have an issue with fibonacci-input: ( @vivekvpandya , @bingcicle , @codeblooded1729 )

[profile.dev]
opt-level = 1
user@fly mozak-vm % ./debug_fi_main.sh 
[2023-12-07T21:01:08Z DEBUG mozak_cli] Read 8516604 of ELF data.
[2023-12-07T21:01:08Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:01:08Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:01:08Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffdf, 4, [40, 0, 0, 0] 
[2023-12-07T21:01:08Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffe3, 4, [162, 14, 197, 3] 
thread 'main' panicked at /Users/user/Source/mozak/mozak-vm/runner/src/vm.rs:375:13:
Looped for longer than MOZAK_MAX_LOOPS
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
user@fly mozak-vm % cat debug_fi_main.sh 
RUST_LOG=debug 
RUST_BACKTRACE=1 
MOZAK_STARK_DEBUG=TRUE
./target/debug/mozak-cli -vvv run examples/target/riscv32im-mozak-zkvm-elf/debug/fibonacci-input examples/fibonacci-input/iotape_private examples/fibonacci-input/iotape_public
 

vs release that it works well 👍

user@fly mozak-vm % ./release_fi_main.sh 
[2023-12-07T21:03:58Z DEBUG mozak_cli] Read 33224 of ELF data.
[2023-12-07T21:03:58Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:03:58Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:03:58Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffcf, 4, [40, 0, 0, 0] 
[2023-12-07T21:03:58Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffd3, 4, [162, 14, 197, 3] 
[2023-12-07T21:03:58Z DEBUG mozak_cli] [0, 83808, 4294967279, 0, 0, 0, 0, 0, 0, 0, 0, 32, 39088169, 63245986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
user@fly mozak-vm % cat release_fi_main.sh 
RUST_LOG=debug 
RUST_BACKTRACE=1 
MOZAK_STARK_DEBUG=TRUE
./target/debug/mozak-cli -vvv run examples/target/riscv32im-mozak-zkvm-elf/release/fibonacci-input examples/fibonacci-input/iotape_private examples/fibonacci-input/iotape_public
user@fly mozak-vm % 

@vivekvpandya
Copy link
Contributor

vivekvpandya commented Dec 14, 2023

I still have an issue with fibonacci-input: ( @vivekvpandya , @bingcicle , @codeblooded1729 )

[profile.dev]
opt-level = 1
user@fly mozak-vm % ./debug_fi_main.sh 
[2023-12-07T21:01:08Z DEBUG mozak_cli] Read 8516604 of ELF data.
[2023-12-07T21:01:08Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:01:08Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:01:08Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffdf, 4, [40, 0, 0, 0] 
[2023-12-07T21:01:08Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffe3, 4, [162, 14, 197, 3] 
thread 'main' panicked at /Users/user/Source/mozak/mozak-vm/runner/src/vm.rs:375:13:
Looped for longer than MOZAK_MAX_LOOPS
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
user@fly mozak-vm % cat debug_fi_main.sh 
RUST_LOG=debug 
RUST_BACKTRACE=1 
MOZAK_STARK_DEBUG=TRUE
./target/debug/mozak-cli -vvv run examples/target/riscv32im-mozak-zkvm-elf/debug/fibonacci-input examples/fibonacci-input/iotape_private examples/fibonacci-input/iotape_public
 

vs release that it works well 👍

user@fly mozak-vm % ./release_fi_main.sh 
[2023-12-07T21:03:58Z DEBUG mozak_cli] Read 33224 of ELF data.
[2023-12-07T21:03:58Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:03:58Z DEBUG mozak_cli] Read 4 of io_tape data.
[2023-12-07T21:03:58Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffcf, 4, [40, 0, 0, 0] 
[2023-12-07T21:03:58Z DEBUG mozak_runner::vm] ecall_io_read: 0xffffffd3, 4, [162, 14, 197, 3] 
[2023-12-07T21:03:58Z DEBUG mozak_cli] [0, 83808, 4294967279, 0, 0, 0, 0, 0, 0, 0, 0, 32, 39088169, 63245986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
user@fly mozak-vm % cat release_fi_main.sh 
RUST_LOG=debug 
RUST_BACKTRACE=1 
MOZAK_STARK_DEBUG=TRUE
./target/debug/mozak-cli -vvv run examples/target/riscv32im-mozak-zkvm-elf/release/fibonacci-input examples/fibonacci-input/iotape_private examples/fibonacci-input/iotape_public
user@fly mozak-vm % 

@dimdumon could you share your script to test fibonacci-input 😅 ?
script is already shown.
I would like to try it with opt-level=z

@eightfilms
Copy link
Contributor

Can confirm also that with main merged, declaring opt-level=1 is enough for the debug build of fibonacci-input to pass.

@vivekvpandya
Copy link
Contributor

@dimdumon since #977 is already fix, should we close this now?

@matthiasgoergens
Copy link
Collaborator

What's the status of this PR? @dimdumon

@matthiasgoergens
Copy link
Collaborator

@dimdumon I'm closing this one for house-keeping. Please re-open, if you are still working on this. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants