Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug in cairo tests (test_u256_div_mod_n) #498

Closed

Conversation

wugalde19
Copy link
Contributor

@wugalde19 wugalde19 commented Mar 25, 2024

This PR fixes the bug reported in #472

The issue was a problem with the nz function.
It was an issue with the usage of .unwrap() in cases where it failed and with the Display and Debug traits on that function.

I did modify the implementation of that function and seems to be working now.

Checklist

  • Linked to Github Issue
  • Unit tests added
  • Integration tests added.
  • This change requires new documentation.
    • Documentation has been added/updated.

Screenshots
image
image

@wugalde19
Copy link
Contributor Author

@edg-l feel free to review this PR now

@igaray igaray added the odhack label Mar 25, 2024
@Oppen
Copy link
Contributor

Oppen commented Mar 25, 2024

A few comments:

  1. The logic doesn't look essentially different to me. unwrap panics on None, and the match panics on None.
  2. I'm not sure the original bug was in the nz function. The comment seems to point to the assert failing (note the message it reports) rather than the nz call, which happens before that. How did you reach the conclusion the issue was with nz?
  3. Did you try to confirm the issue before changing nz? Maybe the comment and the #[ignore] were obsolete and the current logic worked correctly, and that's why the newer version doing essentially the same thing also works.

@Oppen
Copy link
Contributor

Oppen commented Mar 25, 2024

@wugalde19
Copy link
Contributor Author

A few comments:

  1. The logic doesn't look essentially different to me. unwrap panics on None, and the match panics on None.
  2. I'm not sure the original bug was in the nz function. The comment seems to point to the assert failing (note the message it reports) rather than the nz call, which happens before that. How did you reach the conclusion the issue was with nz?
  3. Did you try to confirm the issue before changing nz? Maybe the comment and the #[ignore] were obsolete and the current logic worked correctly, and that's why the newer version doing essentially the same thing also works.

A few comments:

  1. The logic doesn't look essentially different to me. unwrap panics on None, and the match panics on None.
  2. I'm not sure the original bug was in the nz function. The comment seems to point to the assert failing (note the message it reports) rather than the nz call, which happens before that. How did you reach the conclusion the issue was with nz?
  3. Did you try to confirm the issue before changing nz? Maybe the comment and the #[ignore] were obsolete and the current logic worked correctly, and that's why the newer version doing essentially the same thing also works.

Hi @Oppen
So, let me walk you through my journey

The first thing I noticed was that assert panics if the == condition is not met, so if the result of u256_div_mod_n is not equal to the Some::Option value, then it will panic. For that reason, and for debugging purposes I decided to replace assert with assert!

Then using assert! on that specific test it panicked again but at least this time wasn't because of the logic that we have in corelib/src/lib.cairo -> assert (when the condition is not met)
Here's the panic error of assert!

failures:
   math_test::math_test::test_u256_div_mod_n - Panicked with "assertion failed: `math::u256_div_mod_n(6, 2, nz(7)) == Option::Some(3)`."

It wasn't still clear to me what the issue was so decided to look into the test_u256_div_mod_n function implementation but could find anything there.

I still needed to debug so I decided to take a look at the nz function
I need to print values there so I modified the function to look like this

// Helper for making a non-zero value.
fn nz<N, +TryInto<N, NonZero<N>>, +Copy<N>, +Display<N>, +Drop<N>, +Debug<N>>(n: N) -> NonZero<N>{
    let value = n.try_into();
    println!("value after try_into: {:?}", value);
    let value2 = value.unwrap();
    value2
}

and that's where I saw this error message

math_test::math_test::test_u256_div_mod_n - Panicked with 0x526573756c743a3a756e77726170206661696c65642e ('Result::unwrap failed.')

I thought I was making progress and the issue was with the unwrap method.
That's why I decided to come up with a new implementation of nz

For some reason after that the tests passed

Here's a recent screenshot of all the math_test.cairo tests passing so that could mean that the error happens when we run all the tests only OR that it randomly happens
image

So that was basically my process.
But since the test still failing and revisiting this again, the Result::unwrap failed. could have been an error in the println! that I added to nz which made me think the problem was that one.

At this point, I'm realizing it does not make sense and the issue has to be somewhere else.

I'll continue researching and trying it and see if I can fix it

What still not clear to me is why is it that the tests pass sometimes

Please let me know if you have any suggestion or comment about this

@wugalde19
Copy link
Contributor Author

I even changed the math function signature to accept u256 and not a NonZero<u256> and after debugging it tells me that the result should be Option::Some(1)

image

But when I run the test with the following:

assert!(math::u256_div_mod_n(6, 2, 7) == Option::Some(1));

it panics again

failures:
   math_test::math_test::test_u256_div_mod_n - Panicked with 0x526573756c743a3a756e77726170206661696c65642e ('Result::unwrap failed.').

@edg-l
Copy link
Collaborator

edg-l commented Apr 2, 2024

The test must not be changed, it's taken from the corelib tests, so if it doesn't pass it's an error we must fix on native, not adapt the test for it to pass.

@wugalde19
Copy link
Contributor Author

The test must not be changed, it's taken from the corelib tests, so if it doesn't pass it's an error we must fix on native, not adapt the test for it to pass.

@edg-l That's something that I noticed this past weekend. The issue seems to be on the math.cairo file
I've been studying more about Cairo and testing different things to see if I can get this done this week at most.
If some of you have some time to answer a few questions please let me know.

@wugalde19 wugalde19 force-pushed the fix-test_u256_div_mod_n-test branch from d6db794 to 716490c Compare April 10, 2024 04:12
@wugalde19
Copy link
Contributor Author

wugalde19 commented Apr 13, 2024

HI guys @edg-l @Oppen @azteca1998
I wanted to share that after many hours of study, reading the code multiple times, and debugging many functions I found out that the problem seems to be on the u256_inv_mod function in the math.cairo file.

I did change the logic of that function locally and seems like I made it work.

I made use of another function that we have on the same file to find the inverse of a modulo n named inv_mod.

Now, since this file (math.cairo) is part of the core Cairo code I don't know how to proceed 😅 .

Now, I also noticed that the problem in this project (cairo_native) could be the way build_u256_guarantee_inv_mod_n is being implemented in src/libfuncs/uint256.rs, which causes u256_guarantee_inv_mod_n used into u256_inv_mod (math.cairo) does not return the expected result.
@igaray should know more about this since we added that logic on this other PR: #412

I'm still looking into this last one but it's taking me longer to understand that logic

I would appreciate some guidance, comments (because that might not be the best approach), and help.

Please let me know if the change makes sense and If I have to proceed and apply the change to the core Cairo lib.

Thanks

Screenshot 2024-04-13 at 12 55 44 PM

@azteca1998
Copy link
Collaborator

Hello, and thanks for your time.

Those tests are passing when ran through the VM, which is our reference implementation and therefore considered the correct behaviour. Cairo native must match that behaviour. The fact that the same code executed with Cairo Native makes the test fail means that we have a bug, which is what this issue is for. Ideally, everything that works in the VM should work here too.

Regardless of whether the corelib is correct or not we're not the ones in charge of that. We cannot accept changes made there because those files aren't ours, they're from the Cairo language's standard library (corelib).

@wugalde19 wugalde19 force-pushed the fix-test_u256_div_mod_n-test branch from 716490c to ee75915 Compare April 25, 2024 05:53
@wugalde19
Copy link
Contributor Author

Hello, and thanks for your time.

Those tests are passing when ran through the VM, which is our reference implementation and therefore considered the correct behaviour. Cairo native must match that behaviour. The fact that the same code executed with Cairo Native makes the test fail means that we have a bug, which is what this issue is for. Ideally, everything that works in the VM should work here too.

Regardless of whether the corelib is correct or not we're not the ones in charge of that. We cannot accept changes made there because those files aren't ours, they're from the Cairo language's standard library (corelib).

Hi @azteca1998
Ok knowing that makes sense what I mentioned before and seems like the issue is here
https://github.com/lambdaclass/cairo_native/blob/main/src/libfuncs/uint256.rs#L656

And you are right, the corelib is something different and I shouldn't focus on that
I just wanted to share that a change there made the tests pass but was more a comment to point out that the real problem was u256_guarantee_inv_mod_n

I'm trying to add tests to refactor the build_u256_guarantee_inv_mod_n function in the libfuncs and fix the issue

@tcoratger
Copy link
Collaborator

I think #569 solves the u256_guarantee_inv_mod_n issue.

@wugalde19
Copy link
Contributor Author

I think #569 solves the u256_guarantee_inv_mod_n issue.

Hi @tcoratger
Let me check a few other scenarios but you are right, seems like it fixes the issue 🎉

@edg-l
Copy link
Collaborator

edg-l commented May 9, 2024

This is already fixed in main by another pr, closing this, but thanks for your time!

@edg-l edg-l closed this May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants