Unroll biginteger loops and reduce copies #205
Conversation
I skimmed the changes. Most of them look reasonable. I will wait for CI to do the careful check.
I still don't understand the purpose of this PR. Why were the changes to the non-assigning ops made? I have been arguing that these changes add no value and are in fact detrimental, but those complaints have not been addressed. One can already achieve everything the new versions of the non-assigning ops offer by using the assigning versions. Previously, the non-assigning ops acted as syntactic sugar; now I am unconvinced one should ever use them.
Why would you need to assign the result of sub to a new variable when the old variable, denoted by self, already holds that very value? That makes absolutely no sense. And yet this got an approving review, whereas a PR of mine that did not have such absurd changes and had been thoroughly benchmarked received no review at all.
I apologize; I was focusing more on the changes that clean up nonnative, poly-commit, etc. I will take a look at the min-const-generics PR soon. It is also a matter of the number of changes: small PRs are easier to reconcile. Our PRs in poly-commit and marlin, which add the constraints for the Marlin verifier, are still open because we want to avoid conflicts with other PRs and projects.
Hi, ok, sorry. Anyway, I wasn't referring to min-const-generics; I think that one will be binned, and it's low priority. I'm referring to #204, which is a very small PR. It's just that there seems to be no reason why some people have more authority than others to push to branches, approve reviews, assign reviewers/request reviews, or add labels. It makes little sense. For me, it adds salt to the wound that the entire arkworks migration was carried out without asking for much input from the community, and that numerous odd changes were made without any input. I'm just wondering how much you'd like to involve others, or whether you just want to do everything within your own circle.
Part of this is the challenging balance between an open-source platform and an academic prototype: sometimes we need to wrap something up within a time limit, and that is probably why breaking changes happen more often than they should for a stable library. I want to assure you that when I approved this one, I had simply been requested for a review, and I just did a sanity check. You may also request a review or @ us. We are somewhat overwhelmed with many projects, and for me, beyond arkworks, I am also working on other non-zero-knowledge things, so pardon me for only being able to maintain a slight balance.
I think there are things we can improve; for example, as you point out, the ability to request a review should be open. Again, please pardon that this is still a preliminary setup, and many things can be changed to improve it!
Hi, yes, I definitely understand that. I think I haven't been very communicative, and I will try to communicate more; I do appreciate the work from the arkworks team, of course. I should chat more and figure out some solutions. I think this is a case of Pratyush and me developing parallel work, which is confusing, so I should chat with Pratyush more in real time, say on Telegram, since exchanging comments/PRs on issues can be a bit stilted, especially given that there are some fundamental differences in the scopes of the successive PRs.
Yes. I feel it is related to Pratyush being unaware that you would be working on it. We are trying to get more people involved, and it is fortunate that we are working on zero-knowledge proofs, which have a larger community. I do receive emails for all the PRs, but I usually focus on those I am more familiar with (or bug reports). So, I am not perfect.
Hmm, ok, but if it hasn't been reviewed carefully, I don't think it should be approved, no?
I was just doing a sanity check; I am not an expert in asm, and the review process is mainly to prevent errors and comment on things that are unclear. I agree that we should have let you review this and reconcile it with #204.
TODO: Reconcile with #204 as there seem to be some subtle improvements.
See the ongoing discussion for more tuning.
FWIW, both this PR and #204 were made while I was asleep, lol. I don't think sub-24-hour review turnarounds are a reasonable expectation, as people are spread across many timezones and have multiple projects in parallel... (Nor do other open-source projects give any such guarantee. Projects I've worked on in the past, such as https://github.com/tendermint/tendermint/, definitely don't.)
Are you talking about
This seems like a great optimization, given that the base is already copied when this function is called?
Well, that's definitely not what I expected. What really irked me is that this PR got an immediate approving review even though my overlapping PR was posted earlier; furthermore, there were many unexplained choices, like the ones I am disputing. So it seemed to me like a bit of group-think and a disregard for evaluating PRs on their merits, as well as a neglect of others' work in favour of the maintainers' own.
Well, in that case, one would simply do:
Alternatively, if the intent is to copy to a new variable as you said, one does:

The current way of writing, which has been fine until now and which no one has brought up any reason to change, is:

Here, y is a new variable containing that result, and p is a variable we have yet to modify. So there is a clear and meaningful separation of duties between the assigning and non-assigning ops. None of the above affects performance, since they are just syntactic sugar: the copies must happen anyway, so they are not by any means "optimisations" as you call them. Further, Pratyush already confirmed there is no performance reason to change the way the group addition formulas are written, and in fact I show, with the changes to the biginteger ops, that for groups over Fp they are near the performance of gurvy. The remaining discrepancies, I believe, lie in things like unnecessary data movement at function/block boundaries (whereas gurvy generates much more optimal assembly for these via scripts), which suggests that little apart from writing assembly would improve things further. Hence, my conclusion is that there has been no argument for why these changes make any sense at all. Rather, the motivation for them seems like an artifact of Pratyush's earlier attempts to optimize the way the formulas were written.
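For illustration, a minimal, self-contained sketch of the separation of duties being described here, using a hypothetical toy type rather than the real BigInteger/field types:

use core::ops::{Sub, SubAssign};

#[derive(Clone, Copy, Debug, PartialEq)]
struct Toy(u64);

impl<'a> Sub<&'a Toy> for Toy {
    type Output = Toy;
    // non-assigning op: takes `self` by value and returns a new value
    fn sub(self, rhs: &'a Toy) -> Toy {
        Toy(self.0.wrapping_sub(rhs.0))
    }
}

impl<'a> SubAssign<&'a Toy> for Toy {
    // assigning op: mutates `self` in place
    fn sub_assign(&mut self, rhs: &'a Toy) {
        self.0 = self.0.wrapping_sub(rhs.0);
    }
}

fn main() {
    let p = Toy(10);
    let q = Toy(3);

    // non-assigning form: `y` is a new variable, `p` is left untouched
    let y = p - &q;
    assert_eq!(p, Toy(10));

    // assigning form: the variable itself is modified
    let mut p2 = p;
    p2 -= &q;
    assert_eq!(y, p2);
}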
If one truly wanted to extend the current syntactic sugar, the appropriate function to modify is to append
Can you assume some more positive intent on our part? We're talking about a 5-hour difference in review time between two PRs, and for one of them the reviewer was explicitly requested. Otherwise, the flow is that for a PR/issue not made by a maintainer, a maintainer has to add tags and request relevant people for review. Thank you for disputing the unexplained choice! Calling it unexplained is a bit untrue though, as there were two explanations in the prior PR: #199 (comment), #199 (comment). My understanding of your last comment is that you think there is an error in those explanations; I have not yet looked into the docs or read the latest edits of the last two comments to understand this.
I see; I guess I was wrong about the semantics. I missed Pratyush's point because there was no explanatory comment, so much so that you had to ask for further clarification, and at the time there was a bigger overarching issue under discussion. The explanation in your review comment is sound but easy to miss. So I would say the exposition, both in this PR and in #204, should be improved by summarising the contentious points of the PR. I think it is bad to rely on voluminous earlier discussion, and courteous to instead provide a succinct description. I also stand behind my complaint that the approving reviewer did not do due diligence for the PR by asking the same clarifying questions.
@Pratyush, while these changes only shorten a simple impl by one line, firstly, they're potentially confusing (requiring both myself and Dev to ask clarifying questions), and secondly, they don't appear to actually make any performance difference, hence I am not in favour of them. If the self term is not used further, I believe either rustc or LLVM will optimize the copy away in the current impl anyway, so there's no improvement from move vs. copy semantics.
For the authority to push to branches etc., I think that's standard practice, where maintainers have more permissions, no? I can look into the issue about reviews and labels though; I didn't know only certain users can add labels/request reviews.
IMO the code is clearer and shorter, and the compiler also has less work to do when optimizing it; move semantics are a core and inherent part of Rust, so I don't think it's unreasonable to expect future readers to know them. I can add some inline comments to clarify. As for this PR vs #204, I wanted to separate the unrolling changes from the intrinsics changes, just to keep the PRs smaller and the performance effects easier to attribute. Maybe we can repurpose #204 to contain only the intrinsics changes? That would involve rebasing #204 on this. Does that seem reasonable, @jon-chuang?
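A minimal sketch of the two signatures under discussion, on a hypothetical toy type rather than the actual field/group impls:

#[derive(Clone, Copy)]
struct Toy(u64);

// Reference-taking form: the body must first copy `self` into a named
// temporary before mutating it.
fn sub_by_ref(a: &Toy, b: &Toy) -> Toy {
    let mut result = *a; // explicit copy
    result.0 = result.0.wrapping_sub(b.0);
    result
}

// Move-taking form (the style this PR favours): the caller's value is moved
// (or copied, for a Copy type) into the parameter, so the body can mutate it
// directly and return it, saving the named temporary.
fn sub_by_move(mut a: Toy, b: &Toy) -> Toy {
    a.0 = a.0.wrapping_sub(b.0);
    a
}

fn main() {
    let p = Toy(10);
    let q = Toy(3);
    assert_eq!(sub_by_ref(&p, &q).0, sub_by_move(p, &q).0);
}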
Regarding the |
@@ -12,40 +12,71 @@ macro_rules! bigint_impl {
    impl BigInteger for $name {
        const NUM_LIMBS: usize = $num_limbs;

        #[inline]
        #[ark_ff_asm::unroll_for_loops]
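For illustration, a rough standalone sketch of the pattern (the real code lives inside the bigint_impl! macro; this assumes the ark-ff-asm crate that provides the attribute shown above, uses a portable carry chain rather than the asm path, and the function name and limb count are hypothetical):

// The attribute expands the fixed-bound `for` loop at compile time, so the
// optimizer sees one straight-line carry chain instead of a loop counter.
#[ark_ff_asm::unroll_for_loops]
fn add_nocarry_6(a: &mut [u64; 6], b: &[u64; 6]) -> bool {
    let mut carry = 0u64;
    for i in 0..6 {
        // portable full-width add with carry propagation
        let wide = (a[i] as u128) + (b[i] as u128) + (carry as u128);
        a[i] = wide as u64;
        carry = (wide >> 64) as u64;
    }
    carry != 0
}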
Can there be a before/after benchmark for field arithmetic on 256-, 384-, and 768-bit fields?
Adding one now.
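For reference, a minimal micro-benchmark of the kind requested might look like the sketch below (a hedged example assuming criterion and the ark-bls12-381 crate as dev-dependencies; the repo's own bench harness and the exact curve set may differ). Run it once on the base commit and once on this branch to compare:

use ark_bls12_381::Fq;
use ark_std::UniformRand;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Benchmarks one 381-bit base-field multiplication.
fn bench_fq_mul(c: &mut Criterion) {
    let mut rng = ark_std::test_rng();
    let a = Fq::rand(&mut rng);
    let b = Fq::rand(&mut rng);
    c.bench_function("bls12_381 Fq mul", |bench| {
        bench.iter(|| black_box(a) * black_box(b))
    });
}

criterion_group!(benches, bench_fq_mul);
criterion_main!(benches);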
This LGTM, holding off on an approval until all changes are benchmarked on the varying field sizes.
I don't share the reservations around sub(mut self, ...) and the like. Per godbolt, it seems it's just a question of what looks more like idiomatic Rust, not a performance question. I don't see how the one-line-shorter variant isn't idiomatic, but I'm not particularly familiar with this.
Benchmarks: BN254
BLS12-381
BW6-761
The only slowdown is in
This LGTM. The base-field inverse speeding up seems like a solid reason to assume this is not just noise.
Ok @jon-chuang, it seems the last thing we need to account for is the speed of doubling; can we just implement it as

fn double_in_place(&mut self) {
    #[cfg(all(target_arch = "x86_64", feature = "asm"))]
    {
        *self += &*self;
    }
    #[cfg(not(all(target_arch = "x86_64", feature = "asm")))]
    {
        // existing code
    }
}
This line wasn't converted from #205; potentially more weren't as well: https://github.com/arkworks-rs/algebra/pull/204/files#diff-26dabf22b47ebe1097e18d25b10c8252331b1255b142bf0f29cdc9f2ae28a27fL29
I added the assert; the primary thing left is to decide whether to switch doubling to use addition as well.
Well, I know it's faster when using the intrinsics; potentially it's slower when not. Btw, does that code compile? I had to do some unsafe magic to get the mut and non-mut references to work together. Also, I decided not to trust the Fq benchmarks on this; when I looked at the G1 benchmarks, there was about a 3-4% improvement overall.
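For context, the compile issue being alluded to, and a safe workaround, sketched on a hypothetical toy Copy type rather than the actual field types:

use core::ops::AddAssign;

#[derive(Clone, Copy)]
struct Toy(u64);

impl<'a> AddAssign<&'a Toy> for Toy {
    fn add_assign(&mut self, rhs: &'a Toy) {
        self.0 = self.0.wrapping_add(rhs.0);
    }
}

fn double_in_place(x: &mut Toy) {
    // `*x += &*x;` does not borrow-check: it needs a mutable and a shared
    // borrow of `*x` at the same time.
    let tmp = *x; // safe workaround: copy first (the type is Copy)
    *x += &tmp;
}

fn main() {
    let mut v = Toy(21);
    double_in_place(&mut v);
    assert_eq!(v.0, 42);
}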
@@ -12,40 +12,71 @@ macro_rules! bigint_impl {
    impl BigInteger for $name {
        const NUM_LIMBS: usize = $num_limbs;

        #[inline]
Why are we removing #[inline]? I think this should be given an inline hint, no?
ff/src/biginteger/macros.rs
*a = sbb!(*a, *b, &mut borrow);
for i in 0..$num_limbs {
    #[cfg(all(target_arch = "x86_64", feature = "asm"))]
    #[cfg_attr(all(target_arch = "x86_64", feature = "asm"), allow(unsafe_code))]
I think this is unnecessarily verbose, since the line only compiles when the cfg is true. So the previous version is fine.
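Concretely, the two spellings being compared look roughly like this (a sketch on hypothetical item-level functions for brevity; in the PR the attributes sit on a statement inside the limb loop):

// Verbose form from the diff: the item is already gated on the cfg, and the
// same cfg is repeated inside a cfg_attr just to attach allow(unsafe_code).
#[cfg(all(target_arch = "x86_64", feature = "asm"))]
#[cfg_attr(all(target_arch = "x86_64", feature = "asm"), allow(unsafe_code))]
fn limb_op_verbose() { /* asm path elided */ }

// Simpler equivalent: since the item only exists when the cfg holds,
// a plain allow(unsafe_code) is enough.
#[cfg(all(target_arch = "x86_64", feature = "asm"))]
#[allow(unsafe_code)]
fn limb_op_simple() { /* asm path elided */ }

fn main() {}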
I've pushed some changes, which help the PR reach the previous baseline performance and address some of the review comments I made.
Ok, I think this is ready to merge!
Description
Replaces #199, motivation from #198
Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why.
- Added a relevant entry to the `Pending` section in `CHANGELOG.md`
- Re-reviewed `Files changed` in the GitHub PR explorer