Unroll loops and use in-place ops for faster ff and ec arithmetic #199

Closed
wants to merge 8 commits into from

Pratyush (Member) commented Feb 4, 2021

Description

  • BigInteger: unroll loops in add_nocarry, sub_noborrow, mul2, div2, and cmp
  • SW & TE: use in-place ops in mixed addition and doubling
  • Extension fields: use in-place ops in mul and square

Overall, these changes give a ~10% speedup on the affected operations.
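As a rough illustration of the BigInteger bullet, the sketch below contrasts a loop-based add_nocarry with a manually unrolled one, plus an unrolled cmp. It assumes a hypothetical 4-limb BigInt4 type and an adc helper written only for this example; it is not the ark-ff code from this PR, and whether the loop form actually survives as a loop depends on the compiler.

```rust
use std::cmp::Ordering;

/// Add with carry: returns the low 64 bits of a + b + carry and stores the
/// high bit back into `carry`. (Helper written for this sketch.)
#[inline(always)]
fn adc(a: u64, b: u64, carry: &mut u64) -> u64 {
    let tmp = a as u128 + b as u128 + *carry as u128;
    *carry = (tmp >> 64) as u64;
    tmp as u64
}

/// Hypothetical 256-bit integer: four 64-bit limbs, least significant first.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BigInt4([u64; 4]);

impl BigInt4 {
    /// Loop form: the trip count is a constant, but the loop may remain.
    fn add_nocarry_loop(&mut self, other: &Self) -> bool {
        let mut carry = 0;
        for i in 0..4 {
            self.0[i] = adc(self.0[i], other.0[i], &mut carry);
        }
        carry != 0 // true if the sum overflowed 256 bits
    }

    /// Manually unrolled form: straight-line code, one `adc` per limb.
    fn add_nocarry_unrolled(&mut self, other: &Self) -> bool {
        let mut carry = 0;
        self.0[0] = adc(self.0[0], other.0[0], &mut carry);
        self.0[1] = adc(self.0[1], other.0[1], &mut carry);
        self.0[2] = adc(self.0[2], other.0[2], &mut carry);
        self.0[3] = adc(self.0[3], other.0[3], &mut carry);
        carry != 0
    }

    /// Unrolled comparison: the most significant differing limb decides.
    fn cmp_unrolled(&self, other: &Self) -> Ordering {
        self.0[3]
            .cmp(&other.0[3])
            .then(self.0[2].cmp(&other.0[2]))
            .then(self.0[1].cmp(&other.0[1]))
            .then(self.0[0].cmp(&other.0[0]))
    }
}

fn main() {
    let one = BigInt4([1, 0, 0, 0]);
    let mut a = BigInt4([u64::MAX, 0, 0, 0]);
    let mut b = a;
    assert!(!a.add_nocarry_unrolled(&one));
    assert!(!b.add_nocarry_loop(&one));
    assert_eq!(a, BigInt4([0, 1, 0, 0])); // carry propagated into limb 1
    assert_eq!(a, b);

    let mut max = BigInt4([u64::MAX; 4]);
    assert!(max.add_nocarry_unrolled(&one)); // wraps to zero with a carry out
    assert_eq!(max, BigInt4([0; 4]));

    let big = BigInt4([0, 0, 0, 1]);
    assert_eq!(big.cmp_unrolled(&BigInt4([u64::MAX, 0, 0, 0])), Ordering::Greater);
}
```

The unrolled versions compute exactly the same results as the loop; only the shape of the generated code differs.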


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (master)
  • Linked to GitHub issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed "Files changed" in the GitHub PR explorer

ValarDragon (Member) commented Feb 4, 2021

LGTM. For the extension fields, inverse_in_place could have this optimization applied to it, if the inverse logic is moved to inverse_in_place.
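ValarDragon's suggestion can be sketched like this: put all of the inversion logic in inverse_in_place and make inverse a thin copy-then-invert wrapper, so that any in-place optimization benefits both. The sketch uses a hypothetical prime-field type Fp modulo 2^61 - 1 with Fermat inversion purely to keep it short; an extension field would follow the same shape with its own inversion formula.

```rust
/// Hypothetical toy modulus: 2^61 - 1 (a Mersenne prime).
const P: u64 = (1u64 << 61) - 1;

/// Hypothetical prime-field element; invariant: value < P.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fp(u64);

impl Fp {
    fn mul(self, other: Self) -> Self {
        Fp(((self.0 as u128 * other.0 as u128) % (P as u128)) as u64)
    }

    /// Square-and-multiply exponentiation.
    fn pow(self, mut exp: u64) -> Self {
        let mut base = self;
        let mut acc = Fp(1);
        while exp > 0 {
            if (exp & 1) == 1 {
                acc = acc.mul(base);
            }
            base = base.mul(base);
            exp >>= 1;
        }
        acc
    }

    /// All of the inversion logic lives here and writes into `self`.
    fn inverse_in_place(&mut self) -> Option<&mut Self> {
        if self.0 == 0 {
            None
        } else {
            // Fermat's little theorem: a^(p - 2) = a^(-1) mod p for a != 0.
            *self = self.pow(P - 2);
            Some(self)
        }
    }

    /// The non-assigning version just copies and delegates.
    fn inverse(&self) -> Option<Self> {
        let mut result = *self;
        result.inverse_in_place()?;
        Some(result)
    }
}

fn main() {
    let a = Fp(3);
    let a_inv = a.inverse().expect("3 is nonzero");
    assert_eq!(a.mul(a_inv), Fp(1));
    assert_eq!(a, Fp(3)); // `inverse` left `a` untouched
    assert!(Fp(0).inverse().is_none());
}
```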

Pratyush (Member, author) commented Feb 4, 2021

This is ready for review.

ec/src/models/short_weierstrass_jacobian.rs (2 resolved review threads)
@@ -260,7 +265,6 @@ impl<P: Parameters> Default for GroupAffine<P> {
#[derivative(
Copy(bound = "P: Parameters"),
Clone(bound = "P: Parameters"),
Eq(bound = "P: Parameters"),
Member:

Why did this get removed?

Member (author):

This got moved to a proper impl, because you shouldn't have a manual impl of PartialEq and a derived impl of Eq.
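A minimal sketch of the pattern described in this reply, using a hypothetical Affine<F> point type rather than the actual GroupAffine<P>: PartialEq is written by hand so that all points at infinity compare equal regardless of their stored coordinates, and Eq is then implemented by hand right next to it. Eq has no methods; it only promises that eq is a full equivalence relation, so keeping that promise beside the manual impl it vouches for is arguably clearer than deriving it separately.

```rust
/// Hypothetical affine point; when `infinity` is set, x and y are ignored.
#[derive(Clone, Copy, Debug)]
struct Affine<F> {
    x: F,
    y: F,
    infinity: bool,
}

// Manual PartialEq: every point at infinity equals every other one.
impl<F: PartialEq> PartialEq for Affine<F> {
    fn eq(&self, other: &Self) -> bool {
        match (self.infinity, other.infinity) {
            (true, true) => true,
            (false, false) => self.x == other.x && self.y == other.y,
            _ => false,
        }
    }
}

// Marker impl: asserts that the manual `eq` above is reflexive, symmetric
// and transitive (it is, provided F's own equality is).
impl<F: Eq> Eq for Affine<F> {}

fn main() {
    let inf_a = Affine { x: 1, y: 2, infinity: true };
    let inf_b = Affine { x: 5, y: 7, infinity: true };
    let p = Affine { x: 1, y: 2, infinity: false };
    assert_eq!(inf_a, inf_b); // coordinates ignored at infinity
    assert_ne!(inf_a, p);
    assert_eq!(p, p);
}
```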

ff/src/fields/models/cubic_extension.rs (resolved review thread)
jon-chuang (Contributor) commented Feb 5, 2021

Hi @Pratyush, thanks for this quick follow-up.

I benchmarked this PR (named assign) against a version that only applies the unrolling to BigInteger (named just_unroll); here are the results:
[Screenshot from 2021-02-05 08-55-09: benchmark results for assign vs. just_unroll]

Thanks for catching that unrolling cmp helps. It improved things further over my original changes.

However, I question the changes to the semantics of the non-assigning versions of the ops, which have been changed to mutate the underlying variable. From an API standpoint I think this is confusing, and it achieves nothing over simply writing "x.op_assign(); let y = x;". At most, I would remove the .clone(), as was done for modulus. Further, according to the benchmark, changing the formulas to use only the assigning versions of the ops appears to have little or possibly even a negative effect.

@Pratyush Would you mind giving me partial access to this repo so I can push commits to a branch on it? It would simplify things, since I wouldn't have to keep switching my git remote between different URLs.

Pratyush (Member, author) commented Feb 5, 2021

> However, I question the changes to the semantics of the non-assigning versions of the ops. At the most, I would remove .clone(), as was done for modulus.

The semantics didn't change; I only removed unnecessary copies. In particular, the non-assigning versions take self by value, not by reference.

> Further, it appears, according to the benchmark, that changing the formulas to use only assigning versions of the ops has little or possibly even negative effect.

That's surprising; I definitely benchmarked the assign versions against the old versions and found a non-negligible difference.

> @Pratyush Would you mind giving me partial access to this repo so I could push commits to a branch on the repo? It would simplify things for me so I wouldn't have to keep switching my git remote to target different urls.

Let me figure that out. In the meantime, a simpler way to handle it would be to just change the patch location.

jon-chuang (Contributor) commented Feb 5, 2021

> Let me figure that out. In the mean time a simpler way to handle it would be to just change the patch location

Hmm, actually this is orthogonal to that issue (since deps target master, one still has to make changes to target branches). I'm just thinking about friction in the PR workflow. It's a small thing.

Regarding the cross-dependency issue, I don't think patching fixes the problem; I'll investigate whether there is a good way to fix it.

> That's surprising, I definitely benchmarked the assign versions against the old versions, and found a non-negligible difference.

Are you saying that the assign changes, without the BigInteger unroll changes, produced an improvement?

> The semantics didn't change; I only removed unnecessary copies. In particular, the non-assigning versions take self by value, not by reference.

Could you clarify this? I meant that the semantics have changed because the underlying variable is now mutated.

To clarify:

> I question the changes to the semantics of the non-assigning versions of the ops, which are changed to mutate the underlying variable. From an API standpoint, I think this is confusing, further, it achieves nothing over just x.op_assign(); let y = x;

Essentially, my point is that the old non-assigning APIs don't have to change, even if one were to convert all the group formulas to use the assigning versions.

Orthogonally, those conversions don't appear to help, at least according to the benchmarks I performed.

Pratyush (Member, author) commented Feb 5, 2021

> Hmm actually this is orthogonal to that issue (since deps target master, and one still has to make changes to target branches). I'm just thinking in terms of friction in PR workflow. It's a small thing.

Ah, I just point the patch at local clones of the relevant repos, so that I don't have to change branches and such. That way I can swap the paths easily.

> Are you saying that the assign changes without the biginteger unroll changes produced an improvement?

Yes

> Could you clarify this? I meant that the semantics have changed due to mutating the underlying variable now.

The semantics are the same from the user's point of view:

use std::ops::Add;

let a: u64 = 2;
let b: u64 = 3;
let c = a.add(&b); // c = 5; add receives its own copy of a
println!("{}", a); // still prints 2
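The same point can be sketched for a field-like type. In this sketch Fe is a hypothetical toy element (integers mod 97), not an ark-ff type: Add<&Fe> takes self by value and reuses AddAssign on that owned value, so no extra clone is needed, yet the caller's variable is untouched because add only ever mutates its own copy.

```rust
use std::ops::{Add, AddAssign};

/// Hypothetical toy modulus for this sketch.
const MODULUS: u64 = 97;

/// Hypothetical field element; invariant: value < MODULUS.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fe(u64);

impl<'a> AddAssign<&'a Fe> for Fe {
    fn add_assign(&mut self, other: &'a Fe) {
        self.0 = (self.0 + other.0) % MODULUS;
    }
}

// `self` arrives by value, so mutating it only touches this function's own
// copy (or, for a non-Copy type, a value the caller has moved in).
impl<'a> Add<&'a Fe> for Fe {
    type Output = Fe;
    fn add(mut self, other: &'a Fe) -> Fe {
        self += other; // reuse the in-place op; no clone needed
        self
    }
}

fn main() {
    let a = Fe(90);
    let b = Fe(10);
    let c = a + &b;
    assert_eq!(c, Fe(3)); // 100 mod 97
    assert_eq!(a, Fe(90)); // the caller's `a` is unchanged
}
```

For a non-Copy type, a + &b would move a into add instead, so there is still no caller-visible mutation; compared with cloning self first, the by-value signature simply avoids one copy.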

ValarDragon (Member):

Here is my patch in curves, which works quite nicely for me:

[patch.'https://github.com/arkworks-rs/algebra']
ark-ff = { path = '../algebra/ff' }
ark-ec = { path = '../algebra/ec' }
ark-serialize = { path = '../algebra/serialize' }

ValarDragon (Member):

I also re-benchmarked this and confirmed that every benchmarked operation got faster (a reduction of between 2% and 10%, depending on the operation).

ValarDragon (Member) left a comment:

LGTM sans comment requests. Thanks for updating this!

Pratyush mentioned this pull request Feb 5, 2021
ValarDragon (Member):

Wait, what happened in this force push?

Pratyush (Member, author) commented Feb 5, 2021

Sorry, I just rebased on master. I'll also clean up the history soon, to separate commits that only change the unrolling from commits that additionally use in-place ops.

(Don't worry, I added your comment requests =P)

Pratyush (Member, author) commented Feb 5, 2021

Ok @jon-chuang @ValarDragon, it seems the in-place ops don't actually give a big benefit; I can't reproduce whatever I had earlier (which might well have been wrong). In light of that, I think it makes sense to keep only the first three commits (loop unrolling and reducing copies), as the in-place ops make the code more difficult to read.

Field ops
 name                                    only_unroll ns/iter  in_place_ops ns/iter  diff ns/iter   diff %  speedup 
 bls12_381::fq12::add_assign             144                  142                             -2   -1.39%   x 1.01 
 bls12_381::fq12::deser                  835                  859                             24    2.87%   x 0.97 
 bls12_381::fq12::deser_unchecked        835                  859                             24    2.87%   x 0.97 
 bls12_381::fq12::double                 123                  128                              5    4.07%   x 0.96 
 bls12_381::fq12::inverse                17,497               18,103                         606    3.46%   x 0.97 
 bls12_381::fq12::mul_assign             3,960                4,039                           79    1.99%   x 0.98 
 bls12_381::fq12::negate                 141                  137                             -4   -2.84%   x 1.03 
 bls12_381::fq12::ser                    506                  504                             -2   -0.40%   x 1.00 
 bls12_381::fq12::ser_unchecked          520                  509                            -11   -2.12%   x 1.02 
 bls12_381::fq12::square                 2,788                2,744                          -44   -1.58%   x 1.02 
 bls12_381::fq12::sub_assign             146                  140                             -6   -4.11%   x 1.04 
 bls12_381::fq2::add_assign              13                   13                               0    0.00%   x 1.00 
 bls12_381::fq2::deser                   127                  127                              0    0.00%   x 1.00 
 bls12_381::fq2::deser_unchecked         126                  124                             -2   -1.59%   x 1.02 
 bls12_381::fq2::double                  12                   13                               1    8.33%   x 0.92 
 bls12_381::fq2::inverse                 11,427               11,210                        -217   -1.90%   x 1.02 
 bls12_381::fq2::mul_assign              132                  131                             -1   -0.76%   x 1.01 
 bls12_381::fq2::negate                  15                   13                              -2  -13.33%   x 1.15 
 bls12_381::fq2::ser                     87                   84                              -3   -3.45%   x 1.04 
 bls12_381::fq2::ser_unchecked           85                   84                              -1   -1.18%   x 1.01 
 bls12_381::fq2::sqrt                    78,645               77,164                      -1,481   -1.88%   x 1.02 
 bls12_381::fq2::square                  105                  107                              2    1.90%   x 0.98 
 bls12_381::fq2::sub_assign              15                   15                               0    0.00%   x 1.00 
 bls12_381::fq::add_assign               8                    8                                0    0.00%   x 1.00 
 bls12_381::fq::deser                    61                   61                               0    0.00%   x 1.00 
 bls12_381::fq::deser_unchecked          61                   61                               0    0.00%   x 1.00 
 bls12_381::fq::double                   6                    6                                0    0.00%   x 1.00 
 bls12_381::fq::from_repr                40                   39                              -1   -2.50%   x 1.03 
 bls12_381::fq::into_repr                29                   29                               0    0.00%   x 1.00 
 bls12_381::fq::inverse                  10,880               10,880                           0    0.00%   x 1.00 
 bls12_381::fq::mul_assign               37                   37                               0    0.00%   x 1.00 
 bls12_381::fq::negate                   7                    7                                0    0.00%   x 1.00 
 bls12_381::fq::repr_add_nocarry         5                    5                                0    0.00%   x 1.00 
 bls12_381::fq::repr_div2                2                    2                                0    0.00%   x 1.00 
 bls12_381::fq::repr_mul2                2                    2                                0    0.00%   x 1.00 
 bls12_381::fq::repr_num_bits            2                    2                                0    0.00%   x 1.00 
 bls12_381::fq::repr_sub_noborrow        2                    3                                1   50.00%   x 0.67 
 bls12_381::fq::ser                      43                   43                               0    0.00%   x 1.00 
 bls12_381::fq::ser_unchecked            43                   43                               0    0.00%   x 1.00 
 bls12_381::fq::sqrt                     19,079               19,263                         184    0.96%   x 0.99 
 bls12_381::fq::square                   37                   37                               0    0.00%   x 1.00 
 bls12_381::fq::sub_assign               9                    8                               -1  -11.11%   x 1.12 
 ed_on_bls12_381::fq::add_assign         4                    4                                0    0.00%   x 1.00 
 ed_on_bls12_381::fq::deser              30                   31                               1    3.33%   x 0.97 
 ed_on_bls12_381::fq::deser_unchecked    30                   31                               1    3.33%   x 0.97 
 ed_on_bls12_381::fq::double             4                    4                                0    0.00%   x 1.00 
 ed_on_bls12_381::fq::from_repr          24                   24                               0    0.00%   x 1.00 
 ed_on_bls12_381::fq::into_repr          4                    4                                0    0.00%   x 1.00 
 ed_on_bls12_381::fq::inverse            5,405                5,098                         -307   -5.68%   x 1.06 
 ed_on_bls12_381::fq::mul_assign         21                   21                               0    0.00%   x 1.00 
 ed_on_bls12_381::fq::negate             5                    5                                0    0.00%   x 1.00 
 ed_on_bls12_381::fq::repr_add_nocarry   4                    4                                0    0.00%   x 1.00 
 ed_on_bls12_381::fq::repr_div2          2                    2                                0    0.00%   x 1.00 
 ed_on_bls12_381::fq::repr_mul2          2                    3                                1   50.00%   x 0.67 
 ed_on_bls12_381::fq::repr_num_bits      2                    3                                1   50.00%   x 0.67 
 ed_on_bls12_381::fq::repr_sub_noborrow  2                    2                                0    0.00%   x 1.00 
 ed_on_bls12_381::fq::ser                18                   18                               0    0.00%   x 1.00 
 ed_on_bls12_381::fq::ser_unchecked      19                   18                              -1   -5.26%   x 1.06 
 ed_on_bls12_381::fq::sqrt               12,672               12,640                         -32   -0.25%   x 1.00 
 ed_on_bls12_381::fq::square             21                   20                              -1   -4.76%   x 1.05 
 ed_on_bls12_381::fq::sub_assign         6                    5                               -1  -16.67%   x 1.20 
Group ops
name                             only_unroll_g1 ns/iter  in_place_ops_g1 ns/iter  diff ns/iter  diff %  speedup 
 bls12_381::g1::add_assign        655                     661                                 6   0.92%   x 0.99 
 bls12_381::g1::add_assign_mixed  486                     481                                -5  -1.03%   x 1.01 
 bls12_381::g1::deser             148,522                 149,175                           653   0.44%   x 1.00 
 bls12_381::g1::deser_unchecked   131                     132                                 1   0.76%   x 0.99 
 bls12_381::g1::double            333                     343                                10   3.00%   x 0.97 
 bls12_381::g1::msm_131072        1,342,187,320           1,334,606,002              -7,581,318  -0.56%   x 1.01 
 bls12_381::g1::mul_assign        166,342                 175,056                         8,714   5.24%   x 0.95 
 bls12_381::g1::rand              103,346                 105,790                         2,444   2.36%   x 0.98 
 bls12_381::g1::ser               105                     108                                 3   2.86%   x 0.97 
 bls12_381::g1::ser_unchecked     84                      86                                  2   2.38%   x 0.98 
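To illustrate the readability cost mentioned in this comment, here is Karatsuba multiplication in a toy quadratic extension written both ways. Everything here is hypothetical and chosen for brevity: Fp is arithmetic modulo 7, and Fp2 = Fp[u]/(u^2 + 1), which is a field because -1 is a non-residue mod 7. Both methods compute (a0 + a1 u)(b0 + b1 u) = (a0 b0 + BETA a1 b1) + ((a0 + a1)(b0 + b1) - a0 b0 - a1 b1) u with BETA = -1.

```rust
use std::ops::{Add, AddAssign, Mul, MulAssign, Sub, SubAssign};

const P: u64 = 7; // toy prime with P = 3 (mod 4), so -1 is a non-residue

/// Hypothetical base-field element; invariant: value < P.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fp(u64);

impl Add for Fp { type Output = Fp; fn add(self, o: Fp) -> Fp { Fp((self.0 + o.0) % P) } }
impl Sub for Fp { type Output = Fp; fn sub(self, o: Fp) -> Fp { Fp((self.0 + P - o.0) % P) } }
impl Mul for Fp { type Output = Fp; fn mul(self, o: Fp) -> Fp { Fp((self.0 * o.0) % P) } }
impl AddAssign for Fp { fn add_assign(&mut self, o: Fp) { *self = *self + o; } }
impl SubAssign for Fp { fn sub_assign(&mut self, o: Fp) { *self = *self - o; } }
impl MulAssign for Fp { fn mul_assign(&mut self, o: Fp) { *self = *self * o; } }

const BETA: Fp = Fp(P - 1); // u^2 = BETA = -1 mod P

/// Hypothetical quadratic extension element c0 + c1 * u.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Fp2 {
    c0: Fp,
    c1: Fp,
}

impl Fp2 {
    /// Karatsuba with by-value ops: reads like the formula.
    fn mul_by_value(self, b: Fp2) -> Fp2 {
        let v0 = self.c0 * b.c0;
        let v1 = self.c1 * b.c1;
        Fp2 {
            c0: v0 + BETA * v1,
            c1: (self.c0 + self.c1) * (b.c0 + b.c1) - v0 - v1,
        }
    }

    /// The same formula with in-place ops: fewer temporaries, harder to follow.
    fn mul_in_place(&mut self, b: &Fp2) {
        let v0 = self.c0 * b.c0;
        let v1 = self.c1 * b.c1;
        self.c1 += self.c0; // a0 + a1
        self.c1 *= b.c0 + b.c1; // (a0 + a1)(b0 + b1)
        self.c1 -= v0;
        self.c1 -= v1; // ... - a0 b0 - a1 b1
        self.c0 = v1;
        self.c0 *= BETA; // BETA * a1 b1
        self.c0 += v0; // a0 b0 + BETA * a1 b1
    }
}

fn main() {
    let a = Fp2 { c0: Fp(2), c1: Fp(3) };
    let b = Fp2 { c0: Fp(4), c1: Fp(5) };
    let by_value = a.mul_by_value(b);
    let mut in_place = a;
    in_place.mul_in_place(&b);
    // (2 + 3u)(4 + 5u) = 8 - 15 + 22u = -7 + 22u, i.e. 0 + 1u mod 7
    assert_eq!(by_value, Fp2 { c0: Fp(0), c1: Fp(1) });
    assert_eq!(in_place, by_value);
}
```

Both versions return the same result; the in-place one has fewer named temporaries but needs a comment on nearly every line to be followed, which matches the trade-off described above.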

Pratyush closed this Feb 5, 2021
Pratyush mentioned this pull request Sep 12, 2022
Pratyush deleted the faster-arithmetic branch October 26, 2022 21:00
3 participants