Improved k256 point-scalar multiplication #82

fjarri · 2020-07-18T08:24:22Z

May fix issue #24, or at least come close to fixing it. On my machine, a multiplication takes 104 µs without the endomorphism and 75 µs with it.

The current version uses a simple windowed multiplication. libsecp256k1 has the following improvements on top of that:

a use of the endomorphism

This is implemented, and enabled by the endomorphism-mul feature. I have some doubts about the validity of the approximation used in libsecp256k1; see my comment in mul.rs ("@fjarri").

a more advanced windowing algorithm, requiring precomputation of only odd multiples of the point

I checked it out, and it performs the same as the one from Ristretto, with the latter being considerably simpler. So I used the Ristretto one. (It does have a hardcoded window size, but generalizing to other window sizes is trivial, if needed).

a use of isomorphism to calculate precomputed multiples in the affine form, and use add_mixed() later to add them to the (projective) accumulator

This makes the computation non-constant-time with respect to the point, and requires checking that the point is not the point at infinity. Not for this PR.
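To make the windowing discussion above more concrete, here is a minimal sketch of a signed radix-16 windowed multiplication, which I believe is close in spirit to the Ristretto-style method mentioned. It is not the code from this PR: the "curve" is replaced by a toy additive group (integers modulo a hypothetical `M`) so the result can be checked against plain arithmetic, the scalar is only 64 bits, and the table lookup branches on the digit, so it is not constant-time.

```rust
// Toy stand-in for the curve group: integers modulo M under addition,
// so that "k * P" can be checked against plain modular arithmetic.
// M and all function names here are illustrative assumptions.
const M: i128 = 1_000_003;

fn add(a: i128, b: i128) -> i128 {
    (a + b).rem_euclid(M)
}

fn neg(a: i128) -> i128 {
    (-a).rem_euclid(M)
}

/// Recodes a 64-bit scalar into signed radix-16 digits, least significant first.
/// Digits 0..16 lie in [-8, 7]; the final digit holds the last carry (0 or 1).
fn to_signed_radix_16(k: u64) -> [i8; 17] {
    let mut digits = [0i8; 17];
    for i in 0..16 {
        digits[i] = ((k >> (4 * i)) & 0xf) as i8;
    }
    for i in 0..16 {
        // Move digits of 8 or more into the negative range and carry upward.
        let carry = (digits[i] + 8) >> 4;
        digits[i] -= carry << 4;
        digits[i + 1] += carry;
    }
    digits
}

/// Computes k * P using a table of the multiples 1P..8P.
/// Variable-time: it branches on the digit value.
fn mul_windowed(k: u64, p: i128) -> i128 {
    // table[j] = (j + 1) * P
    let mut table = [0i128; 8];
    table[0] = p.rem_euclid(M);
    for j in 1..8 {
        table[j] = add(table[j - 1], table[0]);
    }

    let digits = to_signed_radix_16(k);
    let mut acc = 0i128;
    for &d in digits.iter().rev() {
        // Multiply the accumulator by 16 (four doublings), then add d * P.
        for _ in 0..4 {
            acc = add(acc, acc);
        }
        if d > 0 {
            acc = add(acc, table[(d - 1) as usize]);
        } else if d < 0 {
            acc = add(acc, neg(table[(-d - 1) as usize]));
        }
    }
    acc
}

fn main() {
    let (k, p) = (0xdead_beef_u64, 12_345_i128);
    let expected = (k as i128 * p).rem_euclid(M);
    println!("windowed = {}, direct = {}", mul_windowed(k, p), expected);
}
```

The signed digits are what allow the table to hold only eight multiples: a negative digit reuses the same entry with a negation, which is cheap for curve points. A constant-time version would replace the branch with a masked scan over the whole table.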

codecov-commenter · 2020-07-18T08:32:52Z

Codecov Report

Merging #82 into master will decrease coverage by 1.71%.
The diff coverage is 25.47%.

@@            Coverage Diff             @@
##           master      #82      +/-   ##
==========================================
- Coverage   54.40%   52.68%   -1.72%     
==========================================
  Files          16       17       +1     
  Lines        2998     3124     +126     
==========================================
+ Hits         1631     1646      +15     
- Misses       1367     1478     +111

Impacted Files	Coverage Δ
k256/src/arithmetic/field.rs	`92.07% <0.00%> (ø)`
k256/src/arithmetic/field/field_10x26.rs	`0.00% <0.00%> (ø)`
k256/src/arithmetic/field/field_impl.rs	`95.31% <0.00%> (ø)`
k256/src/arithmetic/field/field_montgomery.rs	`0.00% <0.00%> (ø)`
k256/src/arithmetic/scalar.rs	`76.69% <0.00%> (-5.12%)`	⬇️
k256/src/arithmetic/scalar/scalar_4x64.rs	`76.19% <0.00%> (-14.37%)`	⬇️
k256/src/arithmetic/scalar/scalar_8x32.rs	`0.00% <0.00%> (ø)`
k256/src/arithmetic.rs	`85.18% <20.00%> (+0.59%)`	⬆️
k256/src/mul.rs	`88.37% <88.37%> (ø)`
k256/src/arithmetic/field/field_5x52.rs	`92.88% <100.00%> (ø)`
... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1db2618...3d19153. Read the comment docs.

tarcieri · 2020-07-18T14:49:57Z

My understanding is that the libsecp256k1 C library (and its Rust wrapper) disables this optimization by default out of concern for patents, notably https://patentimages.storage.googleapis.com/bb/75/9e/a4353a2158ea1a/US7995752.pdf (which looks like it might expire this September, but might also get renewed).

While I think it's okay to include in the codebase, I think that, as in libsecp256k1, it needs to be gated behind a Cargo feature and disabled by default, so that it is only used in jurisdictions where the patent does not apply, at least until the patent expires.
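As a rough illustration of that kind of gating (a sketch, not the PR's actual layout), the selection can be done with `cfg` attributes. This assumes the feature is named `endomorphism-mul`, as in the PR description, and is declared in `Cargo.toml` under `[features]` without being listed in `default`; the function name `mul_strategy` is hypothetical.

```rust
/// Name of the multiplication strategy selected at compile time.
/// Only compiled in when the crate is built with `--features endomorphism-mul`.
#[cfg(feature = "endomorphism-mul")]
fn mul_strategy() -> &'static str {
    "endomorphism"
}

/// Fallback used when the feature is off, which is the default.
#[cfg(not(feature = "endomorphism-mul"))]
fn mul_strategy() -> &'static str {
    "windowed"
}

fn main() {
    println!("point-scalar multiplication uses: {}", mul_strategy());
}
```

With this shape, users have to opt in explicitly, and the default build never contains the endomorphism code path.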

fjarri · 2020-07-18T17:38:41Z

Hm, what exactly is protected by this patent? The whole usage of the endomorphism, or just the specific method of decomposing the scalar (as described in the paper of one of the patent holders, R. J. Lambert - which was published in 2001, so I don't know how this patent, filed in 2005, is even valid)? In the latter case, I would prefer not to use that anyway, since, as I mentioned earlier, it doesn't actually have a bound on its results. There is an alternative method, published in 2002 - would it be free of the patent?

tarcieri · 2020-07-18T17:42:35Z

I’m certainly not a patent lawyer. All I know is the libsecp256k1 developers have been concerned about this particular patent and how it applies to endomorphism optimizations.

fjarri · 2020-07-19T06:37:42Z

Some more investigation results:

While the patent does describe an older GLV method, libsecp256k1 uses the more reliable PJKL method of scalar decomposition.

Now I'm not a patent lawyer myself, but I find it highly objectionable that a patent could be filed several years after the research is published, and cover not just a specific algorithm, but the general idea of doing a decomposition (otherwise libsecp256k1 would be free of it). I wonder how enforceable it actually is.

The decomposition method requires the computation of a couple of expressions like round(k * a / modulus) where k is the scalar, and a is a known constant. In libsecp256k1 they are approximated as round(k * g / 2^n), where g = a * 2^n // modulus (floor division). The code references this paper, but it doesn't provide any clues on how n was chosen. I did some experiments, and for the n=272 used in the code it is quite common to get off-by-1 errors.

This does not break the decomposition contract, since the code only calculates one part (k2) directly and derives the other part as k1 = k - lambda * k2 mod modulus, instead of calculating k1 as the original algorithm suggests. But an off-by-1 in k1 can mean that the upper bound on k1 is no longer sqrt(modulus), which would make the multiplication fail, because it assumes that k1 and k2 both fit into 128 bits. I'm not sure what the reason for this choice was: 1) a mistake; 2) the probability of the error is low enough to be negligible; 3) I am misunderstanding something. According to my calculations, to eliminate off-by-1 completely, one would have to pick something like n=512.
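The kind of experiment described above can be sketched at toy scale. The modulus `m` and constant `a` below are made-up small values, not the secp256k1 constants, so the counts say nothing about libsecp256k1 itself; the sketch only shows how the approximation round(k * g / 2^n) with g = a * 2^n // modulus drifts from round(k * a / modulus) when n is small. For an odd modulus and k, a < m, the truncation error is below m / 2^n while the exact quotient is never closer than 1 / (2m) to a rounding boundary, so 2^n >= 2 * m^2 is always enough. For a 256-bit modulus that bound lands around n = 512.

```rust
/// Exact value of round(k * a / m), valid for odd m (no ties can occur).
fn round_div_exact(k: u128, a: u128, m: u128) -> u128 {
    (2 * k * a + m) / (2 * m)
}

/// Approximation round(k * g / 2^n) with g = floor(a * 2^n / m).
fn round_div_approx(k: u128, a: u128, m: u128, n: u32) -> u128 {
    let g = (a << n) / m;
    (k * g + (1u128 << (n - 1))) >> n
}

/// Scans every k in [0, m) and returns how many times the approximation
/// differs from the exact value, plus the largest difference seen.
fn count_mismatches(a: u128, m: u128, n: u32) -> (u64, u128) {
    let mut count = 0u64;
    let mut max_diff = 0u128;
    for k in 0..m {
        let exact = round_div_exact(k, a, m);
        let approx = round_div_approx(k, a, m, n);
        // The approximation never exceeds the exact value, since g / 2^n <= a / m.
        let diff = exact - approx;
        if diff != 0 {
            count += 1;
            max_diff = max_diff.max(diff);
        }
    }
    (count, max_diff)
}

fn main() {
    // Hypothetical toy parameters: an odd modulus of about 20 bits.
    let (m, a) = (1_000_003u128, 654_321u128);
    // 2^42 >= 2 * m^2, so the last shift is large enough to be exact.
    for n in [24u32, 30, 42] {
        let (count, max_diff) = count_mismatches(a, m, n);
        println!("n = {}: {} mismatches, max difference {}", n, count, max_diff);
    }
}
```

This worst-case bound ignores any structure in the actual constants, so a smaller n may still be provably sufficient for a specific curve.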

tarcieri · 2020-07-19T16:22:55Z

> Now I'm not a patent lawyer myself, but I find it highly objectionable that a patent could be filed several years after the research is published

That's interesting. I do recall this claim from djb's patent research:

https://cr.yp.to/patents/us/4200770.html

Under United States case law, a document has been published if it ``has been disseminated or otherwise made available to the extent that persons interested and ordinarily skilled in the subject matter or art, exercising reasonable diligence, can locate it.'' A patent is automatically invalid if the patented invention was published more than a year before the patent's filing date.

tarcieri · 2020-07-19T17:14:47Z

> This introduces non-constant-timeness wrt the point, and requires a check for the point not being infinite. Not sure if this should be implemented or not.

A variable-time scalar multiplication which is faster than the constant-time version is still useful for signature verification.

tarcieri · 2020-07-20T18:32:02Z

k256/src/arithmetic/scalar.rs

@@ -65,6 +65,12 @@ impl Scalar {
        self.0.truncate_to_u32()
    }

+    /// Attempts to parse the given byte array as a scalar.
+    /// Does not check the result for being in the correct range.
+    pub const fn from_bytes_unchecked(bytes: &[u8; 32]) -> Self {


Is there a specific reason why this needs to be pub as opposed to pub(crate)?

Not really, I didn't think about what should be exposed to the user. This method is useful for defining Scalar constants, which I suppose may be needed sometimes? But if you want to preserve the current public API, this can be set to pub(crate) for now.

Given that this method can violate the invariant of what a Scalar is supposed to contain, I'd suggest making it private for now.

I think ideally a pub const fn from_bytes(...) method would be possible.

Yes, I think from_bytes() can be easily made into a const fn (by using the existing unchecked code). Probably can wait for another PR though.
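For illustration, a checked `const fn from_bytes` built on top of the unchecked one could look roughly like the sketch below. `ToyScalar` is a hypothetical stand-in for the crate's `Scalar`, the byte order is assumed to be big-endian, and the comparison loop exits early, so it is not constant-time.

```rust
/// Hypothetical stand-in for the crate's scalar type (big-endian bytes assumed).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyScalar([u8; 32]);

/// Order of the secp256k1 group, big-endian.
const ORDER: [u8; 32] = [
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xfe,
    0xba, 0xae, 0xdc, 0xe6, 0xaf, 0x48, 0xa0, 0x3b,
    0xbf, 0xd2, 0x5e, 0x8c, 0xd0, 0x36, 0x41, 0x41,
];

impl ToyScalar {
    /// Wraps the bytes without a range check; may violate the `< ORDER` invariant,
    /// which is why it is kept private here.
    const fn from_bytes_unchecked(bytes: &[u8; 32]) -> Self {
        ToyScalar(*bytes)
    }

    /// Checked constructor: returns `None` unless the value is below the group order.
    /// Not constant-time: the loop exits at the first differing byte.
    pub const fn from_bytes(bytes: &[u8; 32]) -> Option<Self> {
        let mut i = 0;
        while i < 32 {
            if bytes[i] < ORDER[i] {
                return Some(Self::from_bytes_unchecked(bytes));
            }
            if bytes[i] > ORDER[i] {
                return None;
            }
            i += 1;
        }
        None // equal to the order, which is out of range
    }
}

/// Example of defining a constant through the checked path.
const ONE: Option<ToyScalar> = {
    let mut b = [0u8; 32];
    b[31] = 1;
    ToyScalar::from_bytes(&b)
};

fn main() {
    println!("one is valid: {}", ONE.is_some());
    println!("order is valid: {}", ToyScalar::from_bytes(&ORDER).is_some());
}
```

Because the range check runs at compile time for constants, the public API can stay invariant-preserving while still allowing `const` scalar definitions.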

tarcieri · 2020-07-21T14:00:15Z

k256/src/arithmetic/scalar/scalar_4x64.rs

+    pub fn mul_shift_var(&self, b: &Self, shift: usize) -> Self {
+        debug_assert!(shift >= 256);
+
+        fn ifelse(c: bool, x: u64, y: u64) -> u64 { if c {x} else {y} }


That's uhh... interesting 😉

tarcieri · 2020-07-21T14:01:20Z

k256/src/lib.rs

+#[cfg(feature = "arithmetic")]
+mod mul;


I feel like there's maybe a larger discussion to be had here about refactoring the arithmetic module, but we can save that for another PR.

elichai · 2021-10-20T09:26:26Z

@fjarri About your doubts,

@roconnor-blockstream wrote a proof here that 2^272 suffices: https://github.com/roconnor-blockstream/secp256k1/blob/7f4ba006e5c1495f7601b64d4466d8dcf69e15cf/endomorphism-proof.pdf
This proof is very similar to the one here, which proves that 2^384 suffices: https://github.com/bitcoin-core/secp256k1/blob/9526874d1406a13193743c605ba64358d55a8785/src/scalar_impl.h#L160

Would love to hear your thoughts on this :)

fjarri force-pushed the k256-mul branch 2 times, most recently from 9a7add9 to ff4016a Compare July 20, 2020 05:58

fjarri changed the title ~~[WIP] Improved k256 point-scalar multiplication~~ Improved k256 point-scalar multiplication Jul 20, 2020

fjarri marked this pull request as ready for review July 20, 2020 06:02

tarcieri reviewed Jul 20, 2020

fjarri added 2 commits July 20, 2020 22:47

Move point-scalar multiplication to its own module

4dc4129

Add endomorphism multiplication and an advanced windowed method from Ristretto

2d18ea1

fjarri force-pushed the k256-mul branch from d468dfd to ee31e24 Compare July 21, 2020 05:47

Make from_bytes_unchecked() for Scalar and Field pribate to the crate

3d19153

fjarri force-pushed the k256-mul branch from ee31e24 to 3d19153 Compare July 21, 2020 05:55

tarcieri reviewed Jul 21, 2020

tarcieri merged commit 0586693 into RustCrypto:master Jul 21, 2020

tarcieri mentioned this pull request Aug 11, 2020

k256 v0.4.0 #128

Merged

fjarri deleted the k256-mul branch August 18, 2020 00:45

elichai mentioned this pull request Oct 19, 2021

cross-testing bitcoin-core/secp256k1#691

Open
