Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved k256 point-scalar multiplication #82

Merged
merged 3 commits into from
Jul 21, 2020

Conversation

fjarri
Copy link
Contributor

@fjarri fjarri commented Jul 18, 2020

May fix issue #24, or at least come close to fixing it. On my machine, the multiplication speed without endomorphism is 104us, and with endomorphism 75us.

Current version uses a simple windowed multiplication. libsecp256k1 has the following improvements on top of that:

  1. a use of the endomorphism

This is implemented, and enabled by endomorphism-mul. I have some doubts about the validity of the approximation used in libsecp256k1, see my comment in mul.rs ("@fjarri").

  1. a more advanced windowing algorithm, requiring precomputation of only odd multiples of the point

I checked it out, and it performs the same as the one from Ristretto, with the latter being considerably simpler. So I used the Ristretto one. (It does have a hardcoded window size, but generalizing to other window sizes is trivial, if needed).

  1. a use of isomorphism to calculate precomputed multiples in the affine form, and use add_mixed() later to add them to the (projective) accumulator

This introduces non-constant-timeness wrt the point, and requires a check for the point not being infinite. Not for this PR.

@codecov-commenter
Copy link

codecov-commenter commented Jul 18, 2020

Codecov Report

Merging #82 into master will decrease coverage by 1.71%.
The diff coverage is 25.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #82      +/-   ##
==========================================
- Coverage   54.40%   52.68%   -1.72%     
==========================================
  Files          16       17       +1     
  Lines        2998     3124     +126     
==========================================
+ Hits         1631     1646      +15     
- Misses       1367     1478     +111     
Impacted Files Coverage Δ
k256/src/arithmetic/field.rs 92.07% <0.00%> (ø)
k256/src/arithmetic/field/field_10x26.rs 0.00% <0.00%> (ø)
k256/src/arithmetic/field/field_impl.rs 95.31% <0.00%> (ø)
k256/src/arithmetic/field/field_montgomery.rs 0.00% <0.00%> (ø)
k256/src/arithmetic/scalar.rs 76.69% <0.00%> (-5.12%) ⬇️
k256/src/arithmetic/scalar/scalar_4x64.rs 76.19% <0.00%> (-14.37%) ⬇️
k256/src/arithmetic/scalar/scalar_8x32.rs 0.00% <0.00%> (ø)
k256/src/arithmetic.rs 85.18% <20.00%> (+0.59%) ⬆️
k256/src/mul.rs 88.37% <88.37%> (ø)
k256/src/arithmetic/field/field_5x52.rs 92.88% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1db2618...3d19153. Read the comment docs.

@tarcieri
Copy link
Member

tarcieri commented Jul 18, 2020

My understanding is the libsecp256k1 C library (and its Rust wrapper) disable this optimization by-default out of concern for patents, notably https://patentimages.storage.googleapis.com/bb/75/9e/a4353a2158ea1a/US7995752.pdf (which looks like it might expire this September, but might also get renewed).

While I think it's okay to include in the codebase, I think like libsecp256k1 it needs to be gated by a Cargo feature and disabled by-default such that it can only be used in jurisdictions where the patent does not apply, at least until such time as the patent expires.

@fjarri
Copy link
Contributor Author

fjarri commented Jul 18, 2020

Hm, what exactly is protected by this patent? The whole usage of the endomorphism, or just the specific method of decomposing the scalar (as described in the paper of one of the patent holders, R. J. Lambert - which was published in 2001, so I don't know how this patent, filed in 2005, is even valid)? In the latter case, I would prefer not to use that anyway, since, as I mentioned earlier, it doesn't actually have a bound on its results. There is an alternative method, published in 2002 - would it be free of the patent?

@tarcieri
Copy link
Member

I’m certainly not a patent lawyer. All I know is the libsecp256k1 developers have been concerned about this particular patent and how it applies to endomorphism optimizations.

@fjarri
Copy link
Contributor Author

fjarri commented Jul 19, 2020

Some more investigation results:

  • While the patent does describe an older GLV method, libsecp256k1 uses the more reliable PJKL method of scalar decomposition.

Now I'm not a patent lawyer myself, but I find it highly objectionable that a patent could be filed several years after the research is published, and cover not just a specific algorithm, but the general idea of doing a decomposition (otherwise libsecp256k1 would be free of it). I wonder how enforceable it actually is.

  • The decomposition method requires the computation of a couple of expressions like round(k * a / modulus) where k is the scalar, and a is a known constant. In libsecp256k1 they are approximated as round(k * g / 2^n), where g = a * 2^n // modulus (floor division). The code references this paper, but it doesn't provide any clues on how n was chosen. I did some experiments, and for the n=272 used in the code it is quite common to get off-by-1 errors.

It does not lead to the breach of decomposition contract, since the code only calculates one part (k2) and calculates the other part as k1 = k - lambda * k2 mod modulus, instead of calculating k1 as the original algorithm suggests. But an off-by-1 in k1 can lead to the upper bound on k1 not being sqrt(modulus) anymore (which will lead to the multiplication failure because it assumes that k1 and k2 both fit into 128 bits). Not sure what was the reason for this choice - 1) a mistake; 2) the probability of the error is low enough to be negligible; 3) I am misunderstanding something. According to my calculations, to eliminate off-by-1 completely, one would have to pick something like n=512.

@tarcieri
Copy link
Member

Now I'm not a patent lawyer myself, but I find it highly objectionable that a patent could be filed several years after the research is published

That's interesting. I do recall this claim from djb's patent research:

https://cr.yp.to/patents/us/4200770.html

Under United States case law, a document has been published if it ``has been disseminated or otherwise made available to the extent that persons interested and ordinarily skilled in the subject matter or art, exercising reasonable diligence, can locate it.'' A patent is automatically invalid if the patented invention was published more than a year before the patent's filing date.

@tarcieri
Copy link
Member

This introduces non-constant-timeness wrt the point, and requires a check for the point not being infinite. Not sure if this should be implemented or not.

A variable time scalar mult which is faster than the constant time version is still useful for signature verification

@fjarri fjarri force-pushed the k256-mul branch 2 times, most recently from 9a7add9 to ff4016a Compare July 20, 2020 05:58
@fjarri fjarri changed the title [WIP] Improved k256 point-scalar multiplication Improved k256 point-scalar multiplication Jul 20, 2020
@fjarri fjarri marked this pull request as ready for review July 20, 2020 06:02
@@ -65,6 +65,12 @@ impl Scalar {
self.0.truncate_to_u32()
}

/// Attempts to parse the given byte array as a scalar.
/// Does not check the result for being in the correct range.
pub const fn from_bytes_unchecked(bytes: &[u8; 32]) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason why this needs to be pub as opposed to pub(crate)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, I didn't think about what should be exposed to the user. This method is useful for defining Scalar constants, which I suppose may be needed sometimes? But if you want to preserve the current public API, this can be set to pub(crate) for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this method can violate the invariant of what a Scalar is supposed to contain, I'd suggest making it private for now.

I think ideally a pub const fn from_bytes(...) method would be possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think from_bytes() can be easily made into a const fn (by using the existing unchecked code). Probably can wait for another PR though.

pub fn mul_shift_var(&self, b: &Self, shift: usize) -> Self {
debug_assert!(shift >= 256);

fn ifelse(c: bool, x: u64, y: u64) -> u64 { if c {x} else {y} }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's uhh... interesting 😉

Comment on lines +19 to +20
#[cfg(feature = "arithmetic")]
mod mul;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there's maybe a larger discussion to be had here about refactoring the arithmetic module, but we can save that for another PR.

@tarcieri tarcieri merged commit 0586693 into RustCrypto:master Jul 21, 2020
@tarcieri tarcieri mentioned this pull request Aug 11, 2020
@fjarri fjarri deleted the k256-mul branch August 18, 2020 00:45
@elichai
Copy link

elichai commented Oct 20, 2021

@fjarri About your doubts,

@roconnor-blockstream wrote here a proof that 2^272 suffices: https://github.com/roconnor-blockstream/secp256k1/blob/7f4ba006e5c1495f7601b64d4466d8dcf69e15cf/endomorphism-proof.pdf
This proof is very similiar to the one here proving 2^384 suffices https://github.com/bitcoin-core/secp256k1/blob/9526874d1406a13193743c605ba64358d55a8785/src/scalar_impl.h#L160

Would love to hear your thoughts on this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants