Optimized linear combination of points #380
Conversation
Codecov Report
@@ Coverage Diff @@
## master #380 +/- ##
==========================================
- Coverage 57.92% 57.54% -0.38%
==========================================
Files 29 29
Lines 4036 4146 +110
==========================================
+ Hits 2338 2386 +48
- Misses 1698 1760 +62
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like this is fine to get started.
- how should we expose it to the user (and should we expose it at all)? As a function or as a `LinearCombination` trait of `ProjectivePoint`? In a submodule? Gated by a feature?
I think initially it's fine to just have an internal implementation specific to this use case.
It would be interesting to expose eventually, but it raises a lot of questions. For example: should a trait like `LinearCombination` go in the `group` crate?
I would prefer to cross that bridge when we get there.
- a generalization to n-term linear combinations (even just merging `mul_windowed()` and `lincomb()` would be nice; they have pretty much identical code)
The latter seems worth exploring if it can be done without impacting performance.
- in `verify.rs` and `recoverable.rs` one of the points of the linear combination is the generator, so it can be sped up even more by pre-calculating generator powers. Should it be done in this PR or in another one?
I'd prefer to look at that in a separate PR. It definitely sounds like a good idea, but that's an area where it might make sense to feature gate it, as I expect there will be a tradeoff in code size which would impact embedded users who might prefer smaller code size over better performance.
A couple of existing traits could be useful as design inspiration. IME there's pretty minimal cost to making an arbitrary-length multiscalar-multiplication API over a double-base scalar-multiplication API. Basically the only downside is that the lookup tables have to be heap-allocated, but this is a pretty minimal cost even for the double-base case. This is just my opinion, but I think it's preferable to have only the arbitrary-length version, for reduced API surface.

If you're going to have a specialized double-base scmul function, it's useful to have it hardcode one of the points (so it can use a bigger, statically precomputed lookup table).
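As a rough illustration of what an arbitrary-length API could look like, here is a minimal sketch over a toy additive group modulo 97 standing in for curve points. All names here are illustrative, not the actual `k256` API, and the naive fold shares no lookup tables (a real implementation would use Straus or Pippenger to amortize work across terms):

```rust
/// Toy "point": an integer mod 97 under addition (stand-in for a curve point).
#[derive(Clone, Copy, PartialEq, Debug)]
struct ToyPoint(u64);

const MODULUS: u64 = 97;

impl ToyPoint {
    fn add(self, other: ToyPoint) -> ToyPoint {
        ToyPoint((self.0 + other.0) % MODULUS)
    }
    fn mul(self, scalar: u64) -> ToyPoint {
        ToyPoint((self.0 * (scalar % MODULUS)) % MODULUS)
    }
    fn identity() -> ToyPoint {
        ToyPoint(0)
    }
}

/// Arbitrary-length linear combination: sum of scalars[i] * points[i].
/// Naive version that just folds mul + add; a production implementation
/// would share precomputed lookup tables across all terms.
fn multiscalar_mul(scalars: &[u64], points: &[ToyPoint]) -> ToyPoint {
    scalars
        .iter()
        .zip(points)
        .fold(ToyPoint::identity(), |acc, (&k, &p)| acc.add(p.mul(k)))
}
```

The double-base case is then just the length-2 instance of the same function, which is the API-surface argument above.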
@fjarri hope you don't mind, but I flipped this PR out of draft; I think it's a reasonable self-contained first step.
I just wanted to emphasize that it shouldn't be merged yet :)

So, I added a generic version of `lincomb()`. The benchmark now reports:

high-level operations/lincomb via mul+add: [157.31 us 157.39 us 157.48 us]

A strange effect is that […]. Do you think the generic version overcomplicates things? (I wish Rust had maps and zips defined for static arrays, but alas...)
I wouldn't worry too much about a modicum of performance lost to the generic version.
It should retain all of the information necessary to optimize those as part of […].
Yes, but as far as I know it is not possible to […].
Well, that's basically what's happening in […].
Aah yes, that's been the long-desired one which continues to be an ongoing source of controversy as to how to implement it. I think this is the most up-to-date tracking issue on where things are at with that: […]
One somewhat dirty option which could unlock those sorts of use cases is copying and pasting the existing (unstable) functionality from libcore which provides this into some shared crate. We do have the […].

I wouldn't suggest blocking this PR on it, though; it is definitely something we could use in a lot of other places (particularly symmetric cryptography). Then eventually we could migrate to the similarly-shaped libcore functionality when it's stabilized.
So, do you think it's ok to use the current approach with […]?
Absolutely, that's fine. As I said earlier, we use more imperative patterns in a lot of other places because LLVM seems to do a better job optimizing them. Hopefully that situation changes as const generics improve, but for now I think it's much more straightforward (and also friendlier to C programmers who may not be familiar with more functional patterns).
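For illustration, here is the same element-wise operation written both ways on stable Rust (function names are hypothetical). The functional version has to collect into a `Vec` because collecting an iterator into a fixed-size array isn't ergonomic on stable, which is part of the pain point discussed above; the imperative version keeps everything in a stack-allocated array:

```rust
/// Functional style: zip + map + collect. Allocates a Vec because there is
/// no stable, ergonomic way to collect into a fixed-size array.
fn combine_functional(a: &[u64; 4], b: &[u64; 4]) -> Vec<u64> {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).collect()
}

/// Imperative style: an index loop writing into a fixed-size array,
/// which stays on the stack and tends to optimize well under LLVM.
fn combine_imperative(a: &[u64; 4], b: &[u64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    for i in 0..4 {
        out[i] = a[i] * b[i];
    }
    out
}
```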
Got it, I'm going to clean it up then.
Ok, done. I just exposed it publicly as `lincomb()`.

As for the trait that @hdevalence suggested, I think it's a good idea, but I don't know how I feel about using iterators instead of fixed-size arrays. Maybe we can leave this PR at just a function and add a trait next (sometime before the next release).
There are various and somewhat inconsistent ways that low-level internals are exposed across crates in the RustCrypto org.
It might be good to come up with a single naming convention for this, but doing so is a breaking change. I can try to open a sort of meta issue on this so we can pick a single name for this sort of thing and use it consistently.
@fjarri thanks for implementing this! Would definitely be interested in precomputed powers for the generator point if you're interested in implementing that as well:
I thought about it, and the biggest problem here is precomputing and storing the required table. Rust is pretty rigid when it comes to constants; one way would be to precompute and hardcode the byte representations of the required points, but that would increase the binary size, as you mentioned, and in general it's a rather roundabout way.
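One workaround that avoids both hardcoded byte blobs and `const` restrictions is building the table lazily at runtime. Here is a sketch on a toy additive group mod 97 (everything here is illustrative, not the actual `k256` code; it uses `std::sync::OnceLock`, which is available on current stable Rust):

```rust
use std::sync::OnceLock;

/// Toy "point": an integer mod 97 under addition (stand-in for a curve point).
#[derive(Clone, Copy, PartialEq, Debug)]
struct ToyPoint(u64);

const MODULUS: u64 = 97;

impl ToyPoint {
    fn add(self, other: ToyPoint) -> ToyPoint {
        ToyPoint((self.0 + other.0) % MODULUS)
    }
}

/// Lazily built table of small multiples of the generator:
/// table[i] = (i + 1) * G. Computed once on first use; later calls
/// return the cached reference with no recomputation.
static TABLE: OnceLock<[ToyPoint; 15]> = OnceLock::new();

fn generator_table() -> &'static [ToyPoint; 15] {
    TABLE.get_or_init(|| {
        let g = ToyPoint(5); // toy generator
        let mut table = [ToyPoint(0); 15];
        let mut acc = ToyPoint(0);
        for entry in table.iter_mut() {
            acc = acc.add(g);
            *entry = acc;
        }
        table
    })
}
```

The tradeoff is a one-time runtime cost and a synchronization check per access, rather than a larger binary.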
Another performance improvement (perhaps not very significant, but still) is possible if one of the scalars in the linear combination is not secret. Then we can use regular table lookup instead of constant-time lookup. |
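The difference between the two lookups can be sketched like this (illustrative only; a production implementation would use constant-time primitives such as the `subtle` crate rather than a plain `==` comparison):

```rust
/// Constant-time table lookup: reads every entry and selects with a
/// bitmask, so the memory access pattern is independent of the index.
/// Sketch only: a real implementation would use constant-time equality
/// (e.g. `subtle::ConstantTimeEq`) instead of `==`.
fn lookup_ct(table: &[u64], index: usize) -> u64 {
    let mut result = 0u64;
    for (i, &entry) in table.iter().enumerate() {
        // mask is all ones iff i == index, all zeros otherwise
        let mask = ((i == index) as u64).wrapping_neg();
        result |= entry & mask;
    }
    result
}

/// Variable-time lookup: a plain index. Only safe when the scalar driving
/// the index is public, e.g. during ECDSA signature verification.
fn lookup_vartime(table: &[u64], index: usize) -> u64 {
    table[index]
}
```

Both return the same value; the variable-time version just touches a single table entry and so is cheaper.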
Any thoughts on […]? We could feature gate it so it doesn't impact embedded targets.
That'd be great for ECDSA. Also, generally I've wondered if wNAF would be nice here, although I'm not sure if the benefits are a wash given the secp256k1 endomorphisms. Are you familiar with what bitcoin-core/libsecp256k1 does here? (I am not; perhaps I should look.)
On stable Rust it won't work because trait methods will be involved (e.g. for arithmetic), and they can't be `const`. Another possibility is a build macro, but that would be hard to write and hard to maintain.
So, to make sure I understand it correctly: in the places where we currently use […]
I actually have a rough port of […].
...do you mean "secret" as in could leak the message digest, which is potentially preimagable and could therefore leak the message, which could be bad if the message is secret? I guess this is a debatable tradeoff, but the risk is leaking low-entropy secrets, as otherwise they wouldn't be preimagable from the digest via brute force. Otherwise yes, I'd say all factors involved in verification can be considered public.
Interesting […]
One way to address this for now is to hoist those trait methods out into […].
You're right, I think I got the wrong impression. I suppose it's not secret either (although I'd rather not make a decision on that myself; my cryptography knowledge is very superficial). In that case the current […]
I guess, but that seems like a lot of work. I think I'll first see how much speedup is actually possible with a precomputed table.
There's perhaps a case to be made there, especially for overall sidechannel resistance, but if the inputs are high-entropy secrets they aren't preimagable and so a constant-time hash function suffices, so perhaps this is a tradeoff which should target performance initially.
I think right now all of the cases that matter are non-secret. Concretely those are ECDSA verification, public key recovery from an ECDSA signature, and perhaps things like Schnorr/Taproot (#381).
When will this PR be applied to P256? |
There aren't currently any plans to apply this optimization to other curves. Currently the trait impl used by […] is the generic one in elliptic-curves/primeorder/src/projective.rs (lines 330 to 335 at d8e44b0).
Pedersen commitments are one such use case.
The goal of this PR is to optimize a commonly used operation `x * k + y * l`, where `x` and `y` are curve points, and `k` and `l` are scalars. It is currently used in `verify.rs` and `recoverable.rs`.

On my machine it speeds up ECDSA verification from 176us to 145us (and the newly added benchmark shows a speed-up from 157us to 124us for the linear combination alone).

Currently the PR contains a `lincomb_generic()` that takes arrays, and it has two aliases: `mul()` (internal, used in operator traits) and `lincomb()` (public, for two-point linear combinations).
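The core idea behind this kind of speed-up is Shamir's trick: interleaving the two scalar multiplications so that the chain of doublings is shared instead of performed twice. A minimal sketch on a toy additive group (everything below is illustrative, not the actual `k256` implementation, which additionally uses windowed lookup tables and constant-time selection):

```rust
/// Toy "point": an integer mod p under addition (stand-in for a curve point).
#[derive(Clone, Copy, PartialEq, Debug)]
struct ToyPoint(u64);

const MODULUS: u64 = 1_000_003;

impl ToyPoint {
    fn add(self, other: ToyPoint) -> ToyPoint {
        ToyPoint((self.0 + other.0) % MODULUS)
    }
    fn double(self) -> ToyPoint {
        self.add(self)
    }
    fn identity() -> ToyPoint {
        ToyPoint(0)
    }
}

/// Reference scalar multiplication. In the toy group this is just a
/// modular product; on a curve it would be double-and-add.
fn scalar_mul(p: ToyPoint, k: u64) -> ToyPoint {
    ToyPoint(((p.0 as u128 * k as u128) % MODULUS as u128) as u64)
}

/// Shamir's trick: compute k*x + l*y in a single pass over the bits of
/// both scalars, sharing one chain of doublings instead of two.
fn lincomb(x: ToyPoint, k: u64, y: ToyPoint, l: u64) -> ToyPoint {
    let mut acc = ToyPoint::identity();
    for i in (0..64).rev() {
        acc = acc.double();
        if (k >> i) & 1 == 1 {
            acc = acc.add(x);
        }
        if (l >> i) & 1 == 1 {
            acc = acc.add(y);
        }
    }
    acc
}
```

Computing the two products separately costs roughly two full doubling chains; the interleaved version costs one, which is where most of the measured ECDSA verification speed-up comes from.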