-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant-time EC point multiplication (Montgomery ladder) implementation #325
base: main
Are you sure you want to change the base?
Constant-time EC point multiplication (Montgomery ladder) implementation #325
Conversation
Co-authored-by: Vivian Plasencia <v.pcalana@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool, @ChinoCribioli. Thanks for taking it on.
I feel underqualified to review the correctness of the cryptographic algorithm, and how well it meets the goal of resistance to timing attacks. Perhaps @cedoor is willing to do so, or can suggest someone to double-check here. I mostly trust the unit tests elsewhere (particularly the compatibility tests in eddsa-poseidon comparing results to circomlibjs) to ensure the results of this calculation are correct.
I do feel competent to comment on some aspects of the code, and potential micro-optimizations, so I've done that here. One thing I'm concerned about is relative performance of this algorithm compared to the old one. Can you do some testing of the relative speed of this raw operation, and/or higher-level operations which depend on it (like eddsa-poseidon sign/verify). Just timing operations in a loop and reporting results in the PR would be sufficient here. I don't think this needs an automated performance test in the test suite.
@@ -65,25 +65,43 @@ export function addPoint(p1: Point<bigint>, p2: Point<bigint>): Point<bigint> { | |||
/** | |||
* Performs a scalar multiplication by starting from the 'base' point and 'adding' | |||
* it to itself 'e' times. | |||
* This works given the following invariant: At each step, R0 will be r_0*base where r_0 is the prefix of e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest additionally referring to the named algorithm being used here (Montgomery Ladder) perhaps with a link to a description of the algorithm (Wikipedia or some other source).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Will do.
return [BigInt(0), BigInt(1)] | ||
} | ||
const eBits: Array<boolean> = [] | ||
while (!scalar.isZero(e)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the goal constant time? This loop doesn't seem constant time to me, since it'll exit early if high bits of the input are zero. As a result the length of eBits is also variable. If you want constant time, it seems like there has to be a hard-coded max bit length (254 I think, but someone should double-check me) somewhere.
If you wish, I think this loop can probably be skipped entirely with some bigint bitwise operations below. Something like this:
for (let bitMask = 1n, bitmask < 1n<<254n, bitmask <<= 1n) {
if (e & bitMask != 0) {
// ...
I'm not sure of the relative efficiency, but I'd guess that the bitmasks are cheaper than allocating memory for an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this sounds correct to me. If e
is 7
for example this will loop 3 times, whereas if it's like 4182498124124891489144
it will loop significantly more times.
Hardcoding an iteration count is a good approach, but we'll still have timing vulnerabilities due to the use of the javascript BigInt
type which is not safe against timing attacks.
It's also unclear if the scalar.isZero
function is constant time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this point. I changed the loop with a hardcoded for.
while (!scalar.isZero(rem)) { | ||
if (scalar.isOdd(rem)) { | ||
res = addPoint(res, exp) | ||
if (scalar.isZero(e)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't constant time if e is zero. Is that okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think although the exponent is marked as secret, it may be considered public if it's set to 0. The rationale in my mind is "no private key would ever be selected as the 0 value". That said, I don't think it's a good idea to short circuit here because it breaks the comment above of e
being secret in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also thought initially that we shouldn't worry too much about the zero case since it is irrelevant cryptographically. But I agree to change this to make the method more consistent.
Thanks @ChinoCribioli and @artwyman!
I'm also not competent enough to check the logic out here. I'll ask other people to review it. @ChinoCribioli Is this a breaking change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to see discussion about timing vulnerabilities, and a constant time ec multiplication implementation.
Unfortunately this function is still unsafe because of the use of the javascript BigInt
type. Internally a BigInt
is (likely) represented as the minimum number of bytes for the value. This means any operation we do here will be variable time based on the value of the BigInt
.
I've tried to think of how to convert from BigInt
to an alternate representation in constant time, but haven't found an approach that seems like it would be safe. Approaches i've considered:
- use
e.toString(2)
: this will return a binary string, but it will be of variable length based on the size of the value - use repeated division to extract le bytes: division is generally not a constant time operation based on the numerator. It would also be operating on a bigint which is not constant time
- do
e + (1n << 254n)
to get a ~constant size value then operate on that and discard bits above 254: this could work, and may be the most safe approach, but it's unclear if the addition is constant time (the implementation may have to backfill the bits between the value and 254). It's also unclear if bigints are variable time based on only the number of bits in the value, or based on something more specific like the number of 1's in the value, and the total number of bits
I think it's a good idea to include this implementation though (after the requested changes) to mitigate timing attacks as much as possible.
while (!scalar.isZero(rem)) { | ||
if (scalar.isOdd(rem)) { | ||
res = addPoint(res, exp) | ||
if (scalar.isZero(e)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think although the exponent is marked as secret, it may be considered public if it's set to 0. The rationale in my mind is "no private key would ever be selected as the 0 value". That said, I don't think it's a good idea to short circuit here because it breaks the comment above of e
being secret in all cases.
return [BigInt(0), BigInt(1)] | ||
} | ||
const eBits: Array<boolean> = [] | ||
while (!scalar.isZero(e)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this sounds correct to me. If e
is 7
for example this will loop 3 times, whereas if it's like 4182498124124891489144
it will loop significantly more times.
Hardcoding an iteration count is a good approach, but we'll still have timing vulnerabilities due to the use of the javascript BigInt
type which is not safe against timing attacks.
It's also unclear if the scalar.isZero
function is constant time.
} | ||
const eBits: Array<boolean> = [] | ||
while (!scalar.isZero(e)) { | ||
if (scalar.isOdd(e)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear if isOdd
is constant time
} | ||
e = scalar.shiftRight(e, BigInt(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A right shift on a BigInt
is almost certainly not constant time.
let R0: Point<bigint> = base | ||
let R1: Point<bigint> = addPoint(base, base) | ||
|
||
for (const bit of eBits.slice(0, -1).reverse()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This slice is a no-op, i think. It may be necessary to invoke the reverse
function though.
nvm, it returns all elements but the last one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments on the latest changes. I'm still curious about relative performance of this implementation vs. the previous one.
|
||
let R0: Point<bigint> = base | ||
let R1: Point<bigint> = addPoint(base, base) | ||
let R0: Point<bigint> = [0n, 1n] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there's a unit test covering multiply by zero, which is good to confirm this still behaves the same way as before without the special-case.
As a side suggestion, it might be clearer to make this a public const id
like is used in the unit tests, to make it clear that [0n, 1n] is the representation of the zero/id point in this curve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that. It is done.
@chancehudson I just pushed most of the changes you requested. My only doubt now regarding any possible remaining timing vulnerability lies on the lines |
The if statement should be fine as long as Again though, this is best effort. Regardless we're going to have some timing leakage from the |
I think this needs a more direct test of the affected operations (repeated in a loop to avoid fixed overhead and caching effects) to make the differences more clear. It does look to me like the test times are uniformly slower, though, which isn't very pleasing. I submitted an issue requesting to make it faster (which looks like it may be met by a WASM implementation implemented in Rust). The utest which targets small values in particular is expected to get slower for a constant-time algorithm, but I assume the eddsa-poseidon tests are operating on more typical values. I'm not entirely sure of the right trade-off here. Speaking for the Zupass use case, I think we probably care more about performance than timing attacks right now, but that's not a universal choice I expect to apply to everyone. |
Co-authored-by: Chance <jchancehud@gmail.com>
while (!scalar.isZero(rem)) { | ||
if (scalar.isOdd(rem)) { | ||
res = addPoint(res, exp) | ||
e %= order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, the original implementation doesn't make any assertions or modifications to this variable. This is an exponent, which is not necessarily a field element, so the reduction should be unnecessary.
I think if we remove this we have a pretty safe function. We just need to look more deeply at the performance hit and decide if it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An exponent doesn't have to be a field element, but the curve is cyclic with its order, so a value greater than the order is redundant compared to the modular reduction of the same value (either leads to the same output). And the constant-ish time algorithm below works only if you can fix the number of bits in the input. If we reduce the modular reduction, then I think this function becomes incorrect for any value greater than the order. Modular reduction is not the same as cutting off high bits (because the order is not a power of 2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. As Andrew says, some tests break when you remove the reduction because you incorrectly handle the case where the exponent passed as input is greater than 254 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah right because the loop count is hardcoded, i see
I get that for some projects performance is more important than
I get that for some use cases performance is more important than safety, I honestly don't know which are the main projects that use this library. However, I think that any implementation of a rather complex cryptographic protocol must have a strong focus on security and, as you said, the performant variant may not be the best option for a lot of uses this library might have. |
I think security should have priority over performance, but I would wait for a full review by the audit team. One solution might be to release a new major if the security benefits are substantial and continue to maintain version 1 anyway. |
@chancehudson @artwyman any thoughts? |
I have thoughts in two directions. First general thoughts on maintaining 2 versions: It's workable, but definitely has a cost in maintenance burden and confusion. Either both versions need to be kept up to date with future patches, or else v1 will slowly become less useful, and potentially even insecure. I historically have a preference for "develop on main" workflows since the code everyone looks at and uses the code which will remain up-to-date and workable. I realize this is a preference, though, not a clear-cut definition of "best". If you're going to support multiple algorithms for a significant time period, it may be better to keep them both in the code separately. They could be separate packages, separate functions within the package, or configurable via an "options" parameter. That keeps them both "on main" so they should benefit from ongoing work, unit testing, etc. 2nd, thoughts on the specifics of the security vs. performance tradeoff here: Given bigint arithmetic isn't constant-time anyway, this algorithm isn't a complete solution to the security problem. That makes me personally de-prioritize it vs. performance, but that's a call reasonable people can make differently. I would defer to someone with more security expertise, and who has specific threat models in mind to make the call on whether a constant-time algorithm built on variable-time bigint math reduces timing attack risk by enough of a margin to be worth prioritizing. Calling the current implementation the "performant variant" is admittedly a bit of a stretch, since any pure-JS implementation of cryptography isn't going to be very performant. A preferable alternative would be to solve this problem fully as part of a more optimized implementation (probably wasm) which is fast enough to be an improvement to performance even with a constant-time algorithm. Such an implementation could improve both performance and security at once . I'm not sure how soon that's likely to happen, though, so there may need to be trade-offs or provide multiple options in the near term. |
Description
The
mulPointScalar
method is implemented with the regular square and multiply algorithm, which is prone to timing attacks due to the fact that the number of EC point additions depends on the number of 1's in the binary expression of the scalar.With this implementation (called Montgomery Ladder) this is avoided by maintaining the number of additions depending only on the length of the binary representation of the scalar.
As documented in my implementation of the method, the algorithm works because of the following invariant: At each step,
R0
will ber_0*base
wherer_0
is the prefix ofe
written in binary andR1
will be(r_0+1)*base
. In other words: at iterationi
of the loop,r_0
's binary representation will be the firsti+1
most significant bits ofe
. If the upcoming bit is a 0, we just have to doubleR0
and addR0
toR1
to maintain the invariant. If it is a 1, we have to doubleR0
and add1*base
(or addR1
, which is the same as(r_0+1)*base
), and doubleR1
to maintain the invariant.Related Issue(s)
Fixes #324
Other information
Aside from the new implementation of the
mulPointScalar
method, I added some tests to automatically check basic behaviors such as EC point additions, EC point multiplications, and the orders of the generating/base points used in the protocol.These tests are independent of the new implementation of the multiply method and are intended to make the test base more robust.
Checklist
yarn style
without getting any errors