[Code golfing] secp256k1 programming language benchmark #285

mratsim · 2023-10-17T14:34:02Z

Following https://forum.nim-lang.org/t/10550#70556, this outlines the steps to put Nim at the top of https://programming-language-benchmarks.vercel.app/problem/secp256k1

Benchmark the top implementations, Rust and Go, on your machine. Go uses the standard libsecp256k1 library, which is also wrapped in Nim: https://github.com/status-im/nim-secp256k1
- Difficulty: Very easy

Implement an addition chain for multiplication by 21 here:

constantine/constantine/math/arithmetic/finite_fields.nim

Lines 443 to 456 in 34baa74

    
func `*=`*(a: var FF, b: static int) =
  ## Multiplication by a small integer known at compile-time
  # Implementation:
  # We don't want to go convert the integer to the Montgomery domain (O(n²))
  # and then multiply by ``b`` (another O(n²)
  #
  # So we hardcode addition chains for small integer
  #
  # In terms of cost a doubling/addition is 3 passes over the data:
  # - addition + check if > prime + conditional substraction
  # A full multiplication, assuming b is projected to Montgomery domain beforehand is:
  # - n² passes over the data, each of 5~6 elementary addition/multiplication
  # - a conditional substraction
  #

This blocks benchmarking with projective coordinates because of:

constantine/constantine/math/elliptic/ec_shortweierstrass_projective.nim

Line 231 in 34baa74

t2 *= b3 # 21. t₂ <- 3b t₂ t₂ = 3bZ₁Z₂

For secp256k1, the curve equation is y² = x³ + ax + b with a = 0 and b = 7, so b3 = 3b = 21.
- Shortest addition chains can be found here: https://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
- Difficulty: Very easy
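A minimal sketch of multiplication by 21 through the chain 1, 2, 4, 5, 10, 20, 21 (six additions, the shortest possible length for 21). It assumes a toy single-word prime instead of the 256-bit secp256k1 field; `M`, `addMod` and `mulBy21` are hypothetical names, not Constantine's API. Each step is one modular addition (add, compare with the prime, conditionally subtract), matching the cost model in the `*=` comment.

```nim
# Toy prime modulus standing in for the 256-bit secp256k1 field prime (assumption).
const M = 1_000_000_007'u64

func addMod(a, b: uint64): uint64 =
  ## Modular addition: add, compare against the prime, conditionally subtract.
  ## Inputs must already be reduced (a, b < M), so a + b cannot overflow uint64.
  result = a + b
  if result >= M:
    result -= M

func mulBy21(a: uint64): uint64 =
  ## Multiply by 21 = 3*7 with the addition chain 1, 2, 4, 5, 10, 20, 21
  ## instead of a full modular multiplication.
  let a2 = addMod(a, a)      # 2a
  let a4 = addMod(a2, a2)    # 4a
  let a5 = addMod(a4, a)     # 5a
  let a10 = addMod(a5, a5)   # 10a
  let a20 = addMod(a10, a10) # 20a
  result = addMod(a20, a)    # 21a
```

Six additions at roughly 3 passes each stay well below the n² passes of a full Montgomery multiplication for 4 limbs.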

Benchmark Constantine on the same computer to have a baseline.
CC=clang nimble bench_ec_g1_scalarmul
Projective coordinates need to be commented out if the previous step was skipped, and endomorphism acceleration needs to be commented out as well.
- Difficulty: Very easy
Add endomorphism acceleration for secp256k1.
The constants can be generated from https://github.com/mratsim/constantine/blob/master/sage/derive_endomorphisms.sage. Note that non-pairing-friendly curves like secp256k1 or Banderwagon have no G1 or G2, so the Sage code needs to be modified to handle both cases. The constants can then be added to https://github.com/mratsim/constantine/tree/master/constantine/math/constants.
- Difficulty: Easy, mostly modifying the existing Python/Sage code that generates Nim code. The scaffolding exists.
- Expected speedup: 30% compared to baseline
- Reference paper: https://link.springer.com/content/pdf/10.1007/3-540-44647-8_11.pdf
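Endomorphism acceleration here is the GLV method. Since secp256k1 has a = 0 and p ≡ 1 (mod 3), F_p contains a nontrivial cube root of unity β, and (βx)³ + 7 = x³ + 7 shows that φ maps curve points to curve points. A sketch of the relations the generated constants would have to satisfy (Q is a point of prime order n):

```latex
\begin{aligned}
\varphi(x, y) &= (\beta x,\ y), && \beta^3 \equiv 1 \pmod{p},\ \beta \neq 1 \\
\varphi(Q) &= \lambda Q, && \lambda^3 \equiv 1 \pmod{n},\ \lambda \neq 1 \\
k &\equiv k_1 + k_2 \lambda \pmod{n}, && |k_1|,\ |k_2| \approx \sqrt{n} \\
kQ &= k_1 Q + k_2\,\varphi(Q)
\end{aligned}
```

Computing φ(Q) costs a single field multiplication, and because k₁ and k₂ are about half the bit length of k, both half-size multiplications can share one run of doublings (for example with Shamir's trick), roughly halving the number of doublings.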
Add fixed-base scalar multiplication via LSB-set encoding #73
- Difficulty: Very hard
- Expected speedup: 80% compared to endomorphism, 2.34x compared to baseline.
- Reference paper:
  Efficient and Secure Algorithms for GLV-Based Scalar
  Multiplication and their Implementation on GLV-GLS
  Curves (Extended Version)
  Armando Faz-Hernández, Patrick Longa, Ana H. Sánchez, 2013
  https://eprint.iacr.org/2013/158
- C implementation:
  - https://github.com/catid/snowshoe/blob/8ba3f57/src/ecmul.inc
  - https://github.com/catid/snowshoe/blob/8ba3f57/src/recode.inc#L290-L589

Add finite field arithmetic for moduli of special form #11

This requires refactoring the core Fp type to indicate whether its modulus is generic or of special form, using either separate FpGeneric and FpPseudoMersenne types or a static enum:

constantine/constantine/math/config/type_ff.nim

Lines 16 to 33 in 34baa74

    
type
  Fp*[C: static Curve] = object
    ## All operations on a Fp field are modulo P
    ## P being the prime modulus of the Curve C
    ## Internally, data is stored in Montgomery n-residue form
    ## with the magic constant chosen for convenient division (a power of 2 depending on P bitsize)
    # TODO, pseudo mersenne primes like 2²⁵⁵-19 have very fast modular reduction
    #       and don't need Montgomery representation
    mres*: matchingBigInt(C)
  Fr*[C: static Curve] = object
    ## All operations on a field are modulo `r`
    ## `r` being the prime curve order or subgroup order
    ## Internally, data is stored in Montgomery n-residue form
    ## with the magic constant chosen for convenient division (a power of 2 depending on P bitsize)
    mres*: matchingOrderBigInt(C)
  FF*[C: static Curve] = Fp[C] or Fr[C]

type FpGeneric[C: static Curve] = object
  mres*: matchingBigInt(C)
type FpPseudoMersenne[C: static Curve] = object
  mres*: matchingBigInt(C)

type Fp[C: static Curve] = FpGeneric[C] or FpPseudoMersenne[C]

or

type SpecialPrime = enum
  kGeneric
  kGeneralizedMersenne
  kPseudoMersenne
  kGolden

type Fp[C: static Curve, SP: static SpecialPrime] = object
  mres*: matchingBigInt(C)
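To show how the static-enum option could select a reduction strategy at compile time, here is a minimal sketch. `ToyFp` and `reductionName` are hypothetical, the `Curve` parameter is dropped to keep it self-contained, and routing `kGolden` to Montgomery is an arbitrary simplification.

```nim
# Tag describing the shape of the field modulus, as in the proposal above.
type
  SpecialPrime = enum
    kGeneric, kGeneralizedMersenne, kPseudoMersenne, kGolden

  ToyFp[SP: static SpecialPrime] = object
    ## Hypothetical stand-in for Constantine's Fp; the Curve parameter is dropped.
    limbs: array[4, uint64]   # 4x64-bit limbs for a 256-bit field

func reductionName[SP: static SpecialPrime](x: ToyFp[SP]): string =
  ## Picks the reduction strategy at compile time from the static tag:
  ## only the matching `when` branch is compiled.
  when SP == kPseudoMersenne:
    result = "pseudo-Mersenne"
  elif SP == kGeneralizedMersenne:
    result = "Solinas"
  else:
    result = "Montgomery"   # kGeneric, and kGolden for simplicity in this sketch
```

With a single generic type, code shared by all primes stays written once, and only the reduction dispatches on `SP`.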

Additionally, a special pseudo-Mersenne reduction needs to be added, similar to

constantine/constantine/mac/mac_poly1305.nim

Lines 37 to 46 in 34baa74

    
func partialReduce_1305[N1, N2: static int](r: var Limbs[N1], a: Limbs[N2]) =
  ## The prime 2¹³⁰-5 has a special form 2ᵐ-c
  ## called "Crandall prime" or Pseudo-Mersenne Prime
  ## in the litterature
  ## which allows fast reduction from the fact that
  ##        2ᵐ-c ≡  0     (mod p)
  ##   <=>  2ᵐ   ≡  c     (mod p)   [1]
  ##   <=> a2ᵐ+b ≡ ac + b (mod p)
  ##
  ## This partially reduces the input in range [0, 2¹³⁰)

- Difficulty: Medium on the math, hard on the refactoring
- Expected speedup: 25% over baseline.
  Currently, most implementations split Fp elements into 5x52-bit limbs (or 10x26-bit limbs on 32-bit platforms). This predates the MULX, ADCX and ADOX instructions that accelerate bigint multiplication on x86-64, which Intel introduced with Haswell (MULX) and Broadwell (ADCX/ADOX). Since the cost of multiplication scales with the square of the number of limbs, at roughly 3n² operations, moving from 5 limbs to 4 limbs of 64 bits reduces it from 75 to 48 operations.
  Constantine already provides fast Broadwell multiplication, and special-prime reductions are significantly faster than Montgomery reductions.
Reference papers:
- Generalized Mersenne Numbers
  Jerome Solinas
  https://cacr.uwaterloo.ca/techreports/1999/corr99-39.pdf
- Montgomery-friendly primes and applications to cryptography
  Jean Claude Bajard and Sylvain Duquesne
  https://eprint.iacr.org/2020/665
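The pseudo-Mersenne folding described above (2ᵐ ≡ c mod p) can be sketched at toy scale. This minimal sketch assumes a single-word modulus p = 2³² - 5 instead of the multi-limb secp256k1 prime p = 2²⁵⁶ - 2³² - 977 (where c = 2³² + 977); `reducePM` and `mulPM` are hypothetical names. A real implementation would fold the high limbs of a 512-bit product in the same way.

```nim
# Toy pseudo-Mersenne prime p = 2^32 - c with c = 5 (the largest 32-bit prime).
const
  C = 5'u64
  P = (1'u64 shl 32) - C
  Mask32 = 0xFFFF_FFFF'u64

func reducePM(a: uint64): uint64 =
  ## Fully reduce a 64-bit value modulo P using 2^32 ≡ c (mod P):
  ## a = hi*2^32 + lo ≡ hi*c + lo, applied twice, then one conditional subtraction.
  var t = (a shr 32) * C + (a and Mask32)  # first fold: t < 6 * 2^32
  t = (t shr 32) * C + (t and Mask32)      # second fold: t < 2^32 + 25
  if t >= P:                               # t - P < 30 < P, so one subtraction suffices
    t -= P
  result = t

func mulPM(a, b: uint64): uint64 =
  ## Modular multiplication for reduced inputs a, b < P (the product fits in 64 bits)
  result = reducePM(a * b)
```

No division or Montgomery constant is needed: the reduction is two multiply-by-c folds and a final conditional subtraction.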

Fused multiplication + special reduction for secp256k1
This fuses multiplication and reduction by moduli of special form.
- Difficulty: Very hard, requires translating the algorithm to assembly and debugging the assembly
- Expected speedup: 15% to 20% over special reduction.
- Reference paper:
  Efficient Arithmetic In (Pseudo-)Mersenne Prime Order Fields
  Kaushik Nath and Palash Sarkar
  https://eprint.iacr.org/2018/985

Implement the high-level API for deriving a secp256k1 public key from a private key

- Difficulty: Very easy

For the programming language benchmark, only private-to-public key derivation is needed. As with BLS signatures, this is a single scalar multiplication:

constantine/constantine/signatures/bls_signatures.nim

Lines 37 to 49 in 34baa74

    
func derivePubkey*[Pubkey, SecKey](pubkey: var Pubkey, seckey: SecKey) =
  ## Generates the public key associated with the input secret key.
  ##
  ## The secret key MUST be in range (0, curve order)
  ## 0 is INVALID
  const Group = Pubkey.G
  type Field = Pubkey.F
  const EC = Field.C
  var pk {.noInit.}: ECP_ShortW_Jac[Field, Group]
  pk.fromAffine(EC.getGenerator($Group))
  pk.scalarMul(seckey)
  pubkey.affine(pk)

And the high-level wrapper:

constantine/constantine/ethereum_bls_signatures.nim

Lines 224 to 228 in 34baa74

    
func derive_pubkey*(public_key: var PublicKey, secret_key: SecretKey) {.libPrefix: prefix_ffi.} =
  ## Derive the public key matching with a secret key
  ##
  ## The secret_key MUST be validated
  public_key.raw.derivePubkey(secret_key.raw)
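To illustrate what `derivePubkey` computes, here is a toy sketch of pubkey = seckey · G on y² = x³ + 7 (the secp256k1 equation) over the tiny field F₉₇, using affine coordinates and plain double-and-add. The field size, the generator (1, 28) and the names `Point`, `scalarMul` and `toyDerivePubkey` are assumptions for illustration only; Constantine's version uses Jacobian coordinates (`ECP_ShortW_Jac`) over 256-bit fields, and production code must be constant-time.

```nim
# Toy curve y^2 = x^3 + 7 over F_97; illustrative parameters, not secp256k1's.
const Prime = 97'i64

type Point = object
  inf: bool      # true for the point at infinity (the group identity)
  x, y: int64

func reduce(a: int64): int64 =
  ## Map any int64 into [0, Prime); Nim's `mod` keeps the dividend's sign.
  result = a mod Prime
  if result < 0: result += Prime

func inv(a: int64): int64 =
  ## Modular inverse via Fermat's little theorem: a^(Prime-2) mod Prime.
  var base = reduce(a)
  var e = Prime - 2
  result = 1
  while e > 0:
    if (e and 1) == 1: result = result * base mod Prime
    base = base * base mod Prime
    e = e shr 1

func `==`(a, b: Point): bool =
  result =
    if a.inf or b.inf: a.inf == b.inf
    else: a.x == b.x and a.y == b.y

func add(p, q: Point): Point =
  ## Affine addition/doubling for a = 0, handling the identity and Q + (-Q).
  if p.inf: return q
  if q.inf: return p
  if p.x == q.x and reduce(p.y + q.y) == 0:
    return Point(inf: true)
  let lam =
    if p.x == q.x: reduce(3 * p.x * p.x) * inv(2 * p.y) mod Prime  # tangent slope
    else: reduce(q.y - p.y) * inv(q.x - p.x) mod Prime               # chord slope
  let x3 = reduce(lam * lam - p.x - q.x)
  result = Point(x: x3, y: reduce(lam * (p.x - x3) - p.y))

func scalarMul(k: int64, g: Point): Point =
  ## Left-to-right double-and-add; NOT constant-time, illustration only.
  result = Point(inf: true)
  for i in countdown(62, 0):
    result = add(result, result)
    if ((k shr i) and 1) == 1:
      result = add(result, g)

func onCurve(q: Point): bool =
  q.inf or reduce(q.y * q.y) == reduce(q.x * q.x * q.x + 7)

# Toy generator: 28^2 = 784 ≡ 8 = 1^3 + 7 (mod 97)
const Gen = Point(x: 1, y: 28)

func toyDerivePubkey(seckey: int64): Point =
  ## Hypothetical analogue of derivePubkey: pubkey = seckey * G
  result = scalarMul(seckey, Gen)
```

For example, `toyDerivePubkey(2)` doubles the generator and yields (68, 81).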

Stretch goal: implement the full libsecp256k1 API
- Difficulty: Medium
- All the primitives of ECDSA are present and the API is well known, so the difficulty lies in combining the building blocks and doing tests.


mratsim added the "enhancement" (New feature or request) and "performance 🏁" labels on Oct 17, 2023

mratsim mentioned this issue on Jul 26, 2024 in feat(secp256k1): add endomorphism acceleration #444 (Merged)
