Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize: batch BLS verification #1632

Merged
merged 44 commits into from
Apr 25, 2020

Conversation

Nashatyrev
Copy link
Contributor

@Nashatyrev Nashatyrev commented Apr 21, 2020

PR Description

  • Extract block verification into a separate class
  • Optimize BLS signatures verification by batching them: https://ethresear.ch/t/fast-verification-of-multiple-bls-signatures/5407
  • Optimize batch BLS with parallelization
  • Optimize batch BLS with doing ate2 instead of ate (not very big gain and only for large batches)
  • Optimize batch BLS with eliminating one G2 scalar multiplication for the first element (factor is 1)

Benchmarks:

In summary this optimization yields ~x1.5 block processing speedup

Before:

Benchmark                              (validatorsCount)  Mode  Cnt  Score   Error  Units
TransitionBenchmark.Block.importBlock              16384    ss  250  0,121 ± 0,002   s/op
TransitionBenchmark.Block.importBlock              32768    ss  250  0,146 ± 0,004   s/op
TransitionBenchmark.Epoch.importBlock              16384    ss  100  0,266 ± 0,007   s/op
TransitionBenchmark.Epoch.importBlock              32768    ss  100  0,452 ± 0,009   s/op

After:

Benchmark                              (validatorsCount)  Mode  Cnt  Score   Error  Units
TransitionBenchmark.Block.importBlock              16384    ss  250  0,072 ± 0,002   s/op
TransitionBenchmark.Block.importBlock              32768    ss  250  0,104 ± 0,004   s/op
TransitionBenchmark.Epoch.importBlock              16384    ss  100  0,214 ± 0,007   s/op
TransitionBenchmark.Epoch.importBlock              32768    ss  100  0,409 ± 0,009   s/op

Other benchmarks

Below is the comparison of different variants for different signature batch sizes (ops/sec, larger is better):

Benchmark 1 2 4 8 16 32 64 128
Simple 38,428 18,995 9,429 4,825 2,409 1,201 0,587 0,297
BatchedNonParallelDoublePairing 28,973 19,119 11,351 6,332 3,288 1,672 0,841 0,435
BatchedNonParallelSinglePairing 29,387 18,675 10,826 6,024 3,042 1,586 0,818 0,421
BatchedParallelDoublePairing 29,505 19,270 17,489 14,503 10,405 5,275 2,980 1,517
BatchedParallelSinglePairing 29,341 24,971 22,318 16,174 9,551 5,459 2,853 1,467

Here is the BLS primitives benchmark results just for referencing them later:

Benchmark                            Mode  Cnt      Score     Error  Units
BLSPrimitivesBenchmark.ate1          avgt   10   4934,493 ± 184,762  us/op
BLSPrimitivesBenchmark.ate1x2Mul     avgt   10   9990,881 ± 342,553  us/op
BLSPrimitivesBenchmark.ate2          avgt   10   8597,902 ± 218,734  us/op
BLSPrimitivesBenchmark.bigRandomX64  avgt   10      9,290 ±   0,669  us/op
BLSPrimitivesBenchmark.fexp          avgt   10   7301,422 ± 419,305  us/op
BLSPrimitivesBenchmark.g2Mul32       avgt   10    879,418 ±  51,346  us/op
BLSPrimitivesBenchmark.g2Mul64       avgt   10   1283,652 ±  36,820  us/op
BLSPrimitivesBenchmark.gtProduct     avgt   10     44,110 ±   1,688  us/op
BLSPrimitivesBenchmark.hToG2         avgt   10  11058,118 ± 429,275  us/op

…ring. BLS.batchVerify() chooses the fastest option
…lidate, those which process but not validate, defaults methods which do both
@Nashatyrev Nashatyrev marked this pull request as ready for review April 23, 2020 16:44
@Nashatyrev
Copy link
Contributor Author

Still to be added:

  • unit tests
  • Refactor BeaconBlock to remerkables to cache object root (@cemozerr optimization idea)

@Nashatyrev
Copy link
Contributor Author

Nashatyrev commented Apr 24, 2020

Refactor BeaconBlock to remerkables to cache object root (@cemozerr optimization idea)

@cemozerr I would do it as a separate task, since it didn't occurred to be so easy.
Created BC-392 for this

Copy link
Contributor

@benjaminion benjaminion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been through and checked the detailed logic, and conformance with the technique's description, and it all looks good to me.

My only suggestion would be to split batchVerifyTest() into four separate tests (one for each assert) to be more in line with the spirit of unit tests. But it's not a big deal and I am happy to approve without this change.

Caveat: at the time of reviewing some CI tests are failing. I haven't looked into this.

@Nashatyrev
Copy link
Contributor Author

Looks we are hitting this bug, due to which batch BLS hangs on random generation: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6521844
I think we may use regular Random for now since a potential attack has a very limited application and is not feasible.

@Nashatyrev
Copy link
Contributor Author

My only suggestion would be to split batchVerifyTest() into four separate tests (one for each assert) to be more in line with the spirit of unit tests.

Done. Thanks for review!

@Nashatyrev Nashatyrev merged commit 381b9b5 into Consensys:master Apr 25, 2020
Comment on lines +54 to +57
// https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6521844
// A potential attack here has a very limited application and is not feasible
// Thus using non-secure random doesn't significantly mitigate the security
return ThreadLocalRandom.current();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That linux bug was fixed in JDK 7 which we don't support anymore. Java SecureRandom now uses /dev/urandom.

Copy link
Contributor Author

@Nashatyrev Nashatyrev Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, mixed up Updated and Resolved dates. Thought it just was recently fixed.
However while running the test with SecureRandom on Linux (I reproduced on Ubuntu 18, the same behavior was on the Travis) it stuck for 30 mins with this stack trace:

    java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(java.base@11.0.7/Native Method)
        at java.io.FileInputStream.read(java.base@11.0.7/FileInputStream.java:279)
        at java.io.FilterInputStream.read(java.base@11.0.7/FilterInputStream.java:133)
        at sun.security.provider.NativePRNG$RandomIO.readFully(java.base@11.0.7/NativePRNG.java:424)
        at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(java.base@11.0.7/NativePRNG.java:526)
        at sun.security.provider.NativePRNG$RandomIO.implNextBytes(java.base@11.0.7/NativePRNG.java:545)
        - locked <0x00000007004ac3c0> (a java.lang.Object)
        at sun.security.provider.NativePRNG$Blocking.engineNextBytes(java.base@11.0.7/NativePRNG.java:268)
        at java.security.SecureRandom.nextBytes(java.base@11.0.7/SecureRandom.java:742)
        at java.security.SecureRandom.next(java.base@11.0.7/SecureRandom.java:799)
        at java.util.Random.nextLong(java.base@11.0.7/Random.java:424)
        at tech.pegasys.artemis.bls.mikuli.BLS12381.nextBatchRandomMultiplier(BLS12381.java:325)
        at tech.pegasys.artemis.bls.mikuli.BLS12381.prepareBatchVerify(BLS12381.java:240)
        at tech.pegasys.artemis.bls.mikuli.BLS12381Test.lambda$batchVerifyTest$4(BLS12381Test.java:178)
        at tech.pegasys.artemis.bls.mikuli.BLS12381Test$$Lambda$464/0x00000008401ec040.apply(Unknown Source)

Here are opened files:

ubuntu@ip-172-31-30-114:~/jvm-libp2p$ lsof -p 16745 | grep random
java    16745 ubuntu   96r      CHR                1,8       0t0     10 /dev/random
java    16745 ubuntu   97r      CHR                1,9       0t0     11 /dev/urandom

Also found another related JDK bug (which was also fixed) : https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8098581

As we discussed with @benjaminion this place is not critical for using secure random, but there is SecureRandomProvider which may potentially hit the same issue. Added Jira issue: BC-396

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the java version:

openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)

Here is the commit on which the issue is reproduced: c3e8c2e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, we went through this in a lot of detail as part of Pantheon's security audit and confirmed Java would use /dev/urandom. Originally Pantheon had overridden it to use /dev/random because that's "more secure" (the auditors disagreed, arguments ensued, /dev/urandom eventually won because it made the auditors happy and actually worked).

It looks a bit more complex than I remember it and is now trying to look up egdSource via security properties - specifically the securerandom.source security property. Interestingly AdoptOpenJDK 11 on Mac sets that to file:/dev/random so potentially Java has changed again. I'm a little surprised Besu hasn't run into this as it uses a lot of entropy (mostly from discovery messages IIRC).

I'm personally not too concerned about using plain Random here though I suspect we'll get dinged for it in the audit. And we're going to have to ensure people at least know how to select /dev/urandom as the source instead of /dev/random and possibly find a way to set it as the default.

NicolasMassart added a commit to NicolasMassart/teku that referenced this pull request Apr 27, 2020
commit 1cc3466
Author: Sally MacFarlane <sally.macfarlane@consensys.net>
Date:   Mon Apr 27 15:44:58 2020 +1000

    added subcommand in message (Consensys#1671)

    Signed-off-by: Sally MacFarlane <sally.macfarlane@consensys.net>

commit af1c82a
Author: bgravenorst <50852695+bgravenorst@users.noreply.github.com>
Date:   Mon Apr 27 11:38:15 2020 +1000

    Update /validator/block API description. (Consensys#1664)

    * Update API description.

    Signed-off-by: Byron Gravenorst <byron.gravenorst@consensys.net>

    * Address feedback.

    Signed-off-by: Byron Gravenorst <byron.gravenorst@consensys.net>

commit 6936481
Author: Paul Harris <paul.harris@consensys.net>
Date:   Mon Apr 27 10:35:52 2020 +1000

    network cli test cases (Consensys#1667)

    * added a test case to show network option taking a URL.

    * split out some other command line options tests into their own modules, and added config file parsing.

    Signed-off-by: Paul Harris <paul.harris@consensys.net>

commit f79f65f
Author: Adrian Sutton <adrian.sutton@consensys.net>
Date:   Mon Apr 27 07:15:18 2020 +1000

    Implement attestation gossip validation requirements (Consensys#1661)

commit 56a3b87
Author: Raw Pong Ghmoa <58883403+q9f@users.noreply.github.com>
Date:   Sun Apr 26 22:53:11 2020 +0200

    cli/deposit: significantly simplify validator keystore path (Consensys#1618)

commit 381b9b5
Author: Anton Nashatyrev <Nashatyrev@users.noreply.github.com>
Date:   Sat Apr 25 21:12:08 2020 +0300

    Optimize: batch BLS verification  (Consensys#1632)

    * Add batch BLS verification
    * Parallelize batch BLS

commit 340af2c
Author: Cem Ozer <cemozer2018@u.northwestern.edu>
Date:   Fri Apr 24 12:50:10 2020 -0400

    Remove headBockRoot from beaconBlocksByRange request (Consensys#1659)
NicolasMassart added a commit to NicolasMassart/teku that referenced this pull request Apr 27, 2020
commit 1cc3466
Author: Sally MacFarlane <sally.macfarlane@consensys.net>
Date:   Mon Apr 27 15:44:58 2020 +1000

    added subcommand in message (Consensys#1671)

    Signed-off-by: Sally MacFarlane <sally.macfarlane@consensys.net>

commit af1c82a
Author: bgravenorst <50852695+bgravenorst@users.noreply.github.com>
Date:   Mon Apr 27 11:38:15 2020 +1000

    Update /validator/block API description. (Consensys#1664)

    * Update API description.

    Signed-off-by: Byron Gravenorst <byron.gravenorst@consensys.net>

    * Address feedback.

    Signed-off-by: Byron Gravenorst <byron.gravenorst@consensys.net>

commit 6936481
Author: Paul Harris <paul.harris@consensys.net>
Date:   Mon Apr 27 10:35:52 2020 +1000

    network cli test cases (Consensys#1667)

    * added a test case to show network option taking a URL.

    * split out some other command line options tests into their own modules, and added config file parsing.

    Signed-off-by: Paul Harris <paul.harris@consensys.net>

commit f79f65f
Author: Adrian Sutton <adrian.sutton@consensys.net>
Date:   Mon Apr 27 07:15:18 2020 +1000

    Implement attestation gossip validation requirements (Consensys#1661)

commit 56a3b87
Author: Raw Pong Ghmoa <58883403+q9f@users.noreply.github.com>
Date:   Sun Apr 26 22:53:11 2020 +0200

    cli/deposit: significantly simplify validator keystore path (Consensys#1618)

commit 381b9b5
Author: Anton Nashatyrev <Nashatyrev@users.noreply.github.com>
Date:   Sat Apr 25 21:12:08 2020 +0300

    Optimize: batch BLS verification  (Consensys#1632)

    * Add batch BLS verification
    * Parallelize batch BLS

commit 340af2c
Author: Cem Ozer <cemozer2018@u.northwestern.edu>
Date:   Fri Apr 24 12:50:10 2020 -0400

    Remove headBockRoot from beaconBlocksByRange request (Consensys#1659)
@Nashatyrev Nashatyrev deleted the optimize-batch-bls branch October 30, 2020 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants