feat: added barrett_reduction implementation into uintx #6768

Merged: Rumata888 merged 5 commits into master from zw/barrett-reduction on Jul 12, 2024


zac-williamson (Contributor) commented May 30, 2024

This PR adds a `barrett_reduction` method to `uintx`. Barrett reduction is a fast division algorithm for the case where the divisor is known ahead of time, so that a scaled reciprocal of the divisor can be precomputed once and reused for every subsequent division.
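To make the idea concrete, here is a minimal Python sketch of Barrett reduction. It is not the C++ `uintx` code from this PR: the class `BarrettReducer`, its methods, and the `input_bits` parameter are hypothetical, and the modulus is assumed to be the BN254 base field prime. The only long division happens once in `precompute`; each `reduce` call then needs just a multiplication, a shift, and a small correction step.

```python
"""Minimal Barrett reduction sketch (hypothetical names, not the PR's C++ API)."""

from dataclasses import dataclass
from typing import Tuple

# Assumed modulus: the BN254 base field prime.
BN254_FQ = 0x30644E72E131A029B85045B68181585D97816A916871CA8D3C208C16D87CFD47


@dataclass(frozen=True)
class BarrettReducer:
    modulus: int     # fixed divisor p, known ahead of time
    input_bits: int  # shift s; every input must satisfy 0 <= x < 2**s
    mu: int          # precomputed floor(2**s / p)

    @classmethod
    def precompute(cls, modulus: int, input_bits: int) -> "BarrettReducer":
        if modulus < 2:
            raise ValueError("modulus must be at least 2")
        # The only long division, performed once per modulus.
        return cls(modulus, input_bits, (1 << input_bits) // modulus)

    def reduce(self, x: int) -> Tuple[int, int]:
        """Return (x // p, x % p) without a general long division."""
        if not 0 <= x < (1 << self.input_bits):
            raise ValueError("x is outside the supported input range")
        # Quotient estimate: one multiplication and one shift.
        # mu <= 2**s / p means q never exceeds x // p; mu > 2**s / p - 1
        # together with x < 2**s means q is at most 1 below x // p.
        q = (x * self.mu) >> self.input_bits
        r = x - q * self.modulus  # r >= 0 because q never overshoots
        # Correction step: with this choice of shift the loop runs at most once.
        while r >= self.modulus:
            r -= self.modulus
            q += 1
        return q, r


if __name__ == "__main__":
    reducer = BarrettReducer.precompute(BN254_FQ, input_bits=1024)
    x = (BN254_FQ - 1) ** 4  # below 2**1016, so within the 1024-bit range
    assert reducer.reduce(x) == divmod(x, BN254_FQ)
    print("ok")
```

Setting the shift to the full input width (1024 bits in the usage above) makes `mu` a large constant, but it keeps the quotient estimate within one of the true quotient for every supported input, so the cost per division is essentially one wide multiplication.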

`barrett_reduction` is used to speed up divmod for some important hardcoded moduli. Of particular relevance is the prime field associated with BN254 curve arithmetic: expensive 1024-bit divmod operations are performed when computing witnesses within `stdlib::bigfield`, which is commonly used to perform non-native BN254 curve arithmetic.
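The following hedged Python sketch illustrates why witness generation for non-native arithmetic needs a divmod by a fixed modulus. It assumes the common approach in which the prover computes a quotient and remainder for each product off-circuit and the circuit only checks the integer identity. The function name `mul_witness` is hypothetical, Python's builtin `divmod` stands in for the optimized C++ routine, and for simplicity only a single fully reduced product is shown, so the dividend is about 508 bits rather than the 1024-bit width mentioned above.

```python
"""Sketch of the quotient/remainder witness for a non-native multiplication."""

from typing import Tuple

# Assumed modulus: the BN254 base field prime.
BN254_FQ = 0x30644E72E131A029B85045B68181585D97816A916871CA8D3C208C16D87CFD47


def mul_witness(a: int, b: int, p: int = BN254_FQ) -> Tuple[int, int]:
    """Return (quotient, remainder) with a * b == quotient * p + remainder.

    A circuit over a different native field cannot reduce modulo p directly,
    so (in this sketch) the prover computes quotient and remainder off-circuit
    and the circuit only checks the integer identity plus range constraints.
    """
    if not (0 <= a < p and 0 <= b < p):
        raise ValueError("this sketch assumes fully reduced operands")
    # Wide divmod by a fixed modulus: the step a precomputed Barrett
    # reduction can accelerate. The builtin divmod stands in for it here.
    return divmod(a * b, p)


if __name__ == "__main__":
    a, b = BN254_FQ - 1, BN254_FQ - 2
    quotient, remainder = mul_witness(a, b)
    # The relation a circuit would enforce, checked here over the integers.
    assert a * b == quotient * BN254_FQ + remainder
    print(quotient == BN254_FQ - 3, remainder)  # (p-1)(p-2) = (p-3)p + 2
```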

This speeds up biggroup `batch_mul` by 4x.

AztecBot (Collaborator)

Benchmark results

Metrics with a significant change:

  • proof_construction_time_poseidon_hash_ms (1): 236 (+203%)
  • proof_construction_time_poseidon_hash_ms (4): 80.0 (+135%)
  • proof_construction_time_poseidon_hash_ms (16): 58.0 (+71%)
  • proof_construction_time_poseidon_hash_ms (32): 94.0 (+62%)
  • proof_construction_time_poseidon_hash_ms (64): 149 (+69%)
  • proof_construction_time_poseidon_hash_100_ms (1): 11,358 (+98%)
  • proof_construction_time_poseidon_hash_100_ms (4): 3,125 (+98%)
  • proof_construction_time_poseidon_hash_100_ms (16): 1,395 (+93%)
  • proof_construction_time_poseidon_hash_100_ms (32): 1,412 (+83%)
  • proof_construction_time_poseidon_hash_100_ms (64): 1,450 (+82%)
  • proof_construction_time_poseidon_hash_30_ms (1): 5,740 (+279%)
  • proof_construction_time_poseidon_hash_30_ms (4): 1,575 (+277%)
  • proof_construction_time_poseidon_hash_30_ms (16): 725 (+261%)
  • proof_construction_time_poseidon_hash_30_ms (32): 776 (+253%)
  • proof_construction_time_poseidon_hash_30_ms (64): 797 (+194%)
  • app_circuit_witness_generation_time_in_ms (MultiCallEntrypoint:entrypoint): 1,528 (-18%)
  • app_circuit_witness_generation_time_in_ms (SchnorrAccount:constructor): 1,110 (-29%)
  • app_circuit_witness_generation_time_in_ms (SchnorrAccount:entrypoint): 2,195 (-17%)
  • app_circuit_witness_generation_time_in_ms (Token:transfer): 3,990 (-20%)
  • app_circuit_witness_generation_time_in_ms (Benchmarking:create_note): 985 (-30%)
  • protocol_circuit_simulation_time_in_ms (base-rollup): 774 (-91%)
  • protocol_circuit_simulation_time_in_ms (public-kernel-setup): 642 (-25%)
  • batch_insert_into_indexed_tree_20_depth_ms (4096): 4,935 (+80%)
  • batch_insert_into_indexed_tree_20_depth_hash_ms (4096): 0.880 (+84%)
  • protocol_circuit_witness_generation_time_in_ms (base-parity): 1,156 (+80%)
  • protocol_circuit_proving_time_in_ms (root-parity): 48,721 (+16%)
  • protocol_circuit_proving_time_in_ms (root-rollup): 20,632 (-16%)
  • l2_block_public_tx_process_time_in_ms (8): 25,373 (-33%)
  • l2_block_public_tx_process_time_in_ms (32): 112,279 (-31%)
  • l2_block_public_tx_process_time_in_ms (64): 222,259 (-34%)
  • l2_block_processing_time_in_ms (64): 12,552 (+32%)
Detailed results

All benchmarks are run on txs against the Benchmarking contract in the repository. Each tx consists of a batch call to create_note and increment_balance, which guarantees that each tx has a private call, a nested private call, a public call, and a nested public call, as well as an emitted private note, an unencrypted log, and a public storage read and write.

This benchmark source data is available in JSON format on S3 here.

Proof generation

Each column represents the number of threads used in proof generation.

| Metric | 1 thread | 4 threads | 16 threads | 32 threads | 64 threads |
| - | - | - | - | - | - |
| proof_construction_time_sha256_ms | 5,717 | 1,544 (-1%) | 689 (-3%) | 755 | 780 (+1%) |
| proof_construction_time_sha256_30_ms | 11,738 (+3%) | 3,145 (+3%) | 1,410 (+3%) | 1,448 (+2%) | 1,467 (+2%) |
| proof_construction_time_sha256_100_ms | 44,727 (+2%) | 12,055 (+3%) | 5,719 (+6%) | 5,806 (+8%) | 5,493 (+3%) |
| proof_construction_time_poseidon_hash_ms | ⚠️ 236 (+203%) | ⚠️ 80.0 (+135%) | ⚠️ 58.0 (+71%) | ⚠️ 94.0 (+62%) | ⚠️ 149 (+69%) |
| proof_construction_time_poseidon_hash_30_ms | ⚠️ 5,740 (+279%) | ⚠️ 1,575 (+277%) | ⚠️ 725 (+261%) | ⚠️ 776 (+253%) | ⚠️ 797 (+194%) |
| proof_construction_time_poseidon_hash_100_ms | ⚠️ 11,358 (+98%) | ⚠️ 3,125 (+98%) | ⚠️ 1,395 (+93%) | ⚠️ 1,412 (+83%) | ⚠️ 1,450 (+82%) |

L2 block published to L1

Each column represents the number of txs on an L2 block published to L1.

| Metric | 8 txs | 32 txs | 64 txs |
| - | - | - | - |
| l1_rollup_calldata_size_in_bytes | 1,412 | 1,412 | 1,412 |
| l1_rollup_calldata_gas | 9,464 | 9,476 | 9,464 |
| l1_rollup_execution_gas | 616,105 | 616,117 | 616,105 |
| l2_block_processing_time_in_ms | 1,287 | 4,794 | ⚠️ 12,552 (+32%) |
| l2_block_building_time_in_ms | 46,962 (+8%) | 192,827 (+12%) | 388,595 (+13%) |
| l2_block_rollup_simulation_time_in_ms | 46,789 (+8%) | 192,177 (+13%) | 387,307 (+13%) |
| l2_block_public_tx_process_time_in_ms | ⚠️ 25,373 (-33%) | ⚠️ 112,279 (-31%) | ⚠️ 222,259 (-34%) |

L2 chain processing

Each column represents the number of blocks on the L2 chain where each block has 16 txs.

| Metric | 3 blocks | 5 blocks |
| - | - | - |
| node_history_sync_time_in_ms | 9,475 (+1%) | 14,452 (+1%) |
| node_database_size_in_bytes | 14,475,344 | 21,348,432 |
| pxe_database_size_in_bytes | 18,071 | 29,868 |

Circuits stats

Stats on running time and I/O sizes collected for every kernel circuit run across all benchmarks.

| Circuit | simulation_time_in_ms | witness_generation_time_in_ms | proving_time_in_ms | input_size_in_bytes | output_size_in_bytes | proof_size_in_bytes | num_public_inputs | size_in_gates |
| - | - | - | - | - | - | - | - | - |
| private-kernel-init | 140 (+4%) | 478 (+1%) | 13,269 (-4%) | 20,630 | 64,614 | 89,536 | 2,731 | 524,288 |
| private-kernel-inner | 431 (+5%) | 945 (-1%) | 45,491 (-8%) | 92,318 | 64,614 | 89,536 | 2,731 | 2,097,152 |
| private-kernel-tail | 596 | 2,702 (+1%) | 40,342 (-12%) | 96,541 | 77,732 | 11,648 | 297 | 2,097,152 |
| base-parity | 6.56 (+2%) | ⚠️ 1,156 (+80%) | 2,857 (+2%) | 128 | 64.0 | 2,208 | 2.00 | 131,072 |
| root-parity | 50.0 (+2%) | 60.5 (-11%) | ⚠️ 48,721 (+16%) | 27,084 | 64.0 | 2,720 | 18.0 | 2,097,152 |
| base-rollup | ⚠️ 774 (-91%) | 2,444 (+2%) | 86,610 (+8%) | 119,734 | 756 | 3,648 | 47.0 | 4,194,304 |
| root-rollup | 110 | 78.1 (+3%) | ⚠️ 20,632 (-16%) | 25,297 | 620 | 3,456 | 41.0 | 1,048,576 |
| public-kernel-app-logic | 553 (+1%) | 3,107 (-14%) | 50,811 (+10%) | 105,253 (-1%) | 86,550 | 116,768 | 3,582 | 2,097,152 |
| public-kernel-tail | 1,162 (+3%) | 24,527 (+1%) | 188,382 (+5%) | 401,002 | 7,646 | 11,648 | 297 | 8,388,608 |
| private-kernel-reset-small | 616 (+3%) | 1,947 (-14%) | 50,277 (-2%) | 120,733 | 64,614 | 89,536 | 2,731 | 2,097,152 |
| merge-rollup | 28.7 (-8%) | N/A | N/A | 16,534 | 756 | N/A | N/A | N/A |
| public-kernel-setup | ⚠️ 642 (-25%) | N/A | N/A | 105,253 (-1%) | 86,550 | N/A | N/A | N/A |
| public-kernel-teardown | 542 (-15%) | N/A | N/A | 105,253 (-1%) | 86,550 | N/A | N/A | N/A |
| private-kernel-tail-to-public | N/A | 8,223 (-1%) | 101,751 (-3%) | N/A | N/A | 116,768 | 3,582 | 4,194,304 |

Stats on running time collected for app circuits

| Function | input_size_in_bytes | output_size_in_bytes | witness_generation_time_in_ms | proof_size_in_bytes | proving_time_in_ms | size_in_gates | num_public_inputs |
| - | - | - | - | - | - | - | - |
| ContractClassRegisterer:register | 1,344 | 9,944 | 466 (-1%) | N/A | N/A | N/A | N/A |
| ContractInstanceDeployer:deploy | 1,408 | 9,944 | 41.1 (-2%) | N/A | N/A | N/A | N/A |
| MultiCallEntrypoint:entrypoint | 1,920 | 9,944 | ⚠️ 1,528 (-18%) | N/A | N/A | N/A | N/A |
| SchnorrAccount:constructor | 1,312 | 9,944 | ⚠️ 1,110 (-29%) | N/A | N/A | N/A | N/A |
| SchnorrAccount:entrypoint | 2,304 | 9,944 | ⚠️ 2,195 (-17%) | 16,768 | 54,095 (-1%) | 2,097,152 | 457 |
| Token:privately_mint_private_note | 1,280 | 9,944 | 1,409 (-10%) | N/A | N/A | N/A | N/A |
| Token:transfer | 1,376 | 9,944 | ⚠️ 3,990 (-20%) | 16,768 | 57,416 (-1%) | 2,097,152 | 457 |
| Benchmarking:create_note | 1,312 (-2%) | 9,944 | ⚠️ 985 (-30%) | N/A | N/A | N/A | N/A |
| FPC:fee_entrypoint_public | 1,344 | 9,944 | 224 (-7%) | N/A | N/A | N/A | N/A |
| SchnorrAccount:spend_private_authwit | 1,280 | 9,944 | 77.3 | N/A | N/A | N/A | N/A |
| Token:unshield | 1,376 | 9,944 | 3,249 (-14%) | N/A | N/A | N/A | N/A |
| FPC:fee_entrypoint_private | 1,376 | 9,944 | 4,018 (-14%) | N/A | N/A | N/A | N/A |

Tree insertion stats

The duration to insert a fixed batch of leaves into each tree type.

| Metric | 1 leaf | 16 leaves | 64 leaves | 128 leaves | 512 leaves | 1024 leaves | 2048 leaves | 4096 leaves | 32 leaves |
| - | - | - | - | - | - | - | - | - | - |
| batch_insert_into_append_only_tree_16_depth_ms | 10.5 (+1%) | 17.1 (+1%) | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_count | 16.7 | 31.8 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_16_depth_hash_ms | 0.607 (+1%) | 0.524 (+1%) | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| batch_insert_into_append_only_tree_32_depth_ms | N/A | N/A | 50.0 (+3%) | 76.3 (+1%) | 247 | 476 | 929 | 1,848 (+1%) | N/A |
| batch_insert_into_append_only_tree_32_depth_hash_count | N/A | N/A | 95.9 | 159 | 543 | 1,055 | 2,079 | 4,127 | N/A |
| batch_insert_into_append_only_tree_32_depth_hash_ms | N/A | N/A | 0.510 (+3%) | 0.470 (+1%) | 0.449 | 0.444 | 0.440 | 0.441 (+1%) | N/A |
| batch_insert_into_indexed_tree_20_depth_ms | N/A | N/A | 60.0 (+3%) | 113 (+1%) | 357 (+1%) | 701 | 1,386 | ⚠️ 4,935 (+80%) | N/A |
| batch_insert_into_indexed_tree_20_depth_hash_count | N/A | N/A | 106 | 208 | 692 | 1,363 | 2,707 | 5,395 | N/A |
| batch_insert_into_indexed_tree_20_depth_hash_ms | N/A | N/A | 0.520 (+3%) | 0.504 | 0.483 (+1%) | 0.481 | 0.480 | ⚠️ 0.880 (+84%) | N/A |
| batch_insert_into_indexed_tree_40_depth_ms | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 65.0 (+4%) |
| batch_insert_into_indexed_tree_40_depth_hash_count | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 107 |
| batch_insert_into_indexed_tree_40_depth_hash_ms | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 0.576 (+4%) |

Miscellaneous

Transaction sizes based on how many contract classes are registered in the tx.

| Metric | 0 registered classes | 1 registered class |
| - | - | - |
| tx_size_in_bytes | 84,050 | 665,267 |

Transaction size based on fee payment method


@Rumata888 Rumata888 changed the title [feat] added barrett_reduction implementation into uintx feat: added barrett_reduction implementation into uintx May 30, 2024
@Rumata888 Rumata888 added the T-optimisation Type: Optimisation. Making something faster / cheaper / smaller label Jun 6, 2024
@Rumata888 Rumata888 self-requested a review June 6, 2024 14:35
@Rumata888 Rumata888 enabled auto-merge (squash) June 6, 2024 14:36
@Rumata888 Rumata888 merged commit abced57 into master Jul 12, 2024
34 checks passed
@Rumata888 Rumata888 deleted the zw/barrett-reduction branch July 12, 2024 16:08
rahul-kothari pushed a commit that referenced this pull request Jul 15, 2024
🤖 I have created a release *beep* *boop*
---


<details><summary>aztec-package: 0.46.5</summary>

## [0.46.5](aztec-package-v0.46.4...aztec-package-v0.46.5) (2024-07-14)


### Miscellaneous

* **aztec-package:** Synchronize aztec-packages versions
</details>

<details><summary>barretenberg.js: 0.46.5</summary>

## [0.46.5](barretenberg.js-v0.46.4...barretenberg.js-v0.46.5) (2024-07-14)


### Miscellaneous

* **barretenberg.js:** Synchronize aztec-packages versions
</details>

<details><summary>aztec-packages: 0.46.5</summary>

## [0.46.5](aztec-packages-v0.46.4...aztec-packages-v0.46.5) (2024-07-14)


### Features

* Added barrett_reduction implementation into uintx
([#6768](#6768))
([abced57](abced57))
* Databus allows arbitrarily many reads per index
([#6524](#6524))
([f07200c](f07200c))
* Let LSP always work in a Noir workspace if there's any
(noir-lang/noir#5461)
([8403e84](8403e84))
* Multiple trace structuring configurations
([#7408](#7408))
([e4abe1d](e4abe1d))
* Verify ClientIVC proofs through Bb binary
([#7407](#7407))
([3760c64](3760c64))


### Bug Fixes

* Lagrange interpolation
([#7440](#7440))
([76bcd72](76bcd72))
* Move BigInt modulus checks to runtime in brillig
(noir-lang/noir#5374)
([8403e84](8403e84))
* Run macro processors in the elaborator
(noir-lang/noir#5472)
([8403e84](8403e84))


### Miscellaneous

* Keccak256 in Noir (noir-lang/noir#5316)
([8403e84](8403e84))
* Redo typo PR by omahs (noir-lang/noir#5487)
([8403e84](8403e84))
* Replace relative paths to noir-protocol-circuits
([e89bfd8](e89bfd8))
* Replace relative paths to noir-protocol-circuits
([fae353e](fae353e))


### Documentation

* Minor comments for private refunds/partial notes
([#7447](#7447))
([9bcbb6c](9bcbb6c))
</details>

<details><summary>barretenberg: 0.46.5</summary>

## [0.46.5](barretenberg-v0.46.4...barretenberg-v0.46.5) (2024-07-14)


### Features

* Added barrett_reduction implementation into uintx
([#6768](#6768))
([abced57](abced57))
* Databus allows arbitrarily many reads per index
([#6524](#6524))
([f07200c](f07200c))
* Multiple trace structuring configurations
([#7408](#7408))
([e4abe1d](e4abe1d))
* Verify ClientIVC proofs through Bb binary
([#7407](#7407))
([3760c64](3760c64))


### Bug Fixes

* Lagrange interpolation
([#7440](#7440))
([76bcd72](76bcd72))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
AztecBot added a commit to AztecProtocol/barretenberg that referenced this pull request Jul 16, 2024
Labels
T-optimisation Type: Optimisation. Making something faster / cheaper / smaller
4 participants