feat: Faster randomness sampling for field elements #9627
Conversation
numeric::RNG& engine = numeric::get_randomness();
for (auto _ : state) {
    state.PauseTiming();
    size_t num_cycles = 1 << static_cast<size_t>(state.range(0));
minor nit: numeric literals like `1` are usually interpreted as `int` types. Might make the code ever so slightly more readable by making this `1UL` (unsigned long). Up to you though, this is a very petty comment.
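Applied to the loop above, the suggestion would read something like this (a sketch; the surrounding benchmark code is as in the diff):

```cpp
// Unsigned literal so the shift is performed on an unsigned type from the start.
size_t num_cycles = 1UL << static_cast<size_t>(state.range(0));
```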
Thanks
@@ -485,4 +499,5 @@ BENCHMARK(sequential_copy)->Unit(kMicrosecond)->DenseRange(20, 25);
BENCHMARK(uint_multiplication)->Unit(kMicrosecond)->DenseRange(12, 27);
BENCHMARK(uint_extended_multiplication)->Unit(kMicrosecond)->DenseRange(12, 27);
BENCHMARK(pippenger)->Unit(kMicrosecond)->DenseRange(16, 20)->Setup(DoPippengerSetup)->Iterations(5);
BENCHMARK(bn254fr_random)->Unit(kMicrosecond)->DenseRange(10, 16);
When making a zk proof, do we not need randomness equal to the circuit size? It might be useful to extend this to `20` to validate that nothing odd happens for larger circuits.
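The suggested tweak would amount to widening the sweep in the registration line above (a sketch):

```cpp
// Reviewer's suggestion: sweep up to 2^20 to cover realistic circuit sizes.
BENCHMARK(bn254fr_random)->Unit(kMicrosecond)->DenseRange(10, 20);
```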
I assume we'll do it in parallel. And right now 2^20 is very slow.
}

// This test shows that ((lo|hi) % modulus) in uint512_t is equivalent to (lo + 2^256 * hi) in field elements so we
// don't have to use the slow API
What is the "slow API"? It might be useful to make this more explicit so that this comment doesn't carry implied context that future readers might not have.
Added
uint512_t q(modulus);
uint512_t reduced = source % q;
return field(reduced.lo);
constexpr field pow_2_256 = field(uint256_t(1) << 128).sqr();
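For reference, a minimal sketch of the equivalence being tested, assuming barretenberg-style `field`, `uint256_t`, and `uint512_t` types; the function names here are illustrative, not the exact test code:

```cpp
// Old path: one 512-by-256-bit division, then take the low 256 bits.
field via_division(const uint512_t& source, const uint256_t& modulus)
{
    uint512_t q(modulus);
    uint512_t reduced = source % q;
    return field(reduced.lo);
}

// New path: treat source as lo + 2^256 * hi and reduce each half as a field element.
field via_field_arithmetic(const uint512_t& source)
{
    // 2^256 mod r, computed as (2^128)^2 so the literal fits in uint256_t.
    constexpr field pow_2_256 = field(uint256_t(1) << 128).sqr();
    return field(source.lo) + pow_2_256 * field(source.hi);
}

// The test asserts via_division(s, modulus) == via_field_arithmetic(s) for
// random s, since (lo + 2^256 * hi) mod r is exactly what the division computes.
```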
How sensitive are we to performance here? If we wanted to maximize performance, I think the following algorithm would be fastest:
- generate a random u512 `t`
- feed `t` directly into a Barrett reduction algorithm, producing output `t * 2^{-256} mod p`
- multiply the output by `2^{512} mod p` to return `t * 2^{256} mod p` (which is `t mod p` converted into Montgomery form, which is what the underlying `field` type expects)

The above requires 2 modular reductions. The current method performs 3, due to converting two random u256 values into `field` elements (along with additional arithmetic operations).

Very minor point, I don't know how performance-sensitive we are to this algorithm.
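In code, the proposed two-reduction scheme might look like the sketch below. `barrett_reduce` and `field::from_raw_limbs` are hypothetical helpers (no such primitives are assumed to exist in the codebase), and `engine.get_random_uint256()` is assumed to return a uniform 256-bit value; the point is only to show where the two reductions happen:

```cpp
// Hypothetical sketch of the two-reduction scheme described above, assuming
// p is the field modulus and R = 2^256 is the Montgomery radix (wasm uses a
// different R, as noted below, so substitute accordingly).
field random_field_element(numeric::RNG& engine)
{
    // Uniform 512-bit sample.
    uint512_t t(engine.get_random_uint256(), engine.get_random_uint256());

    // Reduction 1: Barrett-reduce the full 512-bit value, producing limbs
    // equal to t * 2^{-256} mod p (hypothetical helper).
    uint256_t limbs = barrett_reduce(t);

    // Interpret those limbs directly as a Montgomery-form field element,
    // skipping the usual to-Montgomery conversion (hypothetical constructor).
    field r = field::from_raw_limbs(limbs);

    // Reduction 2: one Montgomery multiplication by the constant 2^512 mod p.
    // The stored limbs become (t * 2^{-256}) * (2^{512} * 2^{256}) * 2^{-256}
    // = t * 2^{256} mod p, i.e. t mod p already in Montgomery form.
    constexpr field two_pow_512 = field(uint256_t(1) << 128).sqr().sqr();
    return r * two_pow_512;
}
```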
Our old modulo algorithm was super slow, so this is a significant improvement. I tried checking what kind of improvement skipping the Montgomery conversion for the 256-bit chunks would make, and it was minimal. The problem is that now that the division algorithm is no longer extremely slow, the bottleneck is sampling the `uint512_t`.
Also, we don't use 2^256 Montgomery in wasm. It's different.
Thanks for the context about the wasm. Replace `2^{256}` with whatever the Montgomery `R` parameter is and I think the above all applies.

But yes, I appreciate it doesn't affect performance too much; it's the difference between 3 reductions and 2. It makes sense that generating the random bytes would be the limiting factor after your changes.
I left some comments about minor details, as well as a potential improvement. But any changes are optional, so I have approved. Good work!
Changes the algorithm for converting `uint512_t`s to fields. The result is equivalent to the old one (we could make an even faster version, but without the equivalence). 1.5x faster on the mainframe, 5x in wasm.
Benchmark screenshots (omitted): Before (native), After (native), Before (wasm), After (wasm).