-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Faster randomness sampling for field elements #9627
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,7 @@ | |
#include "barretenberg/common/op_count.hpp" | ||
#include "barretenberg/common/thread.hpp" | ||
#include "barretenberg/ecc/curves/bn254/bn254.hpp" | ||
#include "barretenberg/numeric/random/engine.hpp" | ||
#include "barretenberg/polynomials/polynomial.hpp" | ||
#include "barretenberg/srs/global_crs.hpp" | ||
#include <benchmark/benchmark.h> | ||
|
@@ -466,6 +467,19 @@ void pippenger(State& state) | |
benchmark::DoNotOptimize(ck->commit(pol)); | ||
} | ||
} | ||
|
||
void bn254fr_random(State& state) | ||
{ | ||
numeric::RNG& engine = numeric::get_randomness(); | ||
for (auto _ : state) { | ||
state.PauseTiming(); | ||
size_t num_cycles = 1 << static_cast<size_t>(state.range(0)); | ||
state.ResumeTiming(); | ||
for (size_t i = 0; i < num_cycles; i++) { | ||
benchmark::DoNotOptimize(fr::random_element(&engine)); | ||
} | ||
} | ||
} | ||
} // namespace | ||
|
||
BENCHMARK(parallel_for_field_element_addition)->Unit(kMicrosecond)->DenseRange(0, MAX_REPETITION_LOG); | ||
|
@@ -485,4 +499,5 @@ BENCHMARK(sequential_copy)->Unit(kMicrosecond)->DenseRange(20, 25); | |
BENCHMARK(uint_multiplication)->Unit(kMicrosecond)->DenseRange(12, 27); | ||
BENCHMARK(uint_extended_multiplication)->Unit(kMicrosecond)->DenseRange(12, 27); | ||
BENCHMARK(pippenger)->Unit(kMicrosecond)->DenseRange(16, 20)->Setup(DoPippengerSetup)->Iterations(5); | ||
BENCHMARK(bn254fr_random)->Unit(kMicrosecond)->DenseRange(10, 16); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. when making a zk proof do we not need randomness equal to the circuit size? It might be useful to extend this to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assume we'll do it in parallel. And right now 2^20 is very slow |
||
BENCHMARK_MAIN(); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
#include "fq.hpp" | ||
#include "barretenberg/numeric/random/engine.hpp" | ||
#include "barretenberg/numeric/uint256/uint256.hpp" | ||
#include "barretenberg/serialize/test_helper.hpp" | ||
#include <gtest/gtest.h> | ||
|
||
|
@@ -526,4 +528,17 @@ TEST(fq, NegAndSelfNeg0CmpRegression) | |
a_neg = 0; | ||
a_neg.self_neg(); | ||
EXPECT_EQ((a == a_neg), true); | ||
} | ||
|
||
// This test shows that ((lo|hi)% modulus) in uint512_t is equivalent to (lo + 2^256 * hi) in field elements so we | ||
// don't have to use the slow API | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is the "slow API"? might be useful to make this more explicit so that this comment does not have implied context that future readers might not have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added |
||
TEST(fq, EquivalentRandomness) | ||
{ | ||
auto& engine = numeric::get_debug_randomness(); | ||
uint512_t random_uint512 = engine.get_random_uint512(); | ||
auto random_lo = fq(random_uint512.lo); | ||
auto random_hi = fq(random_uint512.hi); | ||
uint512_t q(fq::modulus); | ||
constexpr auto pow_2_256 = fq(uint256_t(1) << 128).sqr(); | ||
EXPECT_EQ(random_lo + pow_2_256 * random_hi, fq((random_uint512 % q).lo)); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,7 @@ | |
#include <vector> | ||
|
||
#include "./field_declarations.hpp" | ||
#include "barretenberg/numeric/uint256/uint256.hpp" | ||
|
||
namespace bb { | ||
|
||
|
@@ -687,11 +688,10 @@ template <class T> field<T> field<T>::random_element(numeric::RNG* engine) noexc | |
if (engine == nullptr) { | ||
engine = &numeric::get_randomness(); | ||
} | ||
|
||
uint512_t source = engine->get_random_uint512(); | ||
uint512_t q(modulus); | ||
uint512_t reduced = source % q; | ||
return field(reduced.lo); | ||
constexpr field pow_2_256 = field(uint256_t(1) << 128).sqr(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How sensitive are we to performance here? If we wanted to maximize performance, I think the following algorithm would be fastest:
The above requires 2 modular reductions. The current method performs 3, due to converting two random u256 values into Very minor point, I don't know how performance sensitive we are to this algorithm There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Our old modulo algorithm was super slow, so this is a significant improvement. I tried checking what kind of an improvement would not converting to montgomery for 256-bit chunks make and it was minimal. The problem is that now that the division algorithm is not extremely slow, the bottleneck is sampling the uint512_t. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also we don't do 2^256 montgomery in wasm. It's different( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thanks for the context about the wasm. Replace But yes I appreciate it doesn't affect performance too much, it's the difference between 3 reductions and 2. It makes sense that generating the random bytes would be the limiting factor after your changes. |
||
field lo = engine->get_random_uint256(); | ||
field hi = engine->get_random_uint256(); | ||
return lo + (pow_2_256 * hi); | ||
} | ||
|
||
template <class T> constexpr size_t field<T>::primitive_root_log_size() noexcept | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: numeric literals like
1
are usually interpreted asint
types. Might make the code ever so slightly more readable by making this1UL
(32 bit unsigned long). Up to you though, this isa very petty comment.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks