feat: Mega memory benchmarks #9858
Conversation
Changes to circuit sizes
🧾 Summary (100% most significant diffs)
Force-pushed from 8c4de71 to 6a13552.
```
--------------------------------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
construct_pk/TraceStructure::E2E_FULL_TEST     1160 ms         1129 ms            1 pk mem=911.716M
construct_pk/TraceStructure::CLIENT_IVC_BENCH   365 ms          349 ms            2 pk mem=471.561M
```
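For context, the `pk mem` figures above are Google Benchmark user counters. A minimal sketch of how such a counter can be attached to a benchmark; `construct_proving_key_and_measure_bytes()` is a hypothetical stand-in for the PR's actual measurement code:

```
#include <cstdint>

#include <benchmark/benchmark.h>

// Hedged sketch: report a manually computed byte count as a "pk mem" user
// counter; the console reporter prints it with an SI suffix as in the output
// above. construct_proving_key_and_measure_bytes() is a hypothetical stand-in.
uint64_t construct_proving_key_and_measure_bytes(); // hypothetical

static void construct_pk(benchmark::State& state)
{
    uint64_t bytes = 0;
    for (auto _ : state) {
        bytes = construct_proving_key_and_measure_bytes();
    }
    state.counters["pk mem"] = benchmark::Counter(static_cast<double>(bytes));
}
BENCHMARK(construct_pk);
BENCHMARK_MAIN();
```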
This reverts commit c564a56.
Force-pushed from 600073f to 530660a.
Nice, couple of comments
```
fill_lookup_block(builder);

{
    // finalize doesn't populate public inputs block, so copy to verify that the block is being filled well.
```
yeah this needs to be robustified. probably makes sense to just have the builder do it rather than populating a separate public_inputs object. There are a number of things that are more awkward than they need to be because we maintain this somewhat artificial distinction between the builder and the prover (e.g. structured trace is fundamentally a 'prover' notion). Maybe we should just do away with this division when it makes sense.
```
Builder builder;
builder.blocks.set_fixed_block_sizes(settings);

fill_ecc_op_block(builder);
```
I'm not sure I see the need to populate these blocks with real data. In the structured case I think the actual block sizes are irrelevant because the fixed sizes are what determine memory. In the unstructured case it should only be about the size() of the blocks but even then it doesn't matter if we're storing zeros or something else. This would also make it easy to just populate the blocks with exactly the right size.
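A rough sketch of that idea, assuming hypothetical helpers (blocks.get(), get_fixed_size(), and add_zero_row() are illustrative names, not necessarily the existing builder API):

```
// Hedged sketch: instead of filling blocks with real gate data, pad each
// block with zero rows up to exactly the size the trace structure dictates.
// All three helpers used here are assumed names for illustration only.
for (auto& block : builder.blocks.get()) {
    while (block.size() < block.get_fixed_size()) {
        block.add_zero_row();
    }
}
```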
vinfo("++Estimating proving key memory++"); | ||
for (auto [polynomial, label] : zip_view(polynomials.get_all(), polynomials.get_labels())) { | ||
uint64_t size = polynomial.size(); | ||
vinfo(label, " num: ", size, " size: ", (size * sizeof(FF)) >> 10, " KiB"); |
I know you're not counting shifts in the total memory, but I was a bit misled by this print at first because it makes it look like the shifts are using their own memory.
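One way to make that unambiguous would be to sum over the unshifted polynomials only. A sketch, assuming the container exposes an accessor like get_unshifted() (the actual name may differ):

```
// Hedged sketch: total memory over unshifted polynomials only, since shifted
// polynomials are views into the same buffers and own no memory of their own.
// get_unshifted() is an assumed accessor name.
uint64_t total_bytes = 0;
for (auto& polynomial : polynomials.get_unshifted()) {
    total_bytes += polynomial.size() * sizeof(FF);
}
vinfo("total unshifted pk memory: ", total_bytes >> 20, " MiB (shifts alias existing data)");
```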
```
// alternative: add to finalize or add a flag to check whether PIs have already been populated
auto builder_copy = builder;
builder_copy.finalize_circuit(/* ensure_nonzero */ false);
DeciderProvingKey::Trace::populate_public_inputs_block(builder_copy);
```
One other thing: if you just construct a DeciderProvingKey from this builder_copy, it will now automatically compute the overflow (and whether or not there is overflow). The print isn't as detailed as this one because I didn't add the labels (as you now have), but it could be made similar to what you have here (and probably should be, for the sake of dev experience).
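A sketch of that suggestion; the constructor shape is assumed from context rather than quoted from the codebase:

```
// Hedged sketch: constructing the proving key from the builder copy finalizes
// the circuit and computes the trace overflow as a side effect, so the
// separate populate_public_inputs_block call goes away. The constructor
// arguments (trace_settings here) are an assumption.
auto builder_copy = builder;
DeciderProvingKey proving_key(builder_copy, trace_settings);
```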
It would be better to actually use Google Bench's memory manager functionality and count allocations; we already have something similar implemented for Tracy. After striking out with that approach for a bit, I reverted to just manually counting the sizes of the biggest vectors. See AztecProtocol/aztec-packages#9858 for logs of benchmarks.
The PR uncovered this issue: some trace structures have unusable capacity, not just due to using fewer than a dyadic number of gates, but also because of coupling of certain gate types (AztecProtocol/barretenberg#1149).
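For reference, a sketch of the Google Benchmark memory-manager approach mentioned above. The Result fields and the Stop signature vary between benchmark versions, and the allocation hooks here are a simplification (real code would also replace the aligned and array forms of new/delete):

```
#include <atomic>
#include <cstdlib>
#include <new>

#include <benchmark/benchmark.h>

// Hedged sketch of counting allocations via benchmark::MemoryManager.
// Global new/delete are replaced to tally bytes allocated while a
// benchmark runs; the tallies are handed back through Result.
static std::atomic<int64_t> g_allocated_bytes{ 0 };
static std::atomic<int64_t> g_num_allocs{ 0 };

void* operator new(std::size_t size)
{
    g_allocated_bytes.fetch_add(static_cast<int64_t>(size), std::memory_order_relaxed);
    g_num_allocs.fetch_add(1, std::memory_order_relaxed);
    if (void* ptr = std::malloc(size)) {
        return ptr;
    }
    throw std::bad_alloc{};
}

void operator delete(void* ptr) noexcept
{
    std::free(ptr);
}

class AllocCountingMemoryManager : public benchmark::MemoryManager {
  public:
    void Start() override
    {
        g_allocated_bytes = 0;
        g_num_allocs = 0;
    }
    // Note: older benchmark versions take Result* instead of Result&.
    void Stop(Result& result) override
    {
        result.num_allocs = g_num_allocs;
        result.total_allocated_bytes = g_allocated_bytes;
    }
};

int main(int argc, char** argv)
{
    static AllocCountingMemoryManager memory_manager;
    benchmark::RegisterMemoryManager(&memory_manager);
    benchmark::Initialize(&argc, argv);
    benchmark::RunSpecifiedBenchmarks();
    benchmark::Shutdown();
    return 0;
}
```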