-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implement copy on write optimization for arrays in brillig #3118
Conversation
Looks like there's an odd value in one of the brillig tests. Anyone have any insight on this? Very possible I added the wrong expected registers for each reference count in some of the brillig tests. |
This is gonna need an update the ACIR serialisation logic in |
I think the approach here is reasonable -- Alvaro who has been thinking about this for a while is away for a week, so he won't be able to chime in at the moment. The original issue was because of the noir-rsa program, could we get a comparison vs master with that program: https://github.com/SetProtocol/noir-rsa ? |
When testing on the noir-rsa repo, I'm currently getting an out of bounds read for brillig memory on this PR for some reason. When handling the Load opcode it is trying to read memory at index 2262, but the length is also only 2262. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only skimmed through the code, I might have misunderstood something. Will make a more complete review later today or early tomorrow 😄
compiler/noirc_evaluator/src/brillig/brillig_gen/brillig_block.rs
Outdated
Show resolved
Hide resolved
RegisterOrMemory::HeapArray(HeapArray { pointer, .. }) => { | ||
self.load_instruction(pointer, variable_pointer); | ||
} | ||
RegisterOrMemory::HeapVector(HeapVector { pointer, size }) => { | ||
RegisterOrMemory::HeapVector(HeapVector { pointer, size, reference_count: _ }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should read reference_count from stored array and vector variables
RegisterOrMemory::HeapArray(HeapArray { pointer, size, reference_count: _ }) => { | ||
self.store_instruction(variable_pointer, pointer); | ||
let size_constant = self.make_constant(Value::from(size)); | ||
self.store_instruction(size_pointer, size_constant); | ||
self.deallocate_register(size_constant); | ||
} | ||
RegisterOrMemory::HeapVector(HeapVector { pointer, size }) => { | ||
RegisterOrMemory::HeapVector(HeapVector { pointer, size, reference_count: _ }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
symmetric to the load, I think we need to store those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also would need to update allocate_variable_instruction implementation to take into account that now variables can be up to 3 items (pointer, length and reference count)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand these load and store methods or how I'd extend them to add a reference count. It looks like they only take in a source_variable for the array pointer, and a address_register that we use to store the size? I'm not sure why we'd store the size where the address goes, nor am I sure how this would be extended to store a reference count as well.
I've found that this PR fails the following program at the assert: global M = 5000;
fn main() {
let array = [0; M];
let res = foo(array);
assert(res == M * 2, "foo(array) != M * 2");
}
unconstrained fn foo<N>(mut array: [Field; N]) -> Field {
let copy = array;
let mut sum = 0;
// dep::std::println(array.len());
for i in 0 .. array.len() {
array[i] = 2;
sum += array[i];
}
sum
} Yet, if the println is uncommented, it will then pass. Edit: Nevermind, this was fixed by fixing |
After some debugging, it looks like the |
After more thinking I think the approach taken for brillig_gen won't work. If we add one more register to store the reference count, consider the following SSA:
v0 would get assigned two registers, let's say, R0 and R1. R0 would have the pointer to the items and R1 would have the reference count value. When storing v0 inside v57, we'd store in memory the pointer and the initial RC value (1). When processing inc_rc v0 we'd increase R1 but the reference counter stored in memory in the previous instruction wouldn't get incremented.
|
@sirasistant ah, good point. Instead of storing the rc in a register then, I'll try out the original idea I had which was to store it before the first element of the array. |
Superseded by #3522. |
Description
Problem*
Use of
array_set
in a loop can greatly slow down brillig code since under the hood on master, brillig will currently copy the entire array in order to return a new array with one element changed.Resolves (couldn't find an existing issue for this, do we have one?)
Summary*
This PR adds the copy on write optimization to brillig in order to optimize this usecase. To implement this, arrays in brillig are now reference counted and there is a new
Instruction::IncrementRc
instruction in SSA to increment reference count (1). During anarray_set
in brillig, the reference count is checked. If it is 1, brillig will mutate the existing array instead of copying to a new one. If not, brillig will fall back to creating a new array, although crucially this new array will always have reference count 1, allowing each subsequent set to be a mutated one. This leads to exponential speedup in loops using array_set.The reference count is increased when creating a new variable via
let
and when passing values to functions. The reference count is also (currently) never decreased - so the analysis is rather pessimistic. We could look into changing this in the future, but for now it is sufficient since the firstarray_set
performs a copy and the rest will still be optimized.The following example:
Runs in 58s on master (release) on my machine, and 0.8s with the optimization on this branch. Results may vary of course, and will be slower when compiling nargo on the debug profile.
Note that this test also includes array aliases so we can see the mutations are safe to perform without affecting other values.
Documentation
This PR requires documentation updates when merged.
Additional Context
Regarding (1), a possible alternate design is to encode & expose reference counts directly in SSA rather than keeping them only in brillig. For example, arrays could instead be represented as a tuple of
(reference_count, array)
wherereference_count: &mut Field
. From here, we could have eacharray_set
either translate to an if under the hood:This would greatly bloat the ssa however, so I decided against it. An alternative would be modifying the array_set instruction to accept the reference count as an argument:
Then internally (in brillig, as it is ignored in ssa) we would check the reference count for each array set operation similarly to how is currently done in this PR.
Additionally, this is still a draft because I've left some TODO's / messy bits in the code to clean up. The core approach should be ready for review though or we can discuss whether the alternative approach above would be better.
PR Checklist*
cargo fmt
on default settings.