feat: allow tree_r_last to be built on gpu #1138
Work in progress. Still uses CPU compat mode for tree builders. Todo: verify poseidon standard/strengthened on tests. Param generation bump required?
Yes, this will require bumping the parameter version.
@porcuquine Can you confirm if poseidon constraints (tests) should have to be updated with this change (updated neptune, etc)?
Yes, and they should go down.
aa19d48 to 758e7d2
Note: This comes out of draft status when neptune v1 is released.
986ca9f to 5900d31
let mut layer_data = Vec::with_capacity(layers);
for _ in 0..layers {
    layer_data.push(Vec::new());
}
could be written as
let mut layer_data = vec![vec![]; layers];
the inner Vec should probably be allocated with with_capacity if you can determine their expected length
The inner vec is entirely replaced
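A minimal sketch of the point made in this thread, assuming `u64` as a stand-in for the field element type (the real type is not shown here) and using hypothetical function names. It compares the loop form with the `vec![vec![]; layers]` form, and shows why reserving capacity for the inner Vec would not help when the slot is later overwritten by assignment, as in the `layer_data[k - 1] = fr_elements` line of the diff.

```rust
// Stand-in element type; the PR stores field elements, u64 is only for illustration.
type Elem = u64;

/// Loop form from the diff: reserve the outer Vec, then push empty inner Vecs.
fn layers_with_loop(layers: usize) -> Vec<Vec<Elem>> {
    let mut layer_data = Vec::with_capacity(layers);
    for _ in 0..layers {
        layer_data.push(Vec::new());
    }
    layer_data
}

/// Macro form from the review: clones an empty Vec `layers` times.
fn layers_with_macro(layers: usize) -> Vec<Vec<Elem>> {
    vec![vec![]; layers]
}

/// Assigning to a slot drops the previous inner Vec, so any capacity
/// that had been reserved for it is discarded along with it.
fn replace_layer(layer_data: &mut [Vec<Elem>], k: usize, fr_elements: Vec<Elem>) {
    layer_data[k - 1] = fr_elements;
}
```

Both constructors return the same value, so the choice between them is about readability rather than behavior.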
layer_data[k - 1] = fr_elements;
}
});
});
could be
for layer_elements in layer_data.iter_mut() {
// ...
layer_elements.extend(elements.into_iter().map(Into::into));
}
which should avoid some allocations
I'll take a look at this, thanks
I see what you're saying now (along with the above comment). Thanks!
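For reference, the `iter_mut` plus `extend` suggestion can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: `fetch_layer` is a hypothetical closure standing in for whatever returns a layer's raw elements, and `u32` to `u64` stands in for the real element conversion. The conversion is spelled `u64::from` here so the target type is explicit.

```rust
/// Fills each pre-created inner Vec in place instead of building a
/// converted Vec and assigning it over the slot.
/// `fetch_layer` is a hypothetical stand-in for the layer retrieval call.
fn fill_layers<F>(layer_data: &mut [Vec<u64>], fetch_layer: F)
where
    F: Fn(usize) -> Vec<u32>,
{
    for (k, layer_elements) in layer_data.iter_mut().enumerate() {
        let elements = fetch_layer(k);
        // One reservation per layer, then convert while appending.
        layer_elements.reserve(elements.len());
        layer_elements.extend(elements.into_iter().map(u64::from));
    }
}
```

With this shape, no intermediate Vec of converted elements is built per layer; the converted values go straight into the existing slot.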
)?;
for fr in fr_elements {
    buf.extend(fr_into_bytes(&fr));
}
let buf: Vec<u8> = fr_elements.iter().flat_map(fr_into_bytes).collect();
I think I want to avoid this. This looks eerily like the construction that was originally in place that bottlenecked hard.
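To make the two constructions concrete, here is a sketch under assumptions: `Fr` is modeled as four 64-bit limbs and `fr_into_bytes` is a stand-in with the same name, not the library's implementation. The output is identical either way. One plausible difference is that `collect` on a `flat_map` cannot know the final length up front, so the Vec may grow repeatedly, while the loop form reserves once. The thread does not say what actually caused the earlier bottleneck.

```rust
/// Hypothetical stand-in for a field element: four 64-bit limbs (32 bytes).
type Fr = [u64; 4];

/// Stand-in for `fr_into_bytes`: returns a freshly allocated little-endian encoding.
fn fr_into_bytes(fr: &Fr) -> Vec<u8> {
    fr.iter().flat_map(|limb| limb.to_le_bytes()).collect()
}

/// Iterator form suggested in review; the final size is not known to `collect`.
fn flatten_with_flat_map(fr_elements: &[Fr]) -> Vec<u8> {
    fr_elements.iter().flat_map(fr_into_bytes).collect()
}

/// Loop form kept in the PR: the output buffer is reserved once, then extended.
fn flatten_preallocated(fr_elements: &[Fr]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(fr_elements.len() * 32);
    for fr in fr_elements {
        buf.extend(fr_into_bytes(fr));
    }
    buf
}
```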
some small code improvements, but overall looks good to me
let flat_tree_data: Vec<_> = tree_data
for fr in fr_elements {
    buf.extend(fr_into_bytes(&fr));
This will still allocate a buffer into which to write the conversion. I think you should add a new version of this function which writes directly into buf.
For this instance (of tree data), it's taking about a half second on a 32GiB run. The base data flattening is now taking about 2-3 seconds.
The buf being extended has been pre-allocated above that. I'm not sure if you're saying to avoid that allocation, or the one that extend would do had it not been pre-allocated.
I was talking about the allocation inside fr_into_bytes.
👍 Got it, thanks. Given the overall speed improvements, we can wait on that a bit since this is no longer bottlenecking.
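If this is picked up later, the "writes directly into buf" version could look something like the sketch below. Everything here is an assumption for illustration: `Fr` is modeled as four 64-bit limbs, and `write_fr_bytes` is a hypothetical name, not an existing function in the codebase.

```rust
/// Hypothetical stand-in for a field element: four 64-bit limbs (32 bytes).
type Fr = [u64; 4];

/// Hypothetical variant of `fr_into_bytes` that appends the encoding to a
/// caller-owned buffer instead of returning a newly allocated Vec.
fn write_fr_bytes(fr: &Fr, buf: &mut Vec<u8>) {
    for limb in fr {
        // `to_le_bytes` yields a stack array, so no heap allocation happens here.
        buf.extend_from_slice(&limb.to_le_bytes());
    }
}

/// Flattens all elements into one buffer that is reserved once up front.
fn flatten_into_buf(fr_elements: &[Fr]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(fr_elements.len() * 32);
    for fr in fr_elements {
        write_fr_bytes(fr, &mut buf);
    }
    buf
}
```

The only heap allocation left in this sketch is the single up-front reservation for `buf`.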
feat: attempt to improve gpu tree_c layer retrieval
fix: bump parameter version
fix: properly delete tree c in the split config case
fix: update tests to match new (lower) constraints
feat: update debug logging
feat: add adjustable column_write_batch_size setting
feat: attempt to improve gpu tree_c layer retrieval